Fundamental Approaches to Software Engineering: 10th International Conference, FASE 2007, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007, Braga, Portugal, March 24 - April 1, 2007, Proceedings (Lecture Notes in Computer Science, 4422)
3540712887, 9783540712886

This book constitutes the refereed proceedings of the 10th International Conference on Fundamental Approaches to Software Engineering, FASE 2007, held in Braga, Portugal, in March/April 2007 as part of ETAPS 2007, the Joint European Conferences on Theory and Practice of Software.


English Pages 458 [452] Year 2007



Table of contents :
Title pages
Foreword
Preface
Organization
Table of Contents
Software Product Families: Towards Compositionality
Introduction
Problem Statement
Towards Compositionality
Component Model for Compositional Platforms
Conclusions
References
EQ-Mine: Predicting Short-Term Defects for Software Evolution
Introduction
Hypotheses
Related Work
Data Measures
Features
Data Mining
Case Study
Experimental Setup
Results
Limitations
Conclusions and Future Work
References
An Approach to Software Evolution Based on Semantic Change
Introduction
Change-Based Object-Oriented Software Evolution
Case Studies
Detailing the Evolution of a Student Project
Discussion
Related Work
Tool Implementation
Conclusion and Future Work
A Simulation-Oriented Formalization for a Psychological Theory
Introduction
Formalization Process
A Brief Introduction to Behavior Analysis
Results
Specification Overview
Specification: Main Elements
Discussion
Integrating Performance and Reliability Analysis in a Non-Functional MDA Framework
Introduction
Non-Functional MDA Framework
Tool Support for the NFMDA Framework
Two NFMDA Framework Instances
Performance Analysis in MDA
Reliability Analysis in MDA
Conclusions
Information Preserving Bidirectional Model Transformations
Introduction
Review of Triple Rules and Triple Graph Grammars
Case Study: CD2RDBM Model Transformation
Information Preserving Forward and Backward Transformations
General Theory of Triple Graph Transformations
Triple Graph Transformations as Instantiation of Adhesive HLR Categories
Proof of Theorem 1
Related Work and Conclusion
Activity-Driven Synthesis of State Machines
Introduction
Scenarios
Generating Behaviours from Scenarios
Integrating Behaviours into I/O-Automata
I/O-Automata
Feedback
Translating I/O-Automata into UML 2.0 State Machines
Related Work
Conclusions and Future Work
Flexible and Extensible Notations for Modeling Languages
Introduction
Mini-Lustre: The Host Language
Mini-Lustre Extensions
Tables
Equals Clauses
State Variables
Events
Scenario Implementations
Discussion
Related Work
Conclusion
Declared Type Generalization Checker: An Eclipse Plug-In for Systematic Programming with More General Types
The Problem: Too Strong Coupling Due to Overly Specific Types
The Solution: The Declared Type Generalization Checker
Generation of Warnings
Provision of Quick Fixes
Algorithms Computing More General Types
Performance Evaluation
Extending the Declared Type Generalization Checker
Availability
References
S2A: A Compiler for Multi-modal UML Sequence Diagrams
Introduction
Overview of S2A
Conclusions and Future Work
Scenario-Driven Dynamic Analysis of Distributed Architectures
Introduction
Related Work
Model-Driven Engineering
Software Architecture
Reconceptualization of ADLs
ADLs as Domain-Specific Modeling Languages
Architectural Analyses as Model Interpreters
The XTEAM Tool-Chain
Composing ADLs and Implementing a Model Interpreter Framework
Domain-Specific Extensions and Architectural Analyses
Discussion
Providing Design Rationale
Weighing Architectural Trade-Offs
Understanding Compositions of Off-the-Shelf Components
Conclusions
References
Enforcing Architecture and Deployment Constraints of Distributed Component-Based Software
Introduction
Formalizing Architectural Choices During Development
Architectural Choices at Architecture Design Stage
Architectural Choices at Component Design Stage
Architectural Choices at Component Implementation Stage
Resource and Location Requirements at Deployment Stage
Preserving Architectural Choices at Runtime
From Architectural Constraints to Runtime Constraints
Deployment Process: A Centralized Evolution
Deployment Evolution in a Partitioned Network
Implementation Status and Results
Related Work
Conclusion and Future Work
A Family of Distributed Deadlock Avoidance Protocols and Their Reachable State Spaces
Introduction
Computational Model
A Family of Local Protocols
Allocation Sequences
Reachable State Spaces
Preference Orders
Reachable States
Applications and Conclusions
Precise Specification of Use Case Scenarios
Introduction
Example of Use Case Charts
Use Case Chart Syntax
Abstract Syntax for Scenario Charts (Level-2)
Abstract Syntax for Use Case Charts (Level-1)
Use Case Chart Semantics
Semantics of UML 2.0 Interaction Diagrams (Level-3)
Semantics of Scenario Charts (Level-2)
Semantics of Use Case Charts (Level-1)
Related Work
Conclusion
Joint Structural and Temporal Property Specification Using Timed Story Scenario Diagrams
Introduction
Specifying Structural Properties
Specifying Temporal Properties
Specification Pattern System
Deriving Specifications from Textual Requirements
Conclusion and Future Work
SDL Profiles – Formal Semantics and Tool Support
Introduction
Language Definition of SDL
Specification and Description Language (SDL)
Abstract State Machines
Outline of the Extraction Approach for SDL Profiles
Formalisation
Reduction Profile
Formalisation Signature
Formal Definition of true and false
Formal Reduction of ASM Rules
Consistency of SDL Profiles
SDL-Profile Tool
Tool Chain
Application of the SDL-Profile Tool
Related Work
Conclusions and Outlook
Preliminary Design of BML: A Behavioral Interface Specification Language for Java Bytecode
Introduction
A Short Overview of JML
The Bytecode Modeling Language
Encoding BML Specifications in the Class File Format
Compiling JML Specifications into BML Specifications
Conclusions and Related Work
A Service Composition Construct to Support Iterative Development
Introduction
Related Work
The Approach
The Task-Service Construct
The Graphical Composition Language
Using the Composition for Service Discovery
Using the Composition for Execution
Implementation in SERCS
Discussion
Conclusions and Future Work
Correlation Patterns in Service-Oriented Architectures
Introduction
Classification Framework
Correlation Mechanisms
Function-Based Correlation
Chained Correlation
Aggregation Functions
Conversation Patterns
Process Instance to Conversation Relationships
Assessment of BPEL 1.1 and BPEL 2.0
Related Work
Conclusion and Outlook
Dynamic Characterization of Web Application Interfaces
Introduction
Methodology
Classifying Responses
Discovering Inferences
Selecting Requests
Empirical Evaluation
Objects of Analysis
Variables and Measures
Design and Setup
Results
Related Work
Conclusion
A Prioritization Approach for Software Test Cases Based on Bayesian Networks
Introduction
Problem Statement
Proposed Approach
Building Bayesian Network
Background: Bayesian Network
Proposed BN Model
Nodes.
Arcs.
CPT.
Experiment
Prioritization Environment
Experiment Setup
Subject Program.
Evaluation Metric.
Prioritization Techniques.
Discussion on Obtained Results
Related Work
Conclusion and The Future Work
Redundancy Based Test-Suite Reduction
Introduction
Preliminaries
Test-Suite Reduction
Model-Checker Based Testing
Test-Suite Redundancy
Identifying Redundancy
Removing Redundancy
Empirical Evaluation
Experiment Setup
Lossy Minimization with Model-Checkers
Results
Conclusion
Testing Scenario-Based Models
Introduction
LSCs and Play-Out Definitions
Execution Configurations
The Testing Environment
Test Recording Methods
Applications and Testing Methodology
Related Work
Integration Testing in Software Product Line Engineering: A Model-Based Technique
Motivation
Related Work
Overview of the Technique
Test Models
Activities
Generation of Integration Test Case Scenarios
Abstraction of Variability (Activity D1)
Generation of Significant Paths (Activity D2)
Generation of the Optimal Path Combination (Activity D3)
Evaluation of the Technique
Design of the Experiment
Validity Threats
Performance of the Technique
Benefit of the Technique
Conclusion and Outlook
References
Practical Reasoning About Invocations and Implementations of Pure Methods
Introduction
Encoding of Pure Methods and Their Return Values
Practical Issues of Method Functions
Well-Founded Definitions of Method Functions
Tension Between Dynamic Execution and Static Verification
The Boogie Methodology
Encoding Lightweight Read-Effects
Preconditions and Frame Conditions for Pure Methods
Consequences of the Standard Precondition
Frame Conditions of Pure Methods
Related Work and Conclusion
Finding Environment Guarantees
Introduction
Background
Environment Guarantees
Environment Guarantees: Modeling and Algorithms
Logics for Open Systems
Representing an Open System as a State-Transition Graph
Checking for Environment Guarantees
Implementation
Case Study: Checking the TCAS II System
Related Work and Discussion
Conclusion and Future Work
Ensuring Consistency Within Distributed Graph Transformation Systems
Introduction
Specifying Distributed Systems with GTS
Architecture of a Distributed System
Structure of a Distributed System
Modeling the Behavior
Execution of Distributed Transformations
Meta-transformations
The Meta-transformation Approach
Examples
Evaluation
Related Work
Conclusion
Maintaining Consistency in Layered Architectures of Mobile Ad-Hoc Networks
Introduction
Scenario: Emergency Management
Layered Architectures of Mobile Ad-Hoc Networks
Concepts and Results for Layer Consistency
Consistent Layer Environment
Transformations at Different Layers
Maintaining Consistency
Conclusion
Towards Normal Design for Safety-Critical Systems
Introduction
Background and Related Work
Problem Oriented Software Engineering
A Problem-Oriented Approach to Safety Analysis
The Case Study
The process
A DC Candidate Architecture
Problem Simplification
Formalising the Requirements
Preliminary Safety Analysis (PSA)
Discussion and Conclusions
A Clustering-Based Approach for Tracing Object-Oriented Design to Requirement
Introduction
Approach Description
Representing Use Case and Classes
Building Cluster
Matching Cluster to Use Case
Supplementing Cluster
A Case Study
Experiment Clustering-Based Approach
Comparing Clustering-Based Approach with Non-clustering Approach
Related Work
IR-Based Traceability Identification
Clustering-Based IR Technology
Conclusions and Future Works
References
Measuring and Characterizing Crosscutting in Aspect-Based Programs: Basic Metrics and Case Studies
Introduction
Aspect Oriented Programming
Static Crosscuts
Dynamic Crosscuts
Basic Crosscutting Metrics
Abstract Program Structure
Auxiliary Functions
Program Structure Metrics
Feature Crosscutting Metrics
Homogeneous vs. Heterogeneous Features
Case Studies
Program Structure Metrics
Feature Crosscutting Metrics
Collaborations and Heterogeneous Features
Related Work
Conclusions and Future Work
References
Author Index

Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y. Vardi Rice University, Houston, TX, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany

4422

Matthew B. Dwyer Antónia Lopes (Eds.)

Fundamental Approaches to Software Engineering
10th International Conference, FASE 2007
Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2007
Braga, Portugal, March 24 - April 1, 2007
Proceedings


Volume Editors Matthew B. Dwyer University of Nebraska Lincoln, NE 68588, USA E-mail: [email protected] Antónia Lopes University of Lisbon 1749–016 Lisboa, Portugal E-mail: [email protected]

Library of Congress Control Number: 2007922338
CR Subject Classification (1998): D.2, F.3, D.3
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743
ISBN-10 3-540-71288-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-71288-6 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12032075 06/3142 543210

Foreword

ETAPS 2007 is the tenth instance of the European Joint Conferences on Theory and Practice of Software, and thus a cause for celebration. The events that comprise ETAPS address various aspects of the system development process, including specification, design, implementation, analysis and improvement. The languages, methodologies and tools which support these activities are all well within its scope. Different blends of theory and practice are represented, with an inclination towards theory with a practical motivation on the one hand and soundly based practice on the other. Many of the issues involved in software design apply to systems in general, including hardware systems, and the emphasis on software is not intended to be exclusive.

History and Prehistory of ETAPS

ETAPS as we know it is an annual federated conference that was established in 1998 by combining five conferences [Compiler Construction (CC), European Symposium on Programming (ESOP), Fundamental Approaches to Software Engineering (FASE), Foundations of Software Science and Computation Structures (FOSSACS), Tools and Algorithms for Construction and Analysis of Systems (TACAS)] with satellite events. All five conferences had previously existed in some form and in various colocated combinations: accordingly, the prehistory of ETAPS is complex.

FOSSACS was earlier known as the Colloquium on Trees in Algebra and Programming (CAAP), being renamed for inclusion in ETAPS as its historical name no longer reflected its contents. Indeed CAAP’s history goes back a long way; prior to 1981, it was known as the Colloque de Lille sur les Arbres en Algèbre et en Programmation. FASE was the indirect successor of a 1985 event known as Colloquium on Software Engineering (CSE), which together with CAAP formed a joint event called TAPSOFT in odd-numbered years. Instances of TAPSOFT, all including CAAP plus at least one software engineering event, took place every two years from 1985 to 1997 inclusive. In the alternate years, CAAP took place separately from TAPSOFT. Meanwhile, ESOP and CC were each taking place every two years from 1986. From 1988, CAAP was colocated with ESOP in even years. In 1994, CC became a “conference” rather than a “workshop” and CAAP, CC and ESOP were thereafter all colocated in even years. TACAS, the youngest of the ETAPS conferences, was founded as an international workshop in 1995; in its first year, it was colocated with TAPSOFT. It took place each year, and became a “conference” when it formed part of ETAPS 1998. It is a telling indication of the importance of tools in the modern field of informatics that TACAS today is the largest of the ETAPS conferences.

The coming together of these five conferences was due to the vision of a small group of people who saw the potential of a combined event to be more than the sum of its parts. Under the leadership of Don Sannella, who became the first ETAPS steering committee chair, they included: Andre Arnold, Egidio Astesiano, Hartmut Ehrig, Peter Fritzson, Marie-Claude Gaudel, Tibor Gyimothy, Paul Klint, Kim Guldstrand Larsen, Peter Mosses, Alan Mycroft, Hanne Riis Nielson, Maurice Nivat, Fernando Orejas, Bernhard Steffen, Wolfgang Thomas and (alphabetically last but in fact one of the ringleaders) Reinhard Wilhelm.

ETAPS today is a loose confederation in which each event retains its own identity, with a separate programme committee and proceedings. Its format is open-ended, allowing it to grow and evolve as time goes by. Contributed talks and system demonstrations are in synchronized parallel sessions, with invited lectures in plenary sessions. Two of the invited lectures are reserved for “unifying” talks on topics of interest to the whole range of ETAPS attendees. The aim of cramming all this activity into a single one-week meeting is to create a strong magnet for academic and industrial researchers working on topics within its scope, giving them the opportunity to learn about research in related areas, and thereby to foster new and existing links between work in areas that were formerly addressed in separate meetings.

ETAPS 1998–2006

The first ETAPS took place in Lisbon in 1998. Subsequently it visited Amsterdam, Berlin, Genova, Grenoble, Warsaw, Barcelona, Edinburgh and Vienna before arriving in Braga this year. During that time it has become established as the major conference in its field, attracting participants and authors from all over the world. The number of submissions has more than doubled, and the numbers of satellite events and attendees have also increased dramatically.

ETAPS 2007

ETAPS 2007 comprises five conferences (CC, ESOP, FASE, FOSSACS, TACAS), 18 satellite workshops (ACCAT, AVIS, Bytecode, COCV, FESCA, FinCo, GTVMT, HAV, HFL, LDTA, MBT, MOMPES, OpenCert, QAPL, SC, SLA++P, TERMGRAPH and WITS), three tutorials, and seven invited lectures (not including those that were specific to the satellite events). We received around 630 submissions to the five conferences this year, giving an overall acceptance rate of 25%. To accommodate the unprecedented quantity and quality of submissions, we have four-way parallelism between the main conferences on Wednesday for the first time. Congratulations to all the authors who made it to the final programme! I hope that most of the other authors still found a way of participating in this exciting event and I hope you will continue submitting.

ETAPS 2007 was organized by the Departamento de Informática of the Universidade do Minho, in cooperation with

– European Association for Theoretical Computer Science (EATCS)
– European Association for Programming Languages and Systems (EAPLS)
– European Association of Software Science and Technology (EASST)
– The Computer Science and Technology Center (CCTC, Universidade do Minho)
– Câmara Municipal de Braga
– CeSIUM/GEMCC (Student Groups)

The organizing team comprised:

– João Saraiva (Chair)
– José Bacelar Almeida (Web site)
– José João Almeida (Publicity)
– Luís Soares Barbosa (Satellite Events, Finances)
– Victor Francisco Fonte (Web site)
– Pedro Henriques (Local Arrangements)
– José Nuno Oliveira (Industrial Liaison)
– Jorge Sousa Pinto (Publicity)
– António Nestor Ribeiro (Fundraising)
– Joost Visser (Satellite Events)

ETAPS 2007 received generous sponsorship from Fundação para a Ciência e a Tecnologia (FCT), Enabler (a Wipro Company), Cisco and TAP Air Portugal.

Overall planning for ETAPS conferences is the responsibility of its Steering Committee, whose current membership is: Perdita Stevens (Edinburgh, Chair), Roberto Amadio (Paris), Luciano Baresi (Milan), Sophia Drossopoulou (London), Matt Dwyer (Nebraska), Hartmut Ehrig (Berlin), José Fiadeiro (Leicester), Chris Hankin (London), Laurie Hendren (McGill), Mike Hinchey (NASA Goddard), Michael Huth (London), Anna Ingólfsdóttir (Aalborg), Paola Inverardi (L’Aquila), Joost-Pieter Katoen (Aachen), Paul Klint (Amsterdam), Jens Knoop (Vienna), Shriram Krishnamurthi (Brown), Kim Larsen (Aalborg), Tiziana Margaria (Göttingen), Ugo Montanari (Pisa), Rocco de Nicola (Florence), Jakob Rehof (Dortmund), Don Sannella (Edinburgh), João Saraiva (Minho), Vladimiro Sassone (Southampton), Helmut Seidl (Munich), Daniel Varro (Budapest), Andreas Zeller (Saarbrücken).

I would like to express my sincere gratitude to all of these people and organizations, the programme committee chairs and PC members of the ETAPS conferences, the organizers of the satellite events, the speakers themselves, the many reviewers, and Springer for agreeing to publish the ETAPS proceedings. Finally, I would like to thank the organizing chair of ETAPS 2007, João Saraiva, for arranging for us to have ETAPS in the ancient city of Braga.

Edinburgh, January 2007

Perdita Stevens ETAPS Steering Committee Chair

Preface

Software engineering is a complex enterprise spanning many sub-disciplines. At its core are a set of technical and scientific challenges that must be addressed in order to set the stage for the development, deployment, and application of tools and methodologies in support of the construction of complex software systems. The International Conference on Fundamental Approaches to Software Engineering (FASE) — as one of the European Joint Conferences on Theory and Practice of Software (ETAPS) — focuses on those core challenges. FASE provides the software engineering research community with a forum for presenting well-founded theories, languages, methods, and tools arising from both fundamental research in the academic community and applied work in practical development contexts.

In 2007, FASE continued in the strong tradition of FASE 2006 by drawing a large and varied number of submissions from the community — 141 in total. Each submission was reviewed by at least three technical experts from the Program Committee with many papers receiving additional reviews from the broader research community. Each paper was discussed during a 10-day “electronic” meeting. In total, the 26 members of the Program Committee, along with 101 additional reviewers, produced more than 500 reviews. We sincerely thank each of them for the effort and care taken in reviewing and discussing the submissions.

The Program Committee selected a total of 30 papers — an acceptance rate of 21%. Accepted papers addressed topics including model-driven development, distributed systems, specification, service-oriented systems, testing, software analysis, and design. The technical program was complemented by the invited lectures of Jan Bosch on “Software Product Families: Towards Compositionality” and of Bertrand Meyer on “Contract-Driven Development.”

FASE 2007 was held in Braga (Portugal) as part of the tenth meeting of ETAPS — for some history read the Foreword in this volume. While FASE is an integral part of ETAPS, it is important to note the debt FASE owes to ETAPS and its organizers. FASE draws significant energy from its synergistic relationships with the other ETAPS meetings, which gives it a special place in the software engineering community. Perdita Stevens and the rest of the ETAPS Steering Committee have provided extremely helpful guidance to us in organizing FASE 2007 and we thank them. João Saraiva and his staff did a wonderful job as local organizers and as PC chairs we appreciate how smoothly the meeting ran due to their efforts.

In closing, we would like to thank the authors of all of the FASE submissions and the attendees of FASE sessions for their participation and we look forward to seeing you in Budapest for FASE 2008.

January 2007

Matthew B. Dwyer
Antónia Lopes

Organization

Program Committee

Luciano Baresi (Politecnico di Milano, Italy)
Yolanda Berbers (Katholieke Universiteit Leuven, Belgium)
Carlos Canal (University of Málaga, Spain)
Myra Cohen (University of Nebraska, USA)
Ivica Crnkovic (Mälardalen University, Sweden)
Arie van Deursen (Delft University of Technology, The Netherlands)
Juergen Dingel (Queen’s University, Canada)
Matt Dwyer (University of Nebraska, USA) Co-chair
Harald Gall (University of Zurich, Switzerland)
Holger Giese (University of Paderborn, Germany)
Martin Grosse-Rhode (Fraunhofer-ISST, Germany)
Anthony Hall (Independent Consultant, UK)
Reiko Heckel (University of Leicester, UK)
Patrick Heymans (University of Namur, Belgium)
Paola Inverardi (University of L’Aquila, Italy)
Valérie Issarny (INRIA-Rocquencourt, France)
Natalia Juristo (Universidad Politecnica de Madrid, Spain)
Kai Koskimies (Tampere University of Technology, Finland)
Patricia Lago (Vrije Universiteit, The Netherlands)
Antónia Lopes (University of Lisbon, Portugal) Co-chair
Mieke Massink (CNR-ISTI, Italy)
Carlo Montangero (University of Pisa, Italy)
Barbara Paech (University of Heidelberg, Germany)
Leila Ribeiro (Federal University of Rio Grande do Sul, Brazil)
Robby (Kansas State University, USA)
Catalin Roman (Washington University, USA)
Sebastian Uchitel (Imperial College, UK and University of Buenos Aires, Argentina)
Jianjun Zhao (Shanghai Jiao Tong University, China)

Referees M. Aiguier M. Akerholm V. Ambriola J. Andersson P. Asirelli M. Autili

A. Bazzan D. Bisztray T. Bolognesi Y. Bontemps J. Bradbury A. Brogi

A. Bucchiarone S. Bygde D. Carrizo G. Cignoni V. Clerc A. Corradini

M. Caporuscio R. Coreeia S. Costa M. Crane C. Cuesta O. Dieste D. Di Ruscio G. De Angelis A. de Antonio R.C. de Boer F. Dotti F. Durán K. Ehrig M.V. Espada A. Fantechi R. Farenhorts M.L. Fernandez X. Ferre L. Foss M. Fischer B. Fluri M. Fredj J. Fredriksson S. Gnesi Q. Gu R. Hedayati S. Henkler M. Hirsch

M. Katara J.P. Katoen F. Klein P. Knab P. Kosiuczenko S. Larsson D. Latella B. Lisper M. Loreti Y. Lu F. Lüders R. Machado G. Mainetto S. Mann E. Marchetti C. Matos S. Meier A. Moreira A.M. Moreno H. Muccini J.M. Murillo J. Niere J. Oberleitner A.G. Padua H. Pei-Breivold P. Pelliccione A. Pierantonio E. Pimentel

M. Pinto M. Pinzger P. Poizat S. Punnekkat G. Reif G. Salaün A.M. Schettini P.Y. Schobbens M.I.S. Segura P. Selonen L. Semini S. Sentilles M. Solari T. Systa M. ter Beek G. Thompson M. Tichy M. Tivoli E. Tuosto F. Turini A. Vallecillo S. Vegas A. Vilgarakis R. Wagner H.Q. Yu A. Zarras A. Zuendorf

Table of Contents

Invited Contributions Software Product Families: Towards Compositionality . . . . . . . . . . . . . . . . Jan Bosch

1

Contract-Driven Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bertrand Meyer

11

Evolution and Agents EQ-Mine: Predicting Short-Term Defects for Software Evolution . . . . . . . Jacek Ratzinger, Martin Pinzger, and Harald Gall

12

An Approach to Software Evolution Based on Semantic Change . . . . . . . . Romain Robbes, Michele Lanza, and Mircea Lungu

27

A Simulation-Oriented Formalization for a Psychological Theory . . . . . . . Paulo Salem da Silva and Ana C. Vieira de Melo

42

Model Driven Development Integrating Performance and Reliability Analysis in a Non-Functional MDA Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vittorio Cortellessa, Antinisca Di Marco, and Paola Inverardi

57

Information Preserving Bidirectional Model Transformations . . . . . . . . . . Hartmut Ehrig, Karsten Ehrig, Claudia Ermel, Frank Hermann, and Gabriele Taentzer

72

Activity-Driven Synthesis of State Machines . . . . . . . . . . . . . . . . . . . . . . . . . Rolf Hennicker and Alexander Knapp

87

Flexible and Extensible Notations for Modeling Languages . . . . . . . . . . . . Jimin Gao, Mats Heimdahl, and Eric Van Wyk

102

Tool Demonstrations Declared Type Generalization Checker: An Eclipse Plug-In for Systematic Programming with More General Types . . . . . . . . . . . . . . . . . . Markus Bach, Florian Forster, and Friedrich Steimann

117

S2A: A Compiler for Multi-modal UML Sequence Diagrams . . . . . . . . . . . David Harel, Asaf Kleinbort, and Shahar Maoz

121

Distributed Systems Scenario-Driven Dynamic Analysis of Distributed Architectures . . . . . . . . George Edwards, Sam Malek, and Nenad Medvidovic

125

Enforcing Architecture and Deployment Constraints of Distributed Component-Based Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chouki Tibermacine, Didier Hoareau, and Reda Kadri

140

A Family of Distributed Deadlock Avoidance Protocols and Their Reachable State Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C´esar S´ anchez, Henny B. Sipma, and Zohar Manna

155

Specification Precise Specification of Use Case Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . Jon Whittle

170

Joint Structural and Temporal Property Specification Using Timed Story Scenario Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Florian Klein and Holger Giese

185

SDL Profiles – Formal Semantics and Tool Support . . . . . . . . . . . . . . . . . . R. Grammes and R. Gotzhein

200

Preliminary Design of BML: A Behavioral Interface Specification Language for Java Bytecode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lilian Burdy, Marieke Huisman, and Mariela Pavlova

215

Services A Service Composition Construct to Support Iterative Development . . . . Roy Grønmo, Michael C. Jaeger, and Andreas Wombacher

230

Correlation Patterns in Service-Oriented Architectures . . . . . . . . . . . . . . . . Alistair Barros, Gero Decker, Marlon Dumas, and Franz Weber

245

Dynamic Characterization of Web Application Interfaces . . . . . . . . . . . . . . Marc Fisher II, Sebastian Elbaum, and Gregg Rothermel

260

Testing A Prioritization Approach for Software Test Cases Based on Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Siavash Mirarab and Ladan Tahvildari

276

Redundancy Based Test-Suite Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gordon Fraser and Franz Wotawa

291

Testing Scenario-Based Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hillel Kugler, Michael J. Stern, and E. Jane Albert Hubbard

306

Integration Testing in Software Product Line Engineering: A Model-Based Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sacha Reis, Andreas Metzger, and Klaus Pohl

321

Analysis Practical Reasoning About Invocations and Implementations of Pure Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ádám Darvas and K. Rustan M. Leino

336

Finding Environment Guarantees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marsha Chechik, Mihaela Gheorghiu, and Arie Gurfinkel

352

Ensuring Consistency Within Distributed Graph Transformation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ulrike Ranger and Thorsten Hermes

368

Maintaining Consistency in Layered Architectures of Mobile Ad-Hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julia Padberg, Kathrin Hoffmann, Hartmut Ehrig, Tony Modica, Enrico Biermann, and Claudia Ermel

383

Design Towards Normal Design for Safety-Critical Systems . . . . . . . . . . . . . . . . . . Derek Mannering, Jon G. Hall, and Lucia Rapanotti

398

A Clustering-Based Approach for Tracing Object-Oriented Design to Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xin Zhou and Hui Yu

412

Measuring and Characterizing Crosscutting in Aspect-Based Programs: Basic Metrics and Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Roberto E. Lopez-Herrejon and Sven Apel

423

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

439

Software Product Families: Towards Compositionality

Jan Bosch

Nokia, Technology Platforms/Software Platforms, P.O. Box 407, FI-00045 NOKIA GROUP, Finland
[email protected]
http://www.janbosch.com

Abstract. Software product families have become the most successful approach to intra-organizational reuse. Especially in the embedded systems industry, but also elsewhere, companies are building rich and diverse product portfolios based on software platforms that capture the commonality between products while allowing for their differences. Software product families, however, easily become victims of their own success in that, once successful, there is a tendency to increase the scope of the product family by incorporating a broader and more diverse product portfolio. This requires organizations to change their approach to product families from relying on a pre-integrated platform for product derivation to a compositional approach where platform components are composed in a product-specific configuration. Keywords: Software product families, compositionality.

1 Introduction

Over the last decades, embedded systems have emerged as one of the key areas of innovation in software engineering. The increasing complexity, connectedness, feature density and enriched user interaction, when combined, have driven an enormous demand for software. In fact, the size of software in embedded systems seems to follow Moore’s law, i.e. with the increased capabilities of the hardware, the software has followed suit in terms of size and complexity. This has led to a constant struggle to build the software of embedded systems in a cost-effective, rapid and high-quality fashion in the face of a constantly expanding set of requirements. Two of the key approaches evolved to handle this complexity have been software architecture and software product families. Together, these technologies have allowed companies to master, at least in part, the complexity of large-scale software systems.

One can identify three main trends that are driving the embedded systems industry, i.e. convergence, end-to-end functionality and software engineering capability. The convergence of the consumer electronics, telecom and IT industries has been discussed for over a decade. Although many may wonder whether and when it will happen, the fact is that the convergence is taking place constantly. Different from what the name may suggest, though, convergence in fact leads to a portfolio of increasingly diverging devices. For instance, in the mobile telecom industry, mobile phones have diverged into still picture camera models, video camera models, music

player models, mobile TV models, mobile email models, etc. This trend results in a significant pressure on software product families as the amount of variation to be supported by the platform in terms of price points, form factors and feature sets is significantly beyond the requirements just a few years ago. The second trend is that many innovations that have proven their success in the market place require the creation of an end-to-end solution and possibly even the creation or adaptation of a business eco-system. Examples from the mobile domain include, for instance, ring tones, but the ecosystem initiated by Apple around digital music is exemplary in this context. The consequence for most companies is that where earlier, they were able to drive innovations independently to the market, the current mode requires significant partnering and orchestration for innovations to be successful. The third main trend is that a company’s ability to engineer software is rapidly becoming a key competitive differentiator. The two main developments underlying this trend are efficiency and responsiveness. With the constant increase in software demands, the cost of software R&D is becoming unacceptable from a business perspective. Thus, some factor difference in productivity is easily turning into being able or not being able to deliver certain feature sets. Responsiveness is growing in importance because innovation cycles are moving increasingly fast and customers are expecting constant improvements in the available functionality. Web 2.0 [7] presents a strong example of this trend. A further consequence for embedded systems is that, in the foreseeable future, the hardware and software innovation cycles will, at least in part, be decoupled, significantly increasing demands for post-deployment distribution of software. Due to the convergence trend, the number of different embedded products that a manufacturer aims to bring to market is increasing. Consequently, reuse of software (as well as of mechanical and hardware solutions) is a standing ambition for the industry. The typical approach employed in the embedded systems industry is to build a platform that implements the functionality common to all devices. The platform is subsequently used as a basis when creating new product and functionality specific to the product is built on top of the platform. Several embedded system companies have successfully employed product families or platforms and are now reaching the stage where the scope of the product family is expanding considerably. This requires a transition from a traditional, integration-oriented approach to a compositional approach. The contribution of this paper is that it analyses the problems of traditional approaches to software product families that several companies are now confronted with. In addition, it presents compositional platforms as the key solution approach to addressing these problems and discusses the technical and organizational consequences. The remainder of this article is organized as follows. The next section defines the challenges faced by traditional product families when expanding their scope. Subsequently, section 3 presents the notion of compositional product families. The component model underlying composability is discussed in more detail in section 4. Finally, the paper is concluded in section 5.

2 Problem Statement

This paper discusses and presents the challenges of the traditional, integration-oriented approach to software product families [1] when the scope of the family is extended. However, before we can discuss this, we need to first define the integration-oriented platform approach more precisely.

In most cases, the platform approach is organized using a strict separation between the platform organization and the product organizations. The platform organization typically has a periodic release cycle where the complete platform is released in a fully integrated and tested fashion. The product organizations use the platform as a basis for creating and evolving their products by extending the platform with product-specific features. The platform organization is divided into a number of teams, in the best case mirroring the architecture of the platform. Each team develops and evolves the component (or set of related components) that it is responsible for and delivers the result for integration in the platform. Although many organizations have moved to applying a continuous integration process where components are constantly integrated during development, in practice significant verification and validation work is performed in the period before the release of the platform and many critical errors are only found in that stage.

The platform organization delivers the platform as a large, integrated and tested software system with an API that can be used by the product teams to derive their products from. As platforms bring together a large collection of features and qualities, the release frequency of the platform is often relatively low compared to the frequency of product programs. Consequently, the platform organization often is under significant pressure to deliver as many new features and qualities as possible during the release. Hence, there is a tendency to short-cut processes, especially quality assurance processes. Especially during the period leading up to a major platform release, all validation and verification is often transferred to the integration team. As the components lose quality and the integration team is confronted with both integration problems and component-level problems, in the worst case an interesting cycle appears where errors are identified by testing staff that has no understanding of the system architecture and can consequently only identify symptoms, component teams receive error reports that turn out to originate from other parts in the system, and the integration team has to manage highly conflicting messages from the testing and development staff, leading to new error reports, new versions of components that do not solve problems, etc.

In figure 1, the approach is presented graphically. The platform consists of a set of components that are integrated, tested and released for product derivation. A product derivation project receives the pre-integrated platform, may change something to the platform architecture but mostly develops product-specific functionality on top of the platform. Although several software engineering challenges associated with software platforms have been outlined, the approach often proves highly successful in terms of maximizing R&D efficiency and cost-effectively offering a rich product portfolio. Thus, in its initial scope, the integration-oriented platform approach has often proven itself as a success.
However, the success can easily turn into a failure when the organization decides to build on the success of the initial software platform and significantly broadens the scope of the product family. The broadening of the scope

can be the result of the company deciding to bring more existing product categories under the platform umbrella or because it decides to diversify its product portfolio as the cost of creating new products has decreased considerably. At this stage, we have identified in a number of companies that broadening the scope of the software product family without adjusting the mode of operation quite fundamentally leads to a number of key concerns and problems that are logical and unavoidable. However, because of the earlier success that the organization has experienced, the problems are insufficiently identified as fundamental, but rather as execution challenges, and fundamental changes to the mode of operation are not made until the company experiences significant financial consequences.


Fig. 1. Integration-oriented approach

The problems and their underlying causes that one may observe when the scope of a product family is broadened considerably over time include, among others, those described below:

1. Decreasing complete commonality: Before broadening the scope of the product family, the platform formed the common core of product functionality. However, with the increasing scope, the products are increasingly diverse in their requirements and the amount of functionality that is required for all products is decreasing, in either absolute or relative terms. Consequently, the (relative) number of components that is shared by all products is decreasing, reducing the relevance of the common platform.
2. Increasing partial commonality: Functionality that is shared by some or many products, though not by all, is increasing significantly with the increasing scope. Consequently, the (relative) number of components that is shared by some or most products is increasing. The typical approach to this model is the adoption of hierarchical product families. In this case, business groups or teams responsible for certain product categories build a platform on top of the company-wide platform. Although this alleviates part of the problem, it does not provide an effective mechanism to share components between business groups or teams developing products in different product categories.
3. Over-engineered architecture: With the increasing scope of the product family, the set of business and technical qualities that needs to be supported by the common platform is broadening as well. Although no product needs support for all qualities, the architecture of the platform is required to do so and, consequently, needs to be over-engineered to satisfy the needs of all products and product categories.
4. Cross-cutting features: Especially in embedded systems, new features frequently fail to respect the boundaries of the platform. Whereas the typical approach is that differentiating features are implemented in the product (category) specific code, often these features require changes in the common components as well. Depending on the domain in which the organization develops products, the notion of a platform capturing the common functionality between all products may easily turn into an illusion as the scope of the product family increases.
5. Maturity of product categories: Different product categories developed by one organization frequently are in different phases of the lifecycle. The challenge is that, depending on the maturity of a product category, the requirements on the common platform are quite different. For instance, for mature product categories cost and reliability are typically the most important whereas for product categories early in the maturity phase feature richness and time-to-market are the most important drivers. A common platform has to satisfy the requirements of all product categories, which easily leads to tensions between the platform organization and the product categories.
6. Unresponsiveness of platform: Especially for product categories early in the maturation cycle, the slow release cycle of software platforms is particularly frustrating. Often, a new feature is required rapidly in a new product. However, the feature requires changes in some platform components. As the platform has a slow release cycle, the platform is typically unable to respond to the request of the product team. The product team is willing to implement this functionality itself, but the platform team is often not allowing this because of the potential consequences for the quality of the product team.

3 Towards Compositionality

Although software product families have proven their worth, as discussed above, there are several challenges to be faced when the product family approach is applied to an increasingly broad and diverse product portfolio. The most promising direction, as outlined in this paper, is towards a more compositional approach to product creation. One of the reasons for this is that in the integration-oriented approach all additions and changes to the platform components typically are released as part of an integrated platform release. This requires, first, all additions and changes for all components to be synchronized for a specific, typically large and complex, release and, second, easily causes cross-component errors as small glitches in alignment between evolving components cause integration errors.

The compositional approach aims to address these issues through the basic principle of independent deployment [6]. This principle is almost as old as the field of software engineering itself, but is violated in many software engineering efforts. Independent deployment states that a component, during evolution, always has to maintain “replaceability” with older versions. This principle is relatively easy to implement for the provided interfaces of a component, as it basically requires the

component to just continue to offer backward compatibility. The principle however also applies to the required interfaces of a component. This is more complicated as this requires components to intelligently degrade their functionality when the required interfaces are bound to components that do not provide functionality required for new features. Thus, although the principle is easy to understand in abstract terms, the implementation often is more complicated, leading to situations where an R&D organization may easily abandon the principle.

If the principle of independent deployment is, however, adhered to, then a very powerful compositional model in the context of software product families is created: rather than requiring the evolution of each component or subsystem to be perfectly aligned, in this approach each component or subsystem can evolve separately. Because each component guarantees backward compatibility and supports intelligent degrading of provided functionality based on the composition in which the component is used, it facilitates a “continuous releasing” model, allowing new functionality to be available immediately to product derivation projects. In addition, quality issues can, to a much larger extent, be dealt with locally in individual components, rather than as part of the integration.

Although the approach described in this section has significant advantages for traditional product families, the broadening product scope of many families creates an increasing need for creating creative configurations [3]. Some typical reasons for creative configurations include:

• Structural divergence: As discussed earlier, the convergence trend is actually causing a divergence in product requirements. Components and subsystems need to be composed in alternative configurations because of product requirements that are deviating significantly from the standard product.
• Functional divergence: A second cause for requiring a creative configuration is where platform components need to be replaced with product-specific components to allow for diverging product functionality.
• Temporal divergence: In some cases, the divergence between product requirements may be temporal, i.e. certain products require functionality significantly earlier than the main, high volume product segment for which the platform is targeted. Although every product family has leading, typically high-end, products feeding the rest of the product portfolio with new functionality, in this case the temporal divergence is much more significant than in those cases. This may, among others, be due to the need to create niche products or because of the need to respond more rapidly to changing market forces to an extent unable to be accounted for by typically slow platform development.
• Quality divergence: Finally, a fourth source of divergence is where specific quality attributes, e.g. security or reliability, require the insertion of behaviour between platform components in order to achieve certain quality requirements. Although the structure of the original platform architecture may be largely maintained, the connections between the components are replaced with behavioural modules that insert and coordinate functionality.
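
To make the independent deployment principle more concrete, the following is a minimal Java sketch, not taken from the paper and with purely hypothetical interface and class names, of a platform component whose provided interface evolves only by extension, so that products derived against an older interface version keep working against newer component releases:

// Illustrative sketch: provided interfaces evolve by extension only, so every new
// release of the component remains replaceable with older versions.
public class IndependentDeploymentSketch {

    // Version 1 of the provided interface: the contract existing products rely on.
    interface MediaPlayback {
        void play(String trackId);
    }

    // Version 2 extends version 1; the default method keeps products written
    // against version 1 working when they bind to a newer component release.
    interface MediaPlaybackV2 extends MediaPlayback {
        default void playStreaming(String url) {
            // Fallback: degrade to the version-1 behaviour if not overridden.
            play(url);
        }
    }

    // A continuously released platform component; products derived against either
    // interface version can bind to it without re-alignment of release schedules.
    static class PlaybackComponent implements MediaPlaybackV2 {
        @Override public void play(String trackId) {
            System.out.println("playing local track " + trackId);
        }
        @Override public void playStreaming(String url) {
            System.out.println("streaming from " + url);
        }
    }

    public static void main(String[] args) {
        MediaPlayback oldProduct = new PlaybackComponent();   // product derived against v1
        MediaPlaybackV2 newProduct = new PlaybackComponent(); // product derived against v2
        oldProduct.play("track-42");
        newProduct.playStreaming("http://example.invalid/radio");
    }
}

The same replaceability obligation applies, as argued above, to required interfaces, which is where the intelligent degradation discussed in section 4 comes in.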

Fig. 2. Compositional approach to software product families (figure annotations: architectural guidelines guarantee composability; components/subsystems guarantee quality)

In figure 2, the compositional approach is presented graphically. The main items to highlight include the creative product configurations shown on the right side and the fact that there are two evolutionary flows, i.e. from the platform components towards the products and vice versa. In the paper so far, we have provided a general overview of the compositional approach to software platforms. However, this approach has bearing on many topics related to software product families. Below, we discuss a few of these.

Software variability management: In the research area of software product families, software variability management (SVM) is an important field of study. One may easily argue that the topics addressed in this paper can be addressed by employing appropriate variability mechanisms. In our experience, SVM is complementary to employing a compositional approach as the components still need to offer variation points and associated variants. In [5] we argue that SVM focuses primarily on varying behaviour in the context of a stable architecture, whereas compositionality is primarily concerned with viewing the elements as stable and the configurations in which the elements are combined to be the part that varies. In practice, however, both mechanisms are necessary when the scope of a product family extends beyond certain limits.

Software architecture: In most definitions of software architecture, the predominant focus is on the structure of the architecture, i.e. the boxes and lines. In some definitions, there is mention of the architectural principles guiding development and evolution [5], but few expand on this notion. In the context of compositional product families, the structural aspect of software architecture is becoming increasingly uninteresting from a design perspective, as the structure of the architecture will be different for each derived product and may even change during operation. Consequently, with the overall increase of dynamism in software systems, software architecture is more and more about the architectural principles. In [2], we argue that architectural principles can be categorized into architecture rules, architecture constraints and the associated rationale.

Software configuration management (SCM): At each stage of evolving an existing component, there is a decision to version or to branch. Versioning requires that the resulting component either contains a superset of the original and additional functionality or introduces a variation point that allows the functionality provided by the component to be configured at some point during the product derivation lifecycle. Branching creates an additional parallel version of the component that requires a

selection during the product derivation. Although branching has its place in engineering complex software product families, it has disadvantages with respect to managing continued updates and bug fixes. It easily happens that, once branched, a component branch starts to diverge to the point that the product originally requiring the branching lacks too many features in the component and abandons it.
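
As an illustration of the versioning-with-a-variation-point option described above, here is a minimal Java sketch, with hypothetical names not taken from the paper, in which a single versioned component is configured at product derivation time instead of being branched per product:

// Illustrative sketch: one component version serves several products through a
// variation point bound via a configuration interface, avoiding a branch.
import java.util.Map;

public class VariationPointSketch {

    // The variation point: how the component renders its user interface.
    enum UiProfile { LOW_END, TOUCH }

    static class SettingsComponent {
        private final UiProfile profile;

        // Configuration read at derivation/installation time; the default keeps
        // older product configurations valid (backward compatibility).
        SettingsComponent(Map<String, String> derivationConfig) {
            this.profile = UiProfile.valueOf(
                derivationConfig.getOrDefault("ui.profile", "LOW_END"));
        }

        void showSettings() {
            if (profile == UiProfile.TOUCH) {
                System.out.println("rendering touch-optimized settings screen");
            } else {
                System.out.println("rendering key-navigated settings screen");
            }
        }
    }

    public static void main(String[] args) {
        // Two products derive from the same component version; no branch is created.
        new SettingsComponent(Map.of()).showSettings();
        new SettingsComponent(Map.of("ui.profile", "TOUCH")).showSettings();
    }
}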

4 Component Model for Compositional Platforms

The Holy Grail in the software reuse research community has, for the last four decades, been that components not developed for integration with each other can be composed and result in the best possible composed functionality. In practice, this has proven to be surprisingly difficult, among others because components often have expectations on their context of use. In the context of the integration-oriented approach, we see that components typically have more expectations on components both providing and requiring functionality and that these expectations, paradoxically, are less precisely and explicitly defined. In contrast, composition-oriented components use only explicitly defined dependencies and contain intelligence to handle partially met binding of interfaces.

For the software assets making up a product family, at least the components and subsystems need to satisfy a number of requirements in order to facilitate composability. Different aspects of these requirements as well as additional requirements have been identified by other researchers as well.

• Interface completeness: The composition of components and subsystems should only require the information specified in the provided, required and configuration interfaces. Depending on the type of product family, compile-time, link-time, installation-time and/or run-time composition of provided interfaces and required interfaces should be facilitated and the composition should lead to systems providing the best possible functionality given the composition.
• Intelligent degradation: Components should be constructed such that partial binding of the required interfaces results in automatic, intelligent degradation of the functionality offered through the provided interfaces of the component. In reality, this cannot be achieved for all required interfaces, so for most components the required interfaces can be classified as core (must be bound) and non-core (can be bound). This is mirrored in the provided interfaces that degrade their functionality accordingly. In practice, most non-core interfaces represent steps in the evolution of the component or subsystem.
• Variability management: Non-core interfaces and configurable internal behaviour are part of the overall variability offered by a component or subsystem and need to be accessible to the users of the component through a specific configuration or variability interface.
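
The core/non-core distinction and the resulting degradation can be sketched in Java as follows; this is a minimal, hypothetical example whose interface names are illustrative rather than taken from any real platform:

// Illustrative sketch: a component with one core and one non-core required
// interface that intelligently degrades its provided functionality.
import java.util.Optional;

public class IntelligentDegradationSketch {

    // Core required interface: must be bound for the component to work at all.
    interface Storage { void save(String key, String value); }

    // Non-core required interface: typically added in a later evolution step.
    interface CloudSync { void sync(String key, String value); }

    static class NotesComponent {
        private final Storage storage;               // core: always bound
        private final Optional<CloudSync> cloudSync; // non-core: may be unbound

        NotesComponent(Storage storage, Optional<CloudSync> cloudSync) {
            this.storage = storage;
            this.cloudSync = cloudSync;
        }

        // Provided interface: full behaviour when fully bound, degraded otherwise.
        void addNote(String key, String text) {
            storage.save(key, text);
            if (cloudSync.isPresent()) {
                cloudSync.get().sync(key, text);
            } else {
                System.out.println("cloud sync unbound: note kept locally only");
            }
        }
    }

    public static void main(String[] args) {
        Storage local = (key, value) -> System.out.println("stored " + key + " locally");
        CloudSync cloud = (key, value) -> System.out.println("synced " + key + " to the cloud");
        // Low-end product configuration: only the core interface is bound.
        new NotesComponent(local, Optional.empty()).addNote("n1", "hello");
        // High-end product configuration: both interfaces are bound.
        new NotesComponent(local, Optional.of(cloud)).addNote("n2", "world");
    }
}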

One of the general trends in software engineering is later binding or, in general, delaying decisions to the latest point in the software lifecycle that is still acceptable from an economic perspective. Also for embedded systems, an increasing amount of configuration and functionality extension can take place after the initial deployment.



However, for post-deployment composability to be feasible, the software assets that are part of the product family again need to satisfy some additional requirements; a code sketch follows the list below.


• Two descriptions: A component requires an operational description of its behaviour (code) as well as an inspectable model of its intended behaviour.
• Monitoring required interfaces: For each required interface, a component has an inspectable model of the behaviour required from a component bound to the interface. This allows a component to monitor its providing components.
• Self-monitoring: In addition to monitoring its providing components, a component observes its own behaviour and identifies mismatches between specified and actual behaviour.
• Reactive adjustment: A component can initiate corrective actions for a subset of mismatches between required and actual behaviour of itself or of its providing components, and is able to report other mismatches to the encompassing component/subsystem.
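The following Java sketch illustrates, under strongly simplifying assumptions, how the four requirements can surface in code: the inspectable behaviour model is reduced to a single latency bound, and all names are invented for illustration.

    // Minimal sketch: the behaviour "model" is a latency bound, purely for illustration.
    interface BehaviourModel { long maxLatencyMillis(); }          // inspectable model of intended behaviour

    interface Provider {
        BehaviourModel model();                                    // model of the providing component
        String handle(String request);
    }

    final class MonitoringComponent {
        private final Provider provider;
        private final BehaviourModel ownModel;
        private final java.util.function.Consumer<String> parentReporter; // encompassing component/subsystem

        MonitoringComponent(Provider provider, BehaviourModel ownModel,
                            java.util.function.Consumer<String> parentReporter) {
            this.provider = provider;
            this.ownModel = ownModel;
            this.parentReporter = parentReporter;
        }

        String serve(String request) {
            long start = System.currentTimeMillis();
            String reply = provider.handle(request);               // monitoring the required interface
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > provider.model().maxLatencyMillis()) {
                // Reactive adjustment: report mismatches that cannot be corrected locally.
                parentReporter.accept("provider exceeded its specified latency: " + elapsed + " ms");
            }
            if (elapsed > ownModel.maxLatencyMillis()) {            // self-monitoring against the own model
                parentReporter.accept("component exceeded its own specified latency");
            }
            return reply;
        }
    }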

In conclusion, although some of the techniques described in this section require more advanced solutions from the development environment, by and large the compositional approach can be implemented using normal software development tools and environments. For most organizations, the main transformation is concerned with organizational and cultural change.

5 Conclusions

This paper discusses and presents the challenges of the traditional, integration-oriented approach to software product families when the scope of the family is extended. These problems include the decreasing complete commonality, increasing partial commonality, the need to over-engineer the platform architecture, cross-cutting features, different maturity of product categories and, consequently, increasing unresponsiveness of the platform. As a solution to these concerns, we present the compositional platform approach. This approach becomes necessary when the traditional integration-oriented approach needs to be stretched beyond its original boundaries. We have identified at least four types of divergence, i.e. structural divergence, functional divergence, temporal divergence and quality divergence. The compositional platform approach is based on the principle of independent deployment [6]. This principle defines rules that components need to satisfy in order to provide backward compatibility and flexibility in addressing partial binding of required interfaces. In particular, three aspects are necessary but not sufficient requirements: interface completeness, intelligent degradation and variability management. Although many product families implement or support a small slice of these principles and mechanisms, few examples exist that support a fully compositional platform approach. In that sense this paper should be considered visionary rather than a description of current practice. However, the problems and challenges of the integration-oriented approach are real and, as a community, we need to develop solutions that can be adopted by the software engineering industry.



References
1. J. Bosch, Design and Use of Software Architectures: Adopting and Evolving a Product Line Approach, Pearson Education (Addison-Wesley & ACM Press), ISBN 0-201-67494-7, May 2000.
2. J. Bosch, Software Architecture: The Next Step, Proceedings of the First European Workshop on Software Architecture (EWSA 2004), Springer LNCS, May 2004.
3. S. Deelstra, M. Sinnema and J. Bosch, Product Derivation in Software Product Families: A Case Study, Journal of Systems and Software, Volume 74, Issue 2, pp. 173-194, 15 January 2005.
4. www.softwarearchitectureportal.org
5. R. van Ommering, J. Bosch, Widening the Scope of Software Product Lines - From Variation to Composition, Proceedings of the Second Software Product Line Conference (SPLC2), pp. 328-347, August 2002.
6. R. van Ommering, Building Product Populations with Software Components, Proceedings of the 24th International Conference on Software Engineering, pp. 255-265, 2002.
7. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Contract-Driven Development

Bertrand Meyer (Joint work with Andreas Leitner)
ETH Zurich, Switzerland
[email protected]

Abstract. In spite of cultural differences between the corresponding scientific communities, recognition is growing that test-based and specification-based approaches to software development actually complement each other. The revival of interest in testing tools and techniques follows in particular from the popularity of "Test-Driven Development"; rigorous specification and proofs have, for their part, also made considerable progress. There remains, however, a fundamental superiority of specifications over tests: you can derive tests from a specification, but not the other way around. Contract-Driven Development is a new approach to systematic software construction combining ideas from Design by Contract, from Test-Driven Development, from work on formal methods, and from advances in automatic testing as illustrated for example in our AutoTest tool. Like TDD it gives tests a central role in the development process, but these tests are deduced from possibly partial specifications (contracts) and directly supported by the development environment. This talk will explain the concepts and demonstrate their application.


EQ-Mine: Predicting Short-Term Defects for Software Evolution

Jacek Ratzinger (1), Martin Pinzger (2), and Harald Gall (2)

(1) Distributed Systems Group, Vienna University of Technology, Austria ([email protected])
(2) s.e.a.l. – software evolution and architecture lab, University of Zurich, Switzerland ({pinzger,gall}@ifi.unizh.ch)

Abstract. We use 63 features extracted from sources such as versioning and issue tracking systems to predict defects in short time frames of two months. Our multivariate approach covers aspects of software projects such as size, team structure, process orientation, complexity of the existing solution, difficulty of the problem, coupling aspects, time constraints, and testing data. We investigate the predictability of several severities of defects in software projects. Are defects with high severity difficult to predict? Are prediction models for defects that are discovered by internal staff similar to models for defects reported from the field? We present both an exact numerical prediction of future defect numbers based on regression models as well as a classification of software components as defect-prone based on the C4.5 decision tree. We create models to accurately predict short-term defects in a study of 5 applications composed of more than 8,000 classes and 700,000 lines of code. The model quality is assessed based on 10-fold cross validation.

Keywords: Software Evolution, Defect Density, Quality Prediction, Machine Learning, Regression, Classification.

1 Introduction

We want to improve the evolvability of software by providing prediction models to assess quality as early as possible in the product life cycle. When software systems evolve we need to measure the outcome of the systems before shipping them to customers. Software management systems such as the concurrent versioning system (CVS) and issue tracking systems (Jira) capture data about the evolution of the software during development. Our approach, EQ-Mine, uses this data to compute a number of features for source file revisions in the pre- and post-release phases. Based on these evolution measures we then set up a prediction model. To evaluate the defect density prediction capabilities of our evolution measures we apply three data mining algorithms and test 5 specified hypotheses.



Results clearly underline that defect prediction models have to take into account different aspects and measures of the software development and maintenance [1]. As an extension of our previous work on predicting defect density of source files [2], we use detailed evolution data from an industrial software project and include team structure and process measures.

The remaining paper is structured as follows. It starts with the formulation of our research hypotheses (Section 2). Related work is discussed in Section 3. In Section 4 we describe the evolution measures used to build our defect prediction model. Our approach is evaluated on a case study in Section 5. We conclude the paper and outline future work in Section 6.

2 Hypotheses

To guide the metrics selection for defect prediction and our evaluation with a case study, we set up several hypotheses. In contrast to previous research approaches (e.g. [3]), EQ-Mine aims at a fine-grained level. Our hypotheses are used to focus on different aspects of our fine-grained analysis, such as the severity of defects, the timing of predictions around releases, and the type of defect discovered (internal vs. external):

– H1: Defect density can be predicted based on a short time-frame. Previous research focused on the prediction of longer time-frames such as releases [4, 5]. In our research we focus on months as time scale and use two months of development time to predict defect densities for the following two months.
– H2: Critical defects with high severity have a low regularity. Prediction models are built on the regularity of the underlying data and can better predict events that correspond to this regularity. We expect that defects that are critical are more difficult to detect as they "hide better" during testing and product delivery.
– H3: Quality predictions before a release are more accurate than after a release. Project quality can be estimated in different stages of the development process. Some stages are more difficult to assess than others. Previous studies already indicated that the accuracy of data mining in software engineering varies over time (e.g. [5]). We expect that defects that are detected before a release date are easier to predict than defects that are reported afterwards.
– H4: Defects discovered by internal staff have more regularity than defects reported by the customer. For prediction model creation it is an important input to know where the defect comes from. Was it recognized by the internal staff (e.g. during testing) or does the defect report come from customer sites? We expect that internally and externally detected defects have different characteristics. As a result, one group can be predicted more easily than the other.
– H5: Different aspects of software evolution have to be regarded for an accurate defect prediction. We use a large number of evolution indicators for defect prediction.



These indicators can be grouped into several categories, such as size and complexity measures, indicators for the complexity of the existing solution, and team-related issues. For defect prediction we expect that data mining features from many different categories are important.

3 Related Work

Many organizations want to predict software quality before their systems are used. Fenton and Neil provide a critical review of the literature that describes several software metrics and a wide range of prediction models [1]. They found that most of the statistical models are based on size and complexity metrics with the aim of predicting the number of defects in a system. Others are based on testing results, the testing process, the "quality" of the development process, or take a multivariate approach.

There are various techniques to identify critical code pieces. The most common one is to define typical bug patterns that are derived from experience and published common pitfalls in a certain programming language. Wagner et al. [6] analyzed several industrial and development projects with the help of bug detection tools as well as with other types of defect-detection techniques. Khoshgoftaar et al. [7] use software metrics as input to classification trees to predict fault-prone modules. One release provides the training dataset and the subsequent release is used for evaluation purposes. They claim that the resulting model achieved useful accuracy in spite of the very small proportion of fault-prone modules in the system. Classification trees generate partition trees based on a training data set describing known experiences of interest (e.g. characteristics of the software). The tree structure is intuitive and can be easily interpreted. Briand et al. [8] try to improve the predictive capabilities by combining the expressiveness of classification trees with the rigor of a statistical basis. Their approach, called OSR, generates a set of patterns relevant to the predicted object, estimated based on the entropy H.

There are different reasons for each fault: Some faults exist because of errors in the specification of requirements. Others are directly attributable to errors committed in the design process. Finally, there are errors that are introduced directly into the source. Nikora and Munson developed a standard for the enumeration of faults based on the structural characteristics of the MDS software system [9]. Changes to the system are visible at the module level (i.e. procedures and functions) and therefore this level of granularity is measured. This fault measurement process was then applied to a software system's structural evolution during its development. Every change to the software system was measured and every fault was identified and tracked to a specific line of code. The rate of change in program modules should serve as a good index of the rate of fault introduction.

In another study, the application of machine learning (inductive) techniques was tested for the software maintenance process. Shirabad et al. [10] present an example of an artificial intelligence method that can be used in future maintenance activities. An induction algorithm is applied to a set of pre-classified training examples of the concept we want to learn.



The large size and complexity of systems, high staff turnover, poor documentation, and the long periods of time these systems must be maintained lead to a lack of knowledge about how to proceed with the maintenance of software systems.

Only a small number of empirical studies using industrial software systems have been performed and published. Ostrand and Weyuker, for example, evaluated a large inventory tracking system at AT&T [4]. They analyzed how faults are distributed over different releases. They discovered that faults are always heavily concentrated in a relatively small number of releases during the entire life cycle. Additionally, the number of faults gets higher as the product matures, and high-fault modules tend to remain high-fault in later releases. It would therefore be worthwhile to concentrate fault detection on a relatively small number of highly fault-prone releases, if they can be identified early.

4 Data Measures

To mine software development projects we use the data obtained from the versioning system (CVS) and the issue tracking system (Jira). CVS enables the handling of different versions of files in cooperating teams. This tool logs every change event, which provides the necessary information about the history of a software system. The log information for our mining approach, which is purely textual, human-readable information, is retrieved via standard command line tools, parsed, and stored in the release history database [11]. Jira manages data about project issues such as bug reports or feature requests. This system gives a historical overview of the requirements and their implementations. We extract the data based on its backup facility, where the entire issue data can be exported into XML files. These files are processed to import the information into our database. In a post-processing step we link issues from Jira to log information from CVS using the comments of developers in commit messages by searching for issue numbers. In addition, we distinguish between issues created by developers and issues created by customers by linking issue reporters to CVS authors. Issues are counted as reported by internal staff when the issue reporter can be linked to a CVS author; otherwise the issue is defined to be external (e.g. reported via the hotline).
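The linking step can be illustrated with a small Java sketch; the issue-key pattern and the internal/external classification rule below are assumptions for illustration, not the exact implementation used for the release history database.

    // Illustrative sketch of linking commit messages to issue keys.
    import java.util.*;
    import java.util.regex.*;

    final class IssueLinker {
        // e.g. "PROJ-123" or a bare "#123", depending on the project's commit conventions
        private static final Pattern ISSUE_KEY = Pattern.compile("([A-Z][A-Z0-9]+-\\d+)|#(\\d+)");

        static Set<String> issuesReferencedBy(String commitMessage) {
            Set<String> keys = new HashSet<>();
            Matcher m = ISSUE_KEY.matcher(commitMessage);
            while (m.find()) {
                keys.add(m.group(1) != null ? m.group(1) : m.group(2));
            }
            return keys;
        }

        /** An issue counts as internal when its reporter maps to a known CVS author. */
        static boolean isInternal(String issueReporter, Map<String, String> reporterToCvsAuthor) {
            return reporterToCvsAuthor.containsKey(issueReporter);
        }
    }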

4.1 Features

From the linked data in the release history database we compute 63 evolution measures that are considered as features for data mining. These features are gathered on a per-file basis, where the data from all revisions of a file within a predefined time period is summarized. To build a balanced prediction model we create features to represent several important aspects of software development such as the complexity of the designed solution, the process used for development, the interrelation of classes, etc. As previous studies [2, 3] discovered that relative features provide better prediction performance than absolute ones, we decided that all our 63 features have to be relative. For EQ-Mine we set up the following categories of features for each file containing changes within the inspection period:



Size. This category groups "classical" measures such as lines of code from an evolution perspective: linesAdded, linesModified, or linesDeleted relative to the total LOC of a file. For example, if a file had three revisions within the learning period adding 3, 5, and 4 lines, and this file had 184 lines before the learning period, we feed linesAdded = (3+5+4)/184 into the data mining and relate it to the number of defects in the target period. Another feature of this category is linesType, which defines whether there are more linesAdded or linesModified. Additionally, we regard changes as largeChanges when they exceed double the LOC of the average change size and as smallChanges when they fall below half of the average LOC. We expect that this number is an important feature in the data mining, as other studies have found that small modules are more defect-prone than large ones [12, 13].

Team. The number of authors of files influences the way software is developed. We expect that the more authors work on the changes, the higher the possibility of rework and mistakes. We define a feature for the authorCount relative to the changeCount. Further, the interrelation of people's work is interesting: we investigate work rotation between the authors involved in the changes of each file as the feature authorSwitches. The number of people assigned to an issue and the authors contributing to the implementation of this issue is another feature we use for our prediction models.

Process orientation. In this category we assemble features that define how disciplined people follow software development processes. For source code changes, developers have to include the issue number in their commit message to the versioning system. We define a feature regarding issueCount relative to changeCount. The developer is requested to also provide some rationale in the commit message. Thus, we use withNoMessage, measuring changes without any commit comment, as a feature for prediction. In each project the distribution between different priorities of issues should be balanced. Usually, the number of issues with highest priority is very low. A high value may indicate problems in the project that have effects on quality and the amount of rework. Accordingly, we investigate highPriorityIssues and middlePriorityIssues relative to the total number of issues. Also the time to close certain classes of issues provides interesting input for prediction, and we use avgDaysHighPriorityIssues and avgDaysMiddlePriorityIssues in relation to the average number of days that are necessary to close an issue. To get an estimation of the work habits of the developers we inspect the number of addingChanges, modifyingChanges, and deletingChanges per file. This information provides input to the defect prediction of files.

Complexity of existing solution. According to the laws of software evolution [14], software continuously becomes more complex. Changes are more difficult to add as the software is more difficult to understand and the contracts between existing parts have to be retained. As a result we investigate the changeCount in relation to the number of changes during the entire history of each file.



The changeActivityRate is defined as the number of changes during the entire lifetime of the file relative to the months of the lifetime. The linesActivityRate describes the number of lines of code relative to the age of the file in months. We approximate the quality of the existing solution by the bugfixCountBefore, i.e. the number of bug fixes before our prediction period relative to the general number of changes before the prediction period. We expect that the higher the fix rate is before the inspection period, the more difficult it is to get a better quality later on. The bugfixCount is used as well as bugfixLinesAdded, bugfixLinesModified, and bugfixLinesDeleted in relation to the base measures such as the number of lines of code added, modified, and deleted for this file. For bug fixes not much new code should be necessary, as most code is added for new requirements. Therefore, linesAddPerBugfix, linesModifiedPerBugfix, and linesDeletedPerBugfix are interesting indicators, which measure the average lines of code for bug fixes.

Difficulty of problem. New classes are added to object-oriented systems when new features and new requirements have to be satisfied. We use the information whether a file was newly introduced during the prediction period as a feature for data mining. To measure how often a file was involved during development with the introduction of other new files we use cochangeNewFiles as a second indicator. Co-changed files are identified as described in [15]. The amount of information necessary to describe a requirement is also an important source of information. The feature issueAttachments identifies the number of attachments per issue.

Relational Aspects. In object-oriented systems the relationship between classes is an important metric. We use the co-change coupling between files to estimate their relationship, and use the number of co-changed files relative to the change count as the feature cochangedFiles. Additionally, we quantify co-change couplings with features based on commit transactions similar to the size measures for single files: TLinesAdded, TLinesModified, and TLinesDeleted relative to lines of code added, modified, and deleted. The TLinesType describes whether the transactions contained more lines added or lines modified. TChangeType is a coarser grained feature that describes whether this file was part of transactions with more adding revisions or more modifying revisions. For file relations we also use bug fix related features: TLinesAddedPerBugfix and TLinesChangedPerBugfix are two representatives. Additionally, we use TBugfixLinesAdded, TBugfixLinesModified, and TBugfixLinesDeleted relative to the linesAdded, linesModified, and linesDeleted.

Time constraints. As software processes stress the necessity of certain activities and artifacts, we believe that time constraints are important for software predictions. The avgDaysBetweenChanges feature is defined as the average number of days between revisions. The number of days per line of code added or changed is captured as avgDaysPerLine.



Peaks and outliers have been shown to indicate interesting events in software projects [15]. For the relativePeakMonth feature we measure the location of the peak month, which contains the most revisions, within the prediction period. The peakChangeCount feature describes the number of changes happening during the peak month normalized by the overall number of changes. The number of changes relative to the months in the prediction period is measured by the feature changeActivityRate. For more fine-grained data, the lines of code added and changed relative to the number of months are regarded for the feature linesActivityRate.

Testing. We use testing metrics as an input to prediction models, because they allow estimating the number of remaining bugs. The number of bug fixes initiated by the developers themselves provides insight into the quality attentiveness of the team and is covered by the feature bugfixesDiscoveredByDeveloper.
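To make the notion of relative features concrete, the following Java sketch derives three of the simpler features from per-file revision data; the Revision record and its fields are hypothetical stand-ins for the data stored in the release history database.

    // Sketch of relative feature computation; field and class names are illustrative.
    import java.util.List;

    record Revision(int linesAdded, int linesModified, int linesDeleted,
                    String commitMessage, String issueKey) {}

    final class RelativeFeatures {
        static double linesAddedRelative(List<Revision> revisionsInPeriod, int locBeforePeriod) {
            int added = revisionsInPeriod.stream().mapToInt(Revision::linesAdded).sum();
            return locBeforePeriod == 0 ? 0.0 : (double) added / locBeforePeriod;
        }

        static double issueCountRelative(List<Revision> revisionsInPeriod) {
            long withIssue = revisionsInPeriod.stream().filter(r -> r.issueKey() != null).count();
            return revisionsInPeriod.isEmpty() ? 0.0 : (double) withIssue / revisionsInPeriod.size();
        }

        static double withNoMessage(List<Revision> revisionsInPeriod) {
            long silent = revisionsInPeriod.stream()
                    .filter(r -> r.commitMessage() == null || r.commitMessage().isBlank())
                    .count();
            return revisionsInPeriod.isEmpty() ? 0.0 : (double) silent / revisionsInPeriod.size();
        }
    }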

4.2 Data Mining

For model generation and evaluation we use the data mining tool Weka [16]. It provides algorithms for different data mining tasks such as classification, clustering, and association analysis. For our prediction and classification models we selected linear regression, regression trees (M5), and the classifier C4.5. The regression algorithms are used to predict the number of defects for a class from its evolution attributes. The following metrics are used to assess the quality of our numeric prediction models (a direct computation is sketched after the list):

– Correlation Coefficient (C. Coef.) ranges from -1 to 1 and measures the statistical correlation between the predicted values and the actual ones in the test set. A value of 0 indicates no correlation, whereas 1 describes a perfect correlation. A negative value indicates inverse correlation, which should not occur for prediction models.
– Mean Absolute Error (Abs. Error) is the average of the magnitude of individual absolute errors. This assessment metric does not have a fixed range like the correlation coefficient, but is geared to the values to be predicted. In our case the number of defects per file is predicted, which ranges from 1 to 6 and from 1 to 16, respectively (see Table 1 and Table 2). As a result, the closer the mean absolute error is to 0 the better. A value of 1 denotes that on average the predicted value differs from the actual number of defects by 1 (e.g. 3.5 instead of 4).
– Mean Squared Error (Sqr. Error) is the average of the squared magnitude of individual errors and tends to exaggerate the effect of outliers, i.e. instances with larger prediction error, more than the mean absolute error. The range of the mean squared error is geared to the ranges of predicted values, similar to the mean absolute error, but this time the error metric is squared, which overemphasizes predictions that are far away from the actual number of defects. The quality of the prediction model is good when the mean squared error is close to the mean absolute error.
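The three assessment metrics can be computed directly from the predicted and actual defect counts, as the following Java sketch shows; it mirrors the definitions above rather than any specific Weka internals.

    // Direct computation of the three assessment metrics for predicted vs. actual defect counts.
    final class PredictionQuality {
        static double meanAbsoluteError(double[] actual, double[] predicted) {
            double sum = 0;
            for (int i = 0; i < actual.length; i++) sum += Math.abs(actual[i] - predicted[i]);
            return sum / actual.length;
        }
        static double meanSquaredError(double[] actual, double[] predicted) {
            double sum = 0;
            for (int i = 0; i < actual.length; i++) {
                double e = actual[i] - predicted[i];
                sum += e * e;
            }
            return sum / actual.length;
        }
        static double correlationCoefficient(double[] a, double[] p) {
            double meanA = mean(a), meanP = mean(p), cov = 0, varA = 0, varP = 0;
            for (int i = 0; i < a.length; i++) {
                cov  += (a[i] - meanA) * (p[i] - meanP);
                varA += (a[i] - meanA) * (a[i] - meanA);
                varP += (p[i] - meanP) * (p[i] - meanP);
            }
            return cov / Math.sqrt(varA * varP);
        }
        private static double mean(double[] xs) {
            double s = 0;
            for (double x : xs) s += x;
            return s / xs.length;
        }
    }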



The quality of our prediction models is assessed through 10-fold cross validation. For this method the set of instances is split randomly into 10 sub-sets (folds) and the model is built and validated 10 times. For each turn the model is trained on nine folds and the remaining one is used for testing. The resulting 10 quality measures are averaged to yield an overall quality estimation, which makes 10-fold cross validation a strong validation technique.
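A minimal Weka-based sketch of this evaluation setup is shown below; the ARFF file name is hypothetical, linear regression stands in for any of the three learners, and note that Weka's Evaluation class reports the root of the squared error rather than the mean squared error used above.

    // Hedged sketch of the 10-fold cross validation with Weka.
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public final class CrossValidate {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("eqmine-features.arff");   // 63 features + defect count
            data.setClassIndex(data.numAttributes() - 1);               // last attribute = defects

            LinearRegression model = new LinearRegression();            // M5 or C4.5 would be used analogously
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));    // 10-fold cross validation

            System.out.printf("C. Coef.   %.4f%n", eval.correlationCoefficient());
            System.out.printf("Abs. Error %.4f%n", eval.meanAbsoluteError());
            // Weka reports the root of the squared error rather than the squared error itself.
            System.out.printf("RMS Error  %.4f%n", eval.rootMeanSquaredError());
        }
    }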

5 Case Study

For our case study with EQ-Mine we analyzed a commercial software system from the health care environment. The software system is composed of 5 applications such as a clinical workstation or a patient administration system. This object-oriented system is built in Java and consists of 8,600 classes with 735,000 lines of code. For the clinical workstation a plug-in framework similar to the one of Eclipse is used, and currently 51 plug-ins are implemented. The development is supported by CVS as the versioning system for source files and Jira as the issue tracking system. We analyzed the last two releases of this software system: one in the first half of 2006 and the other one in the middle of 2005.

Table 1. Pre-release: Number of files distinguishing between the ones with defects of all severities and files where defects with high severity were found

  Defects per file   Files (all severities)   Files (high severity)
  1                  46                       10
  2                  11                        2
  3                   5                        1
  4                   7                        0
  5                   2                        0
  6                   1                        0

5.1 Experimental Setup

For our experiments we investigated 8 months of software evolution in our case study. We use two months of development time to predict the defects of the following two months, which builds up a four-month time frame. We compare the predictions before the release date with the predictions after it, which results in a period of 8 months. Before the release we create prediction models for defects in general and for defects with high severity; these models can be compared to the ones after the release. After the release date we additionally distinguish defects discovered by internal staff vs. defects reported from the field (customer). With this experimental setup we test our hypotheses from Section 2.
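One plausible reading of this 8-month setup is sketched below in Java: two 2-month windows before the release for the pre-release models and two 2-month windows after it for the post-release models. The exact window boundaries used in the study are not spelled out, so the code is illustrative only.

    // Sketch of the 2 + 2 month windowing around a release date; boundaries are assumptions.
    import java.time.LocalDate;

    final class Windowing {
        record Window(LocalDate from, LocalDate to) {
            boolean contains(LocalDate d) { return !d.isBefore(from) && d.isBefore(to); }
        }

        /** Pre-release setting: features from months -4..-2, defects counted in months -2..0. */
        static Window preReleaseFeatureWindow(LocalDate release) {
            return new Window(release.minusMonths(4), release.minusMonths(2));
        }
        static Window preReleaseTargetWindow(LocalDate release) {
            return new Window(release.minusMonths(2), release);
        }

        /** Post-release setting shifts both windows to the months after the release. */
        static Window postReleaseFeatureWindow(LocalDate release) {
            return new Window(release, release.plusMonths(2));
        }
        static Window postReleaseTargetWindow(LocalDate release) {
            return new Window(release.plusMonths(2), release.plusMonths(4));
        }
    }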


Table 2. Post-release: Number of files distinguishing different types of defects

  Defects    Files, all severities   Files, all severities   Files, all severities   Files, high severity
  per file   (int. & ext.)           (internal staff)        (external customer)     (int. & ext.)
  1          46                      30                      32                      21
  2          21                      12                       7                       1
  3           8                       6                       1                       0
  4           6                       4                       1                       0
  5           5                       4                       0                       0
  7           1                       1                       0                       0
  12          1                       1                       0                       0
  16          1                       1                       0                       0

5.2 Results

Short Time Frames. Our analysis focuses on short time frames. To evaluate H1 of Section 2 we use two months of development time to predict the following two months. Table 3 shows several models predicting defects before the release, where the two-month period for defect counting lies directly before the release date and the two months before these target months are used to collect the feature variables for the prediction models. In Table 3(a) we can see that we obtain a correlation coefficient larger than 0.5, which is a quite good correlation. The mean absolute error is low with 0.46 for linear regression and 0.36 for M5, and the mean squared error is also low with 0.79 for linear regression and 0.67 for M5. To put these prediction errors into perspective, Table 1 describes the defect distribution of the two target months. As the mean squared error emphasizes outliers, we can state that the overall error performance of the prediction of all pre-release defects is very good.

Table 3. Prediction pre-release defects

C. Coef. Abs. Error Sqr. Error Lin. Reg. -0.0424 0.1352 0.3173 0.0927 0.0792 0.2589 M5 (b) High severity defects

To confirm our first hypothesis Table 4(a) lists the quality measures for the prediction of post-release defects. There the values are not as good as for prerelease defects, but the correlation coefficients are still close to 0.5. Therefore, we confirm H1: We can predict short time frames of two months based on feature data of two months.

EQ-Mine: Predicting Short-Term Defects for Software Evolution

21

High Severity. Table 3(b) shows the results for the prediction models on prerelease defects with high severity. We get the severity level of each defect from the issue tracking system, where the defect reporter assigns severity levels. The quality measures for high severity defects differ from the prediction of all defects, because the number and distribution of high severity defects have other characteristics (see Table 1). It is interesting that linear regression has only a negative correlation coefficient. But also M5 can only reach a very low correlation coefficient of 0.10. The overall error level is low because of the small defect bandwidth of 0 up to 3. Table 4. Prediction post-release defects C. Coef. Abs. Error Sqr. Error Lin. Reg. 0.5041 0.9443 1.5285 0.4898 0.7743 1.4152 M5 (a) All defects

C. Coef. Abs. Error Sqr. Error Lin. Reg. 0.4464 0.9012 1.5151 0.5285 0.688 1.3194 M5 (b) Defects discovered internally (through test + development)

C. Coef. Abs. Error Sqr. Error Lin. Reg. 0.253 0.3663 0.5699 0.4716 0.2606 0.4574 M5 (c) Defects discovered externally (through customer + partner companies)

C. Coef. Abs. Error Sqr. Error Lin. Reg. 0.1579 0.1973 0.3175 0.087 0.1492 0.3048 M5 (d) High severity defects

For the post-release prediction of high severity defects in Table 4(d) the correlation coefficient of 0.16 is slightly better. The prediction errors are slightly worse, but this is due to the fact that there are more post-release defects with high severity than pre-release. However, we can conclude: Defects with high severity cannot be predicted with such a precision as overall defects. Before vs. After Release. Our hypothesis H3 states that pre-release defects can be better predicted than the post-release ones. When we compare Table 3(a) with Table 4(a) we see that our hypothesis seems to be confirmed. The correlation coefficients of linear regression are very similar, but the prediction errors are higher for pre-release defects. This situation is even more remarkable for M5, as the pre-release correlation coefficient reaches 0.61 whereas the post-release remains at 0.49. For these prediction models also the two error measures are much higher for post-release. While comparing the defect distribution between pre-release in Table 1 with post-release in Table 2, we could believe that the high error rate is due to the fact that we discovered more files with many defects that occur post-release than pre-release. But when we repeat the model creation of post-release defects with a similar distribution to pre-release, we get still a mean absolute error of 0.68 and a mean squared error of 1.06, which is still clearly larger than for pre-release.

22

J. Ratzinger, M. Pinzger, and H. Gall

What about high severity defects? Are they still better predictable before a release than after? When we look at Table 3(b) and Table 4(d) we see a similar picture for this subgroup of defects. Only the correlation coefficient for linear regression is higher for post-release defects than for pre-release, because there are many more high severity defects after the release. This could be because the defects reported from customers are ranked higher than when they are discovered internally, in order to stress the fact that the defects from customers have to be fixed fast. When we repeat the model creation with similar distributions of pre-release and post-release we get similar correlation coefficients but higher prediction errors for post-release. Therefore, we can conclude that: Predictions of post-release defects have higher errors than for models generated for pre-release. Discovered Internally vs. Externally. We show the difference between prediction of defects discovered by internal staff (testers, developers) vs. defects discovered externally (e.g. customer, partner companies) in Table 4(b) and Table 4(c). For internal defects the correlation coefficient is larger than 0.5, which is produced by the M5 predictor. Although it seems that the prediction error is lower for external defects than for internal ones, this result may be caused by the fact that there are no files with many externally discovered defects (see post-release defect distribution in Table 2). However, when we redo the prediction for internal defects with a similar distribution as for external defects, we get a mean absolute error of 0.48 and a mean squared error of 0.86 with a correlation coefficient of 0.47. As a result, we can partly reject H4 and conclude that: Defects discovered externally by customers and partner companies can be predicted with lower absolute and squared error than defects discovered internally by testers and developers. Aspects of Prediction Models. To analyze the aspects of prediction models in more detail we created two cases using the C4.5 tree classifier: The first model distinguishes between files that are defect-prone vs. files without defects. The second tree model separates the files with just one defect from the ones with several defects. At each node in the tree, a value for the given feature is used to divide the entities into two groups: files with a feature value large/smaller than the threshold. The leafs of the decision trees provide a label for the entities (e.g. predicted number of defects). For each file such a tree has to be traversed according to its features to obtain the predicted class. If a node has no or only one successor than it is defined to be a leaf node for a part of the tree. Tree 1 describes that the feature bearing the most information concerning defect-proneness is the location of the peak month, where the peak month is defined as the one containing the most change events for the analyzed file. Features on the second level are change activity rate and author count. Relative

EQ-Mine: Predicting Short-Term Defects for Software Evolution

23

peak month and change activity rate represent the category of time constraints. Nevertheless, the tree is composed of features from many different categories. Author count and author switches belong to the team category. The number of resolved issues in relation to all issues referenced by source code revisions is an indicator for the process category, similar to the number of source adding changes in relation to the overall change count. Also the ratio of revisions without a commit message describes the process orientation of the development. The number of lines added per bug fix provides insight into the development process itself. We conclude that not size and complexity measures dominate defect-proneness, but many people-related issues are important. tree root relativePeakMonth — changeActivityRate — — resolvedIssues — — — bugfixLinesAdded — — — — withNoMessage relativePeakMonth — authorCount — — addingChanges — — — authorSwitches Tree 1. Pre-release with/without defects Tree 2 describes the prediction model evaluating the defect-prone files (one vs. several defects). This classification tree is much smaller than the previous one for prediction of defect-prone files. Nevertheless, it contains data mining features from many categories. The top level and the bottom level both regard lines edited during bug fixing, but on the first level the lines added to the file are of interest whereas at the bottom the relational aspect is central with lines deleted in all files of common commit transactions. Additionally, the team aspect plays an important role, as the number of author switches is the feature on the second level. The model is completed by features indicating the ratio of adding and changing modifications. tree root linesAddPerBugfix — authorSwitches — — addingChanges — — — modifyingChanges — — — — TBugfixLinesDel Tree 2. Pre-release one vs. several defects

24

J. Ratzinger, M. Pinzger, and H. Gall

From these classifications we conclude that: Multiple aspects such as time constraints, process orientation, team related and bug-fix related features play an important role in defect prediction models. 5.3

Limitations

Our mining approach is strongly related with the quality of our data for the case study. As a result, validity of our findings is related with the data of the versioning and issue tracking system. Versioning systems register single events such as commits of developers, where the event recording depends on the work habits of the developers. However, we could show that an averaging effect supports statistical analysis [17] in general. Our data rely strongly on automated processing. On one hand this ensures constancy, but on the other hand it is a source of blurring effects. In our case we extracted issue numbers from commit messages to map the two information systems. To improve the situation we could try to map from bug reports to code changes based on commit dates and issue dates as described in [5]. In our case this approach does not provide any valuable mappings, which we discovered on a random sample of 100 discovered matches. We can only identify locations of defects corrections based on change data from versioning systems and derive from this information prediction models for components. Bug fixes can take place at locations different to the source of defects. Similar approaches are used by other researchers [5, 4, 3]. With predicting defect corrections, we provide insight into improvement efforts, as defect fixes could be places being in urgent need of code stabilization. For our empirical study we selected software applications of different types such as graphical workstations, administrative consoles, archiving and communication systems, etc. We still cannot claim generalization of our approach on other kinds of software systems. Therefore, we need to evaluate the applicability of EQ-Mine on each specific software project. Nevertheless, this research work contributes to the existing empirical body of knowledge.

6

Conclusions and Future Work

In this work we have investigated several aspects of defect prediction based on a large industrial case study. Our research contributes to the body of knowledge in the field of software quality estimation in several ways. We conducted one of the first studies dealing with fine grained predictions of defects. We estimate the defect proneness based on a short time-frame. With this approach project managers can decide on the best time-frame for release and take preventive actions to improve user satisfaction. Additionally, we compare defect prediction before and after releases of our case study and discovered that in both cases an accurate prediction model can be established. In contrast to other studies,

EQ-Mine: Predicting Short-Term Defects for Software Evolution

25

we investigated the predictability of defects of different severity. We could show that prediction of defects with high severity has lower precision. We also analyzed customer perceived quality, where defects reported by customers need other prediction models than defects discovered by internal staff such as testing. In order to create accurate prediction models we inspected different aspects of software projects. Although size was already used in many other studies it is still an important input for prediction. We extend size measures with relational aspects, where we use the data about evolutionary co-change coupling of software entities. We can show that, for example, the number of lines added to all classes on common changes is as important for defect prediction of a class as the number of lines added to this particular class. Other aspects of our approach are the complexity of the existing solution and the difficulty of the problem in general, as they are causes of software defects. We include people issues of different types in our analysis to cover another important cause of defects. When a developer has to work on software that somebody else has initially written mistakes can occur, because she has to understand the design of her colleague. Factors such as author switches are covered by our team group of data mining features. The discipline of a developer does also influence defect probability. As a result we use indicators for process related issues. Finally, we include time constrains and testing related features into our defect prediction models. The models were created based on 63 data mining features from the 8 categories described. In our future work we focus on the following topics: – Software Structure. As we currently use evolution measures for quality estimations, we intend to enrich our models with information about software structures. Object-oriented inheritance hierarchies as well as data and control flow information provide many insights into software systems, which we will include in our quality considerations. – Automation. Our analysis relies on automated data processing such as information retrieval, mapping of defect and version information, and feature computation. The model creation relies on scripts using the Weka data mining tool [16]. Integrated tools providing predictions and model details such as the most important features can help different stakeholders. On the one hand, developers could profit from this information best, when it is available in the development environment. On the other hand, project managers need a lightweight tool separated from development environments to base their decisions on.

Acknowledgments This work is partly funded by the Austrian Fonds zur Frderung der Wissenschaftlichen Forschung (FWF) as part of project P19867-N13. We thank Peter Vorburger for his valuable input and thoughts to this research work. Special thanks go to Andr´e Neubauer and others for their comments on earlier versions of this paper.

26

J. Ratzinger, M. Pinzger, and H. Gall

References 1. Fenton, N.E., Neil, M.: A critique of software defect prediction models. IEEE Transactions on Software Engineering 25(5) (1999) 675–689 2. Knab, P., Pinzger, M., Bernstein, A.: Predicting defect densities in source code files with decision tree learners. In: Proceedings of the International Workshop on Mining Software Repositories, Shanghai, China, ACM Press (2006) 119–125 3. Nagappan, N., Ball, T.: Use of relative code churn measures to predict system defect density. In: Proceedings of the International Conference on Software Engineering, St. Louis, MO, USA (2005) 284–292 4. Ostrand, T.J., Weyuker, E.J.: The distribution of faults in a large industrial software system. In: Proceedings of the International Symposium on Software Testing and Analysis, Rome, Italy (2002) 55–64 5. Schr¨ oter, A., Zimmermann, T., Zeller, A.: Predicting component failures at design time. In: Proceedings of the International Symposium on Empirical Software Engineering, Rio de Janeiro, Brazil (2006) 18–27 6. Wagner, S., J¨ urjens, J., Koller, C., Trischberger, P.: Comparing bug finding tools with reviews and tests. In: Proceedings of the International Conference on Testing of Communicating Systems, Montreal, Canada (2005) 40–55 7. Khoshgoftaar, T.M., Yuan, X., Allen, E.B., Jones, W.D., Hudepohl, J.P.: Uncertain classification of fault-prone software modules. Empirical Software Engineering 7(4) (2002) 297–318 8. Briand, L.C., Basili, V.R., Thomas, W.M.: A pattern recognition approach for software engineering data analysis. IEEE Transactions on Software Engineering 18(11) (1992) 931–942 9. Nikora, A.P., Munson, J.C.: Developing fault predictors for evolving software systems. In: Proceedings of the Software Metrics Symposium, Sydney, Australia (2003) 338–350 10. Shirabad, J.S., Lethbridge, T.C., Matwin, S.: Mining the maintenance history of a legacy software system. In: Proceedings of the International Conference on Software Maintenance, Amsterdam, The Netherlands (2003) 95–104 11. Fischer, M., Pinzger, M., Gall, H.: Populating a release history database from version control and bug tracking systems. In: Proceedings of the International Conference on Software Maintenance, Amsterdam, Netherlands, IEEE Computer Society Press (2003) 23–32 12. Moeller, K., Paulish, D.: An empirical investigation of software fault distribution. In: Proceedings of the International Software Metrics Symposium. (1993) 82–90 13. Hatton, L.: Re-examining the fault density-component size connection. IEEE Software 14(2) (1997) 89–98 14. Lehman, M.M., Belady, L.A.: Program Evolution - Process of Software Change. Academic Press, London and New York (1985) 15. Gall, H., Jazayeri, M., Ratzinger (former Krajewski), J.: CVS release history data for detecting logical couplings. In: Proceedings of the International Workshop on Principles of Software Evolution, Lisbon, Portugal, IEEE Computer Society Press (2003) 13–23 16. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques. 2 edn. Morgan Kaufmann, San Francisco, USA (2005) 17. Ratzinger, J., Fischer, M., Gall, H.: Evolens: Lens-view visualizations of evolution data. In: Proceedings of the International Workshop on Principles of Software Evolution, Lisbon, Portugal (2005) 103–112

An Approach to Software Evolution Based on Semantic Change Romain Robbes, Michele Lanza, and Mircea Lungu Faculty of Informatics University of Lugano, Switzerland

Abstract. The analysis of the evolution of software systems is a useful source of information for a variety of activities, such as reverse engineering, maintenance, and predicting the future evolution of these systems. Current software evolution research is mainly based on the information contained in versioning systems such as CVS and SubVersion. But the evolutionary information contained therein is incomplete and of low quality, hence limiting the scope of evolution research. It is incomplete because the historical information is only recorded at the explicit request of the developers (a commit in the classical checkin/checkout model). It is of low quality because the file-based nature of versioning systems leads to a view of software as being a set of files. In this paper we present a novel approach to software evolution analysis which is based on the recording of all semantic changes performed on a system, such as refactorings. We describe our approach in detail, and demonstrate how it can be used to perform fine-grained software evolution analysis.

1 Introduction The goal of software evolution research is to use the history of a software system to analyse its present state and to predict its future development [1] [2]. It can also be used to complement existing reverse engineering approaches to understand the current state of a system [3] [4] [5] [6]. The key to perform software evolution research is the quality and quantity of available historical information. Traditionally researchers extract historical data from versioning systems (such as CVS and SubVersion), which at explicit requests by the developers record a snapshot of the files that have changed (this is widely known as the checkin/checkout model). We argue that the information stored in current versioning systems is not accurate enough to perform higher quality evolution research, because they are not explicitely designed for this task: Most versioning systems have been developed in the context of software configuration management (SCM), whose goal is to manage the evolution of large and complex software systems [7]. But SCM serves different needs than software evolution, it acts as a management support discipline concerned with controlling changes to software products and as a development support discipline assisting developers in performing changes to software products [8] [9]. Software evolution on the other hand is concerned with the phenomenon of the evolution of software itself. The dichotomy between SCM and software evolution has led SCM researchers to consider software evolution research as a mere “side effect” of their discipline [10]. M.B. Dwyer and A. Lopes (Eds.): FASE 2007, LNCS 4422, pp. 27–41, 2007. c Springer-Verlag Berlin Heidelberg 2007 

28

R. Robbes, M. Lanza, and M. Lungu

Because most versioning systems originated from SCM research, the focus has never been on the quantity and quality of the recorded evolutionary information, which we consider as being (1) insufficient and (2) of low quality. It is insufficient because information only gets recorded when developers commit their changes. In previous work [11] we have analyzed how often developers of large open-source projects commit their changes and found that the number of commits per day barely surpasses 1 (one commit on average every 8 “working day” hours). The information is of low quality because there is a loss of semantic information about the changes: only textual changes get recorded. For example, to detect structural changes such as refactorings one is forced to tediously reconstruct them from incomplete information with only moderate success [12] [13]. Overall this has a negative impact on software evolution research whose limits are set by the quality and quantity of the available information. This paper presents our approach to facilitate software evolution research by the accurate recording of all semantic changes that are being performed on a software system. To gather this change information, we use the most reliable source available, namely the Integrated Development Environment (IDE, such as Eclipse 1 or Squeak 2 ) used to develop object-oriented software systems. Modern development environments allow programmers to perform semantic actions on the source code with ease, thanks to semi-automatic refactoring [14] support. They also have an open architecture that tools can take advantage of: The event notification system the IDE uses can be monitored to keep track of how the developers modify the source code. From this information, we build a model of the evolution of a system in which the notion of change takes on a primary role, since people develop a software system by incrementally changing it [15]. The notion of incremental change is further supported by IDEs featuring incremental compilation where only the newly modified parts get compiled, i.e., an explicit system building phase where the whole system is being built from scratch is losing importance. In our model, the evolution of a system is the sequence of changes which were applied to develop it. These changes are operations on the program’s abstract syntax tree at the simplest level. Through a composition mechanism, changes are grouped to represent larger changes associated with a semantic meaning, such as method additions, refactorings, feature additions or bug fixes. Thus we can reason about a system’s evolution on several levels, from a high-level view suitable to a manager down to a concrete view suitable to a developer wishing to perform a specific task. We store the change information in a repository, to be exploited by tools integrated in the IDE the programmer is using. After presenting our approach, we show preliminary results, based on the change matrix, an interactive visualization of the changes applied to the system under study. Structure of the paper. Section 2 presents the principles and a detailed overview of our approach. Section 3 presents a case study we performed to validate our approach, in which we used the change matrix visualization to assess the evolution of projects done by students. Section 4 and 5 compare our approach to more traditional approaches to 1 2

http://www.eclipse.org http://www.squeak.org

An Approach to Software Evolution Based on Semantic Change

29

software evolution analysis. Section 6 briefly covers the implementation. In Section 7 we conclude and outline future work.

2 Change-Based Object-Oriented Software Evolution Our approach to software evolution analysis is based on the following principles: – Programming is more than just text editing, it is an incremental activity with semantics. If cutting out a piece of a method body and wrapping it into its own method body can be seen as cut&paste, it is in fact an extract method refactoring. Hence, instead of representing a system’s evolution as a sequence of versions of text files, we want to represent it as a sequence of explicit changes with object-oriented semantics. – Software is in permanent evolution. Modern Integrated Development Environments (IDE), such as Eclipse, are a very rich and accurate source of information about a system’s life-cycle. IDEs thus can be used to build a change-based model of evolving object-oriented software and to gather the change data, which we afterwards process and analyze. Based on the analyzed data, we can also create tools which feed back the analyzed data into the IDE to support the development process. Taming Change. Traditional approaches to evolution analysis consider the history of a system as being a sequence of program versions, and compute metrics or visualize these versions to exploit the data contained in them [16][17]. Representing evolution as a sequence of version fits the format of the data obtained from a source code repository. There is a legitimate doubt that the nature of existing evolution approaches is a direct consequence of the representation adopted by versioning systems, and is therefore limited by this. The phenomenom of software evolution is one of continuous change. It is not a succession of program versions. Our approach fits this view because it models the evolution of a software system as a sequence of changes which have inherent object-oriented semantics, focusing on the phenomenon of change itself, rather than focusing on the way to store the information. We define semantic changes as changes at the design level, not at the behavioral level as in [18]. Modeling software evolution as meaningful change operations fits the inherently incremental nature of software development, because this is the very way with which developers are building systems. Programmers modify software by adding tiny bits of functionality at a time, and by testing often to get feedback. At a higher level features and bug fixes are added incrementally to the code base and at an even higher level the program incrementally evolves from one milestone version to another. A consequence of this approach is that by recording only the changes we do not explicitly store versions, but we can reconstruct any version by applying the changes. In SCM this concept is called change-based versioning [9], however the fundamental difference between our approach and the existing ones is that the changes in our case feature fine-grained object-oriented semantics and are also first-level executable entities.


Our goal is to build a model of evolution based on a scalable representation of change. First we discuss how we represent programs, then we examine how we model changes and how we extract them from IDEs.

Representing Programs. Our model defines the history of a program as the sequence of changes the program went through. From these changes we can reconstruct each successive state of the program source code. We represent one state of the entire program as one abstract syntax tree (AST). Below the root are the packages or modules of the program. Each package in turn has children which are the package's classes. Class nodes have children for their attributes and their methods. Methods also have children: the children of a method form a subtree which is obtained by parsing the source code contained in the method. Thus each entity of a program, from the package level down to the program statement level, is represented as a node in the program's AST, as shown in Figure 1.


Fig. 1. We represent a state of a program as an abstract syntax tree (AST)

Each node contains additional information that is stored in properties associated with the node. The set of properties (and their values) defined on a node can be seen as its label or meta-information. Properties depend on the type of node they are associated with. For instance, class nodes have properties like name, superclass, and comment. An instance variable has a name property; in a statically typed programming language it would also have a type property, as well as a visibility modifier in the case of Java. Methods have a name and could have properties encoding their type signature in a typed language. The property system is open so that other properties can be added at will.
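To make this representation concrete, the following is a minimal sketch of such a property-carrying AST node. Python is used purely for illustration (the authors' prototype is written in Smalltalk), and all names are hypothetical:

    class Node:
        """A node of the program's AST: a package, class, attribute, method or statement."""

        def __init__(self, node_type, **properties):
            self.node_type = node_type            # e.g. "package", "class", "method", "statement"
            self.properties = dict(properties)    # open property set: name, superclass, comment, type, ...
            self.children = []
            self.parent = None

        def add_child(self, child, index=None):
            child.parent = self
            if index is None:
                self.children.append(child)       # order not important: append as last child
            else:
                self.children.insert(index, child)

    # Example: a class node with one attribute and one method
    cls = Node("class", name="ClassF", superclass="Object")
    cls.add_child(Node("attribute", name="x", visibility="private"))
    cls.add_child(Node("method", name="foo"))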


Extracting the Changes. The type of semantic change information that we model is not retrievable from existing versioning systems [19]. Such detailed information about a software system can be retrieved from the IDEs developers are using to build software systems. IDEs are a good source of information because:

– They feature a complete model of a program to provide advanced functionalities such as code completion, code highlighting, navigation facilities and refactoring tools. Such a model goes beyond the representation at the file level to reference program-level entities such as methods and classes in an object-oriented system.
– They feature an event notification system allowing third-party tools to be notified when the user issues a change to the program. Mechanisms such as incremental compilation and smart completion of entity names take advantage of this.
– IDEs allow a user to automatically perform high-level transformations of programs associated with a semantic meaning, namely refactorings. These operations are easy to monitor in an IDE, but much harder to detect outside of it, since they are lost when the changed files are committed to a software repository [12][13].

Since some IDEs are extensible by third parties through plugin mechanisms, our tools can use the full program model offered by the IDEs to locate and reason about every statement in the program, and can be notified of changes without relying on explicit action by the developers. These mechanisms alleviate the problems exhibited by the use of versioning systems: it is easier to track changes applied to an entity in isolation rather than attempting to follow it through several versions of the code base, each comprising a myriad of changes. Furthermore, after each notification, the IDE can also be queried for time and author information.

Representing the Changes. We model changes as first-class executable entities. It is possible to take a sequence of changes and execute it to build the version of the program represented by them. Changes can also be reversed (or undone) to achieve the effect of going back in time. Changes feature precise time and authorship information, allowing the order of the changes to be maintained. In contrast, most other approaches reduce the time information to the time when the change was checked in, following the checkin/checkout model supported by versioning systems such as CVS and Subversion. There are two distinct kinds of changes: (1) low-level changes operating at the syntactic level, and (2) higher-level changes with a semantic meaning, which are composed of lower-level changes.

1. Syntactic Changes. They are simple operations on the program AST, defined as follows:
– A creation creates a node n of type t, without inserting it into the AST.
– An addition adds a node n to the tree, as a child of another specified node m. If order is important, an index can be provided to insert the node n at a particular position among the children of m. Otherwise, n is appended as the last child of m.
– A deletion removes the specified node n from its parent m.
– A property change sets the property p of node n to a specific value v.
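A minimal sketch of these low-level changes as first-class, executable and reversible objects could read as follows. It builds on the hypothetical Node class sketched above; it is an illustrative Python sketch, not the authors' Smalltalk implementation, and creation and deletion are analogous to the two operations shown:

    import time

    class Change:
        """A first-class change: executable, reversible, and time- and author-stamped."""
        def __init__(self, author):
            self.author = author
            self.timestamp = time.time()    # precise time information, not just checkin time
        def apply(self): ...
        def undo(self): ...

    class Addition(Change):
        """Adds node n as a child of node m, optionally at a given index."""
        def __init__(self, author, n, m, index=None):
            super().__init__(author)
            self.n, self.m, self.index = n, m, index
        def apply(self):
            self.m.add_child(self.n, self.index)
        def undo(self):
            self.m.children.remove(self.n)
            self.n.parent = None

    class PropertyChange(Change):
        """Sets property p of node n to value v, remembering the old value so it can be undone."""
        def __init__(self, author, n, p, v):
            super().__init__(author)
            self.n, self.p, self.v = n, p, v
            self.old = None
        def apply(self):
            self.old = self.n.properties.get(self.p)
            self.n.properties[self.p] = self.v
        def undo(self):
            self.n.properties[self.p] = self.old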


Using these low-level changes, we view a program as an evolving abstract syntax tree. A program starts as an empty tree and an empty change history. As time elapses, the program is built and the AST is populated. At the same time, all the change operations which were performed to build the program up to this point are stored in the change history.

2. Semantic Changes. To reason about a system, we need to raise the level of abstraction beyond mere syntactic changes. This is achieved by the composition mechanism. A sequence of lower-level changes can be composed to form a single, higher-level change encapsulating a semantic meaning. Here are a few examples:

– A sequence of consecutive changes involving a single method m can be interpreted as a single method implementation, or a modification if m already existed.
– Changes to the structure of a class c (attributes, superclass, name) are either a class definition or a class redefinition, if c existed before the changes. These kinds of changes form the intermediate changes.
– At a higher level, some sequences of intermediate changes are refactorings [14]. They can be composed further to represent these higher-level changes to the program. For example, the "extract method" refactoring involves the modification of a first method m1 (a sequence of statements in m1 is replaced by a single call to method m2), and the implementation of m2 (its body comprises the statements that were removed from m1). In the same way, a "rename class" refactoring comprises the redefinition of the class (with a name change) and the modification of all methods that reference the renamed class.
– We define a bug fix as the sequence of intermediate changes which were involved in the correction of the faulty behavior.
– In the same way, a feature implementation comprises all the changes that programmers performed to develop the feature. These changes can be intermediate changes as well as any refactorings and bug fixes which were necessary to achieve the goal.
– At an even higher level, we can picture main program features as aggregations of smaller features, and program milestones (major versions) as sets of high-level features and important bug fixes.

The composition of changes works at all levels, allowing changes to represent higher-level concepts. This property is key to the scalability of our approach. Without it, we would have to consider only low-level, syntactic changes, and hence be limited to trivial programs, because of the sheer quantity of changes to consider. In addition to composition, it is also possible to analyze the evolution of a system by considering subsets of changes. Thus a high-level analysis of a system would only take into account the changes applied to classes and packages, in order to have a bird's eye view of the system's evolution. The lower-level changes are still useful to analyze the evolution: once an anomaly has been identified in a high-level stratum of the system, lower-level changes can be looked at to infer the particular causes of a problem. For example, if a package or a module of the system needs reengineering, then its history in terms of classes and methods can be retrieved. Once the main culprits of the problem have been identified, these few classes can be viewed in even more detail by looking at the changes in the implementation of their methods.
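The composition mechanism itself can be sketched as a composite change that applies and undoes its parts in order; again this is only an illustrative Python sketch with hypothetical names:

    class CompositeChange:
        """A higher-level change (method addition, refactoring, bug fix, feature)
        composed of lower-level changes."""
        def __init__(self, label, changes):
            self.label = label              # e.g. "extract method", "bug fix #12", "feature: combat"
            self.changes = list(changes)
        def apply(self):
            for change in self.changes:
                change.apply()
        def undo(self):
            for change in reversed(self.changes):
                change.undo()

    # e.g. an "extract method" refactoring composed of the modification of m1
    # and the implementation of m2:
    #   extract = CompositeChange("extract method", [modify_m1, implement_m2])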


To sum up, we consider the program under analysis as an evolving abstract syntax tree. We store in our model all the change operations necessary to recreate the program at any point in time. At the lowest level, these operations consist of creation, addition and removal of nodes in the tree, and of modifications of node properties. These changes can be composed to represent higher-level changes corresponding to actions at the semantic level, such as refactorings, bug fixes etc.

3 Case Studies

Since our approach relies on information which was previously discarded, we cannot use existing systems as case studies; we monitored new projects to collect all the information. Our case studies are projects done by students over the course of a week. These projects are small (15 to 40 classes), but are interesting case studies since the code base is foreign to us. There were three possible subjects to choose from: a virtual store in the vein of Amazon (Store), a simple geometry program (Geom), and a text-based role-playing game (RPG). Table 1 shows a numerical overview of the projects we have tracked (each project is named with a letter, from A to I). The frequency of the recorded changes was very high compared to that of a classical versioning system: while the projects lasted one week, their actual coding time was in the range of hours. Considering this fact and that the students were novice programmers, our approach allows for an unprecedented precision with respect to the recording of the evolution.

Table 1. A numerical overview of the semantic changes we recovered from the projects

Project                A     B     C     D     E     F     G     H     I
Type                 Geom  Store Store Store Store  RPG  Geom  Store  RPG
Class Added            22    14    14     9    12    15    21    12    41
Class Modified         65    17    34    13     6    24    57    15    27
Class Commented         0    12     0     0     1     0     0     0     0
Class Recategorized     0     0     5     0     0     0     0     0    11
Class Renamed           0     0     0     0     1     1     0     0     1
Class Removed          10     1     5     5     0     3     6     2    18
Attributes Added       82    19    29    19    20    61    30    29   137
Attributes Removed     50     7    13     5     2    19    15     5    54
Method Added          366   119   182   164   117   237   219   135   415
Method Modified       234    69   117   140    81   154   143   118   185
Method Removed        190    20    81    32    13    38   117    21   106

The changes considered here are intermediate-level changes, one per semantic action the user performed (in this case, mainly class and method modifications: the students were familiar with refactoring). The table classifies the changes applied to each project. We can already see some interesting trends: some projects have a lot more "backtracking" (removals of entities) than others, and the usage of actions related to refactoring (commenting, renaming, repackaging entities) varies widely between projects. In the remainder of the section, we concentrate on the analysis of one of the projects, namely the role-playing game project I (the last column of the table). More details on the other projects are available in the extended version of [11].
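A table such as Table 1 can be produced directly from the recorded change history by tallying the kinds of intermediate changes; the following is a minimal Python sketch with hypothetical change records, not the tool's actual implementation:

    from collections import Counter

    # Each recorded intermediate change carries its kind, e.g. "Class Added", "Method Modified".
    changes = [
        {"kind": "Class Added", "entity": "Hero"},
        {"kind": "Method Added", "entity": "Hero>>attack"},
        {"kind": "Method Modified", "entity": "Hero>>attack"},
    ]

    tally = Counter(change["kind"] for change in changes)
    for kind, count in sorted(tally.items()):
        print(kind, count)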


3.1 Detailing the Evolution of a Student Project

We chose project I for a detailed study because it had the most classes and was the second largest in statements. Project I is a role-playing game in which a player has to choose a character, explore a dungeon and fight the creatures he finds in it. In the process, he can find items to improve his capabilities, as well as gain experience.

We base our analysis on the change matrix (Figure 2), inspired by [17]. It is a timeline view of the changes applied to the entire system, described in terms of classes and methods (a coarser-grained version, displaying packages and classes, is also available, but not shown in this paper). The goal of the change matrix is to provide the user with an overview of the activity in the project at method-level granularity over time. Time is mapped on the x-axis. Every method is allocated a horizontal band which is gray for the time period in which the method existed and white otherwise. The method bands are grouped by classes and ordered by their creation time. Classes are delimited by black lines and are also ordered by their creation time, with the oldest classes at the top of the figure. Changes are denoted by colors: green for the creation of a method, blue for its removal and orange for a modification. Selecting a change shows the method's source code after the change is applied to the system. A restriction of the figure at the time of writing is that it does not show when a class is deleted. Figure 2 is rotated for increased readability. Events are mapped on intervals lasting 35 minutes. Note that to ease comprehension the system size is reported on the left of the page, and sessions are delimited by rectangles with rounded corners in both the matrix and the system size graph. The class names are indicated below the figure. Figure 3 represents the same matrix, but focused on the class Combat; since its lifespan is shorter, we can increase the resolution to five minutes per interval.

Considering the classes and their order of creation (Figure 2), we can see that the first parcels of functionality were, in order: the characters, the weapons, the enemies, the combat algorithm, the healing items and finally the dungeon itself, defined in terms of rooms. We can qualify this as a bottom-up development methodology. After seeing these high-level facts about the quality and methodology of the system's evolution, we can examine it session by session. Each session has been identified visually and numbered; refer to Figure 2 to see the sessions.

Session 1, March 27, afternoon: The project starts by laying out the foundations of the main class of the game, Hero. As we see on the change matrix, it evolves continually throughout the life of the project, reflecting its central role. At the same time, a very simple test is created, and the class Spell is defined.

Session 2, March 28, evening: This session sees the definition of the core of the character functionality: classes Hero and Spell are changed, and classes Items, Mage, Race and Warrior are introduced, in this order. Since spells are defined, the students define the Mage class, and after that the Warrior class as another subclass of Hero. This gives the player a choice of profession. The definitions are still very shallow at this stage, and the design is unstable: Items and Race will never be changed again after this session.
Session 3, March 28, night: This session supports the idea that the design is unstable, as it can be summarized as a failed experiment: a hierarchy of races has been introduced, and several classes have been cloned and modified (Mage2, Hero3, etc.). Most of these classes were quickly removed.

Fig. 2. Change matrix of project I


Session 4, March 29, afternoon: This session is also experimental in nature. Several classes are modified or introduced, but are never touched again: Hero3, CEC, RPGCharacter (except for two modifications later on, outside real coding sessions). Mage and Warrior are changed too, indicating that some of the knowledge gained in that experiment starts to flow back to the main branch.

Session 5, March 29, evening and night: This session completes the knowledge transfer started in session 4. Hero is heavily modified in a short period of time, while Mage and Warrior are consolidated.

Session 6, March 30, late afternoon: This session sees a resurgence of interest in the offensive capabilities of the characters. A real Spell hierarchy is defined (Lightning, Fire, Ice), while the Weapons class is slightly modified as well.

Session 7, March 31, noon: The first full prototype of the game. The main class, RPG (standing for Role Playing Game), is defined, as well as a utility class called Menu. Mage, Warrior and their superclass Hero are modified.

Session 8, March 31, evening: This session consolidates the previous one by adding some tests and reworking the classes changed in session 7.

Session 9, March 31, night: This session focuses on weapon diversification with the classes Melee and Ranged; these classes have a very close evolution for the rest of their life, suggesting they are data classes. At the same time, a real hierarchy of hostile creatures appears: Enemies, Lacche, and Soldier. The system is a bit unstable at that time, since Enemies experiences a lot of methods which were added and then removed immediately, suggesting renames.


Fig. 3. Change matrix zoomed on the class Combat

Session 10, April 1st, noon to night: This intensive session sees the first iteration of the combat engine. The weapons, spells and characters are first refined. Then a new enemy, Master, is defined. The implementation of the Combat class involves a lot of modifications of the Weapon and Hero classes. An Attack class soon appears; judging from its (non-)evolution, it seems to be a data class with no logic. After these definitions, the implementation of the real algorithm begins. We see in Figure 3 (the detailed view of Combat) that one method is heavily modified, continuing into the next session.

Session 11, April 2, noon to night: Development is still heavily focused on the Combat algorithm. The classes Potion and Healing are also defined, allowing the heroes to play the game for a longer time. This session also modifies the main combat algorithm and, at the same time, two methods in the Hero class, showing a slight degree of coupling. A second method featuring a lot of logic is implemented; as shown in Figure 3, several methods are often modified.


Session 12, April 3, afternoon to night: This last session finishes the implementation of Combat (changing the enemy hierarchy in the process) and resumes the work on the entry point of the game, the RPG class. Only now is a Room class introduced, providing locality. These classes are tied to Combat to conclude the main game logic. To finish, several types of potions (simple data classes) are defined, and a final monster, a Dragon, is added at the very last minute.

4 Discussion

Compared to traditional approaches, which extract information from source control repositories, our change-based approach has a number of advantages (accurate information, scalable representation, and version generation), but also some limitations (portability, availability of case studies, and performance).

– Accurate information. The information we gather is more accurate in several ways. It consists of program-level entities, not mere text files, which would require extra processing to raise the level of abstraction. Since we are notified of changes automatically rather than explicitly, we can extract finer change information: each change can be processed in context. The time information we gather is accurate up to the second, whereas a versioning system reduces it to the checkin time. Processing changes in context and in a timely manner allows us to track entities through their lifetime while being less affected by system-wide changes such as refactorings.
– Scalable representation. We represent every statement of a system as a separate entity, and every operation on those statements as a first-class change operation. Such a precise representation enables us to reflect on very focused changes, during defined time periods and on a distinct set of low-level entities. At the other end of the spectrum, changes can be composed into semantic-level changes such as method modifications, class additions, or even entire sessions, while the entities we reflect on are no longer statements, but methods, classes or packages. Thus our approach can both give a "big picture" view to a manager, as well as a detailed summary of the changes submitted by a developer during his or her last coding session.
– Version generation. Since changes are executable, we can also reproduce versions of the program. We can thus revert to version analysis and more traditional approaches when we need to (see the sketch after this list).
– Portability. Our approach is currently both language-specific and environment-specific. This allows us to fully leverage the properties of the target language and the possibilities offered by the IDE (in our case, Smalltalk and Squeak). However, it implies a substantial porting effort to use our approach in another context. Consequently, one of our goals is to extract the language- and environment-independent concepts to ease this effort. Thus we will port our prototype to the Java/Eclipse platform. The differences in behavior between the two versions will help us isolate the common concepts.


– Availability of case studies. As mentioned above, we cannot use pre-existing projects as case studies since we require information which was previously discarded. Solving this problem is one of our priorities. Beyond using student projects as case studies, we are monitoring our prototype itself for later study. This would be a medium-sized case study: at the time of writing, it comprised 203 classes and 2249 methods over 11681 intermediate changes. We also plan to soon release and promote our tools to the Smalltalk community (Smalltalk being the language our tools are implemented in). In the longer term, porting our tools to the Eclipse platform will enable us to reach a much wider audience of developers.
– Performance. Our approach stores operations rather than states of programs. The large number of changes and entities could raise performance concerns. It takes around one minute to generate all the possible versions of our prototype itself from the stored changes; the machine used was a 1.5 GHz portable computer, and our prototype has around 11,000 intermediate changes.
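Because changes are executable and reversible, reproducing a version amounts to replaying (or undoing) the recorded changes up to a chosen instant. The sketch below uses the hypothetical Change objects from Section 2 and is illustrative only:

    def version_at(history, instant):
        """Replay all recorded changes up to `instant`, oldest first."""
        for change in sorted(history, key=lambda c: c.timestamp):
            if change.timestamp > instant:
                break
            change.apply()

    def revert_to(history, instant):
        """Undo every change newer than `instant`, most recent first."""
        for change in sorted(history, key=lambda c: c.timestamp, reverse=True):
            if change.timestamp <= instant:
                break
            change.undo()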

5 Related Work

Several researchers have analyzed the evolution of large software systems, basing their work on system versions typically extracted from software repositories such as CVS and Subversion [20][21][22][23][24]. In most cases these approaches have specific analysis goals, such as detecting logical couplings [25] or extracting evolutionary patterns [4].

Several researchers raised the abstraction level beyond files to consider design evolution. In [22], Xing and Stroulia focus on detecting evolutionary phases of classes, such as rapidly developing, intense evolution, slowly developing, steady-state, and restructuring. They had to sample their data for their case study and used only the 31 minor versions of the project. Parsing and analyzing the 31 versions took around 370 minutes on a standard computer, which rules out immediate use by a developer. [26] presents a methodology to connect high-level models to source code, but has only been applied to a single version of a system so far. [23] describes how hierarchies of classes evolve, but still depends on sampling and the checkin/checkout model. [20] applies origin analysis to determine if files moved between versions. In [18], Jackson and Ladd present an approach to differencing C programs at the semantic level. They define semantic changes as dependency changes between inputs and outputs, while we are primarily interested in design-level changes.

All these and other known approaches cannot perform a fine-grained analysis because the underlying data is restricted to what can be extracted from versioning systems, tying them to the checkin/checkout model. In [11], we outlined the limitations of this model with respect to retrieving accurate evolutionary information. Versioning systems restrict their interactions with developers to explicit retrieval of the sources (check out) and submission of the modified sources once the developer finishes his task (check in or commit). All the changes to the code base are merged together at commit time, and become hard to distinguish from each other. The time stamp of each modification is lost, and changes such as refactorings become very hard, if not impossible, to detect. Even keeping track of the identity of a program element can be troublesome if it has been renamed.


Moreover, most versioning systems version text files. This guarantees language independence but limits the quality of the stored information to the lowest common denominator: an analysis of the system's evolution going deeper than the file level requires the parsing of (selected) versions of the system and the linking of the successive versions. Such a procedure is costly [19]. Thus it is common practice to first sample the data, by only retaining a fraction of the available versions. The differences between two versions retained for analysis become even larger, so the quality of the data degrades further.

Mens [27] presents a thorough survey of merging algorithms in versioning systems, of which [28] is the closest to our approach: operations performed on the data, not the data itself, are used as the basis of the merging algorithm. However, the operations are not described precisely in the paper and are used only in the merge process. The change mechanism used by Smalltalk systems relies on the same idea, but the changes are not abstracted; Smalltalkers usually do not rely on them and use more classic, state-based versioning systems. In addition, most of the versioning systems covered by Mens are not widely used in practice: most evolution analysis tools are based on the two most used versioning systems, CVS and Subversion.

6 Tool Implementation

Our ideas are implemented for the Smalltalk language and the Squeak IDE in SpyWare, shown in Figure 4. From top to bottom, we see: the main window; a code browser on a version of project I; the change matrix of project I; and a graph showing the growth rate of the system.

Fig. 4. Screen capture of SpyWare, our prototype

7 Conclusion and Future Work

We presented a fine-grained, change-based approach to software evolution analysis and applied it to nine student projects, one of which was analyzed in detail.


Our approach considers a system to be the sequence of changes that built it, and extracts this information from the IDE used during development. We implemented this scheme and performed an evolution analysis case study based on a software visualization tool (the change matrix) built on top of this platform. Although our results are still in their infancy, they are encouraging, as they allow us to focus on particular entities in a precise period of time once a general knowledge of the system has been gained. In our larger vision, we want a more thorough interaction of forward and reverse engineering to support rapidly changing systems. In this scenario, developers need a detailed analysis of parts of the system as much as they need a global view of the system's evolution. We have only scratched the surface of the information available in these systems. We plan to use more advanced tools, visualizations, and methods (such as complexity metrics) to meaningfully display and interact with this new type of information, and envision other uses beyond evolution analysis.

References

1. Lehman, M., Belady, L.: Program Evolution: Processes of Software Change. London Academic Press, London (1985)
2. Gall, H., Jazayeri, M., Klösch, R., Trausmuth, G.: Software evolution observations based on product release history. In: Proceedings International Conference on Software Maintenance (ICSM '97), Los Alamitos CA, IEEE Computer Society Press (1997) 160–166
3. Mens, T., Demeyer, S.: Future trends in software evolution metrics. In: Proceedings IWPSE 2001 (4th International Workshop on Principles of Software Evolution) (2001) 83–86
4. Van Rysselberghe, F., Demeyer, S.: Studying software evolution information by visualizing the change history. In: Proceedings 20th IEEE International Conference on Software Maintenance (ICSM '04), Los Alamitos CA, IEEE Computer Society Press (2004) 328–337
5. Gîrba, T., Ducasse, S., Lanza, M.: Yesterday's Weather: Guiding early reverse engineering efforts by summarizing the evolution of changes. In: Proceedings 20th IEEE International Conference on Software Maintenance (ICSM 2004), Los Alamitos CA, IEEE Computer Society Press (2004) 40–49
6. D'Ambros, M., Lanza, M.: Software bugs and evolution: A visual approach to uncover their relationship. In: Proceedings of CSMR 2006 (10th IEEE European Conference on Software Maintenance and Reengineering), IEEE Computer Society Press (2006) 227–236
7. Tichy, W.: Tools for software configuration management. In: Proceedings of the International Workshop on Software Version and Configuration Control (1988) 1–20
8. Feiler, P.H.: Configuration management models in commercial environments. Technical Report CMU/SEI-91-TR-7, Carnegie Mellon University (1991)
9. Conradi, R., Westfechtel, B.: Version models for software configuration management. ACM Computing Surveys 30(2) (1998) 232–282
10. Estublier, J., Leblang, D., van der Hoek, A., Conradi, R., Clemm, G., Tichy, W., Wiborg-Weber, D.: Impact of software engineering research on the practice of software configuration management. ACM Transactions on Software Engineering and Methodology 14(4) (2005) 383–430
11. Robbes, R., Lanza, M.: A change-based approach to software evolution. In: ENTCS volume 166 (2007) to appear
12. Görg, C., Weissgerber, P.: Detecting and visualizing refactorings from software archives. In: Proceedings of IWPC (13th International Workshop on Program Comprehension), IEEE CS Press (2005) 205–214


13. Filip Van Rysselberghe, M.R., Demeyer, S.: Detecting move operations in versioning information. In: Proceedings of the 10th Conference on Software Maintenance and Reengineering (CSMR '06), IEEE Computer Society (2006) 271–278
14. Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D.: Refactoring: Improving the Design of Existing Code. Addison Wesley (1999)
15. Beck, K.: Extreme Programming Explained: Embrace Change. Addison Wesley (2000)
16. Gîrba, T., Lanza, M., Ducasse, S.: Characterizing the evolution of class hierarchies. In: Proceedings IEEE European Conference on Software Maintenance and Reengineering (CSMR 2005), Los Alamitos CA, IEEE Computer Society (2005) 2–11
17. Lanza, M.: The evolution matrix: Recovering software evolution using software visualization techniques. In: Proceedings of IWPSE 2001 (International Workshop on Principles of Software Evolution) (2001) 37–42
18. Jackson, D., Ladd, D.A.: Semantic diff: A tool for summarizing the effects of modifications. In: Müller, H.A., Georges, M., eds.: ICSM, IEEE Computer Society (1994) 243–252
19. Robbes, R., Lanza, M.: Versioning systems for evolution research. In: Proceedings of IWPSE 2005 (8th International Workshop on Principles of Software Evolution), IEEE Computer Society (2005) 155–164
20. Tu, Q., Godfrey, M.W.: An integrated approach for studying architectural evolution. In: 10th International Workshop on Program Comprehension (IWPC '02), IEEE Computer Society Press (2002) 127–136
21. Jazayeri, M., Gall, H., Riva, C.: Visualizing Software Release Histories: The Use of Color and Third Dimension. In: Proceedings of ICSM '99 (International Conference on Software Maintenance), IEEE Computer Society Press (1999) 99–108
22. Xing, Z., Stroulia, E.: Analyzing the evolutionary history of the logical design of object-oriented software. IEEE Trans. Software Eng. 31(10) (2005) 850–868
23. Gîrba, T., Lanza, M.: Visualizing and characterizing the evolution of class hierarchies (2004)
24. Eick, S., Graves, T., Karr, A., Marron, J., Mockus, A.: Does code decay? Assessing the evidence from change management data. IEEE Transactions on Software Engineering 27(1) (2001) 1–12
25. Gall, H., Hajek, K., Jazayeri, M.: Detection of logical coupling based on product release history. In: Proceedings International Conference on Software Maintenance (ICSM '98), Los Alamitos CA, IEEE Computer Society Press (1998) 190–198
26. Murphy, G.C., Notkin, D., Sullivan, K.J.: Software reflexion models: Bridging the gap between design and implementation. IEEE Trans. Software Eng. 27(4) (2001) 364–380
27. Mens, T.: A state-of-the-art survey on software merging. IEEE Transactions on Software Engineering 28(5) (2002) 449–462
28. Lippe, E., van Oosterom, N.: Operation-based merging. In: SDE 5: Proceedings of the Fifth ACM SIGSOFT Symposium on Software Development Environments, New York, NY, USA, ACM Press (1992) 78–87

A Simulation-Oriented Formalization for a Psychological Theory

Paulo Salem da Silva and Ana C. Vieira de Melo

University of São Paulo, Department of Computer Science, São Paulo – Brazil
[email protected], [email protected]

Abstract. In this paper we present a formal specification of a traditionally informal domain of knowledge: the Behavior Analysis psychological theory. Our main objective is to highlight some motivations, issues, constructions and insights that, we believe, are particular to the task of formalizing a preexisting informal theory. In order to achieve this, we give a short introduction to Behavior Analysis and then explore in detail some fragments of the full specification, which is written using the Z formal method. With such a specification, we argue, one is in a better position to implement a software system that relates to an actual psychological theory. Such a relation could be useful, for instance, in the implementation of multi-agent simulators.

1 Introduction

Mathematical approaches have been successful in representing the universe of natural sciences and engineering. Modern Physics is, perhaps, the greatest example of this success. Yet, many important fields of study remain distant from formal structures and reasoning. Among these, we regard Psychology as particularly interesting.

Roughly speaking, Psychology is divided into several schools of thought, and each one adopts its own definitions, methods and goals. As examples, we may cite Psychoanalysis, Cognitivism and Behaviorism. The latter is further divided into several approaches, out of which Behavior Analysis [1], created by Burrhus Frederic Skinner, stands out. While not strictly built on formal terms, it does bear some resemblance to them through detailed and precise definitions. As a consequence, it suggests the possibility of a complete formalization.

With this in mind, we have designed a formal specification for agent behavior based on the Behavior Analysis theory. Its purpose is twofold. First, it should allow the construction of agent simulators following the principles of this psychological school. Second, it aims at demonstrating the possibility and the value, from a Software Engineering perspective, of formally specifying traditionally informal domains in order to build tools related to these domains.

The specification of the Behavior Analysis theory has been written with the Z formal method [2], and this paper presents its fundamental structure, but does not go deep into all details.


Our aim here is to highlight some issues, constructions and insights that, we believe, are particular to the task of formalizing a preexisting informal domain of knowledge. Moreover, we hope that our presentation argues in favor of this kind of formalization.

We are aware of some other works similar to ours, either in their purposes or in their methods. A multi-agent specification framework written in Z, called SMART, can be found in [3]. One of the authors of this book is also involved in the formal modelling and simulation of stem cells [4]. Neuron models and simulations are common practice in the field of Computational Neuroscience [5,6]. We do not know, however, of attempts to formalize whole theories about organism behavior.

Sect. 2 details the process through which our specification was conceived. Naturally, we assume that the reader is not familiar with Psychology; therefore, Sect. 3 presents a brief introduction to the fundamental elements of Behavior Analysis. Sect. 4 explores some fragments of the specification in detail, using them to illustrate relevant points. We expect the reader to know the basics of the Z formal method, which can be learned from works such as [2] and [7]. Sect. 5 summarizes our main results and further elaborates on them. Finally, we acknowledge the help we received.

2 Formalization Process

Although the formalization process we employed is not precise, it does follow a number of principles and practices which are worth registering. In this section we present this knowledge, as structured as possible. Let us begin by tracing the two major steps that we went through, namely:

1. Definition of the main entities and relationships in the theory;
2. Addition of restrictions and further structure upon the entities and relationships.

The first step allows us to identify the elements upon which we should focus. This, we believe, is especially important if the domain being formalized is not entirely understood. In our case, we initially built an ontology [8,9] for the concepts of Behavior Analysis as described by Skinner in the book Science and Human Behavior [1]. Among the techniques we employed to accomplish this stage, the most relevant ones are the following:

– Map chapters or sections to subsystems. By doing this, we reused the general structure of the original theory;
– Build the ontology as the book is read. We adopted the discipline of editing the ontology at the end of sections or chapters;
– Register concepts in the ontology without structure and organize them later. This is important because sometimes it is not clear what a concept actually means or where it should be positioned in the ontology. As one gains more knowledge about the domain, it becomes simpler to organize the available concepts.


In the second step, then, we can focus our attention on the details of each entity and relationship identified in the previous step. More expressive formalisms might be needed at this point. In our work, we employed the Z formal method to this end. Z was chosen in part because of our prior experience with it, but also owing to the method's emphasis on axiomatic descriptions, refinement and modularization. Moreover, we used the Z/EVES tool [10] to help us write the specification. To gain a deeper understanding of the identified entities, we also began to study other references, especially the book Learning [11], written by Charles Catania, a well-known contemporary psychologist. The formal specification, thus, is mostly structured according to the views of Skinner himself, though we have used a modern reference to improve our understanding of specific topics. At this point, we found the following practices to be useful:

– Design subsystems to be as isolated as possible;
– Try to express new things in terms of what is available. We found that once some base concepts are set, much can be expressed using them;
– When defining an operation, try to account for all possible input cases. This helps spot conditions that have not been considered, either by the original theory or by the formalization. We shall see an example of this in Sect. 4.2;
– When a concept is not clear, leave it as abstract as necessary. By not trying to formalize what is not well understood, one avoids having to change the formalization later on;
– When a concept may have multiple interpretations, provide an abstract definition followed by refinements that specialize it. We shall encounter an example of this in Sect. 4.2;
– Do not attempt to formalize all details of the theory at once. In our experience, such an ambition is doomed to failure, for the more details are added, the harder it gets to connect each part of the specification to the others.

These are the main practices we employed. In Sect. 4 we shall encounter some of them applied to an actual example.

3 A Brief Introduction to Behavior Analysis

We now present some fundamental ideas and elements of Behavior Analysis, upon which we have built our formal specification.

Behaviorism is a branch of Psychology created in the beginning of the 20th century. It was born mainly as an opposition to the dominating idea that the objective of Psychology was the study of the mind. Behaviorists rejected this position, claiming that it was too vague and unsuitable for scientific investigation. They asserted that the true purpose of Psychology should be the study of the behavior of organisms, which, they thought, was a precise concept and, therefore, within the realm of natural science (see [12] for a classical exposition of these principles).


The Behaviorist tradition produced several important thinkers, of whom Burrhus Frederic Skinner was perhaps the most notable. Between the decades of 1930 and 1950 he developed his own kind of Behaviorism, called Behavior Analysis.

In Behavior Analysis, the fundamental object of study is the organism. Organisms perceive their environments through stimuli and act upon such environments through behavior. Further, a relation is assumed to exist between stimuli and behavior, in such a way that behavior is, ultimately, determined by the stimulation received by the organism. Thus, the purpose of this science is the prediction and control of behavior. This objective is pursued mainly through the classification of several phenomena concerning stimuli and behavior. The hope is that regularities can be discovered, leading to the formulation of behavioral laws. Let us first examine the ideas concerning stimulation, and then proceed to the points about behavior.

Each stimulus has a utility value. That is, it is either pleasant or painful, desired or feared. Some stimuli, called primary, possess utility values a priori, independently of prior experience. All others, called conditioned, have their utilities determined by primary stimuli during the organism's life. The relations between primary and conditioned stimuli are modified through the process named stimulus conditioning. Essentially, it is a learning process that tries to relate the occurrence of certain stimuli to the occurrence of others. In other words, it allows organisms to formulate causal laws about their environments. As an example, consider a dog that is always fed after a whistle. Initially, only the presentation of food can make the dog salivate. With time, however, the dog learns that the whistle is related to the food, causing him to salivate at the whistle, prior to any food delivery. In this case, food is the primary stimulus, since it is naturally pleasant to the dog. The whistle, on the other hand, is a conditioned stimulus, which becomes related to food. Stimulus conditioning also works the other way around: if the relation between two conditioned stimuli is not maintained, it tends to disappear. In the previous example, if the whistle is no longer followed by food, it is likely that, after some time, it won't elicit salivation.

Now let us proceed to the study of behavior. Behavior Analysis defines two main classes of behavior, namely, the class of reflexes and the class of operants. A reflex is characterized by an antecedent stimulus, which causes the organism to behave in some way. For instance, salivation is a reflex, since it is caused by the presentation of food. Reflexes are innate to the organism. That is, they are not learning structures; they cannot be created nor modified to any great extent. Operants, on the other hand, are far more flexible behavioral structures. An operant is defined by a consequent stimulus. The operant stands for the behavior that leads to this stimulus, that is, the behavior that operates on the environment in order to generate the stimulus. Notice that if a behavior no longer leads to a stimulus, or if the behavior required to reach that stimulus changes, the operant changes as well. Operants are, therefore, learning structures. As an example, suppose that a dog learns that the push of a button brings food to him.


Then this button-pushing behavior becomes an operant, for it is associated with a specific consequent stimulus.

It is through operant behavior that the most interesting issues arise in Behavior Analysis. Organisms can have their behavior changed by operations of reinforcement and punishment. Reinforcement is the presentation of pleasant stimuli as a reward for particular behaviors. Punishment, in turn, accounts for the presentation of unpleasant stimuli, in order to inhibit specific behaviors. There are many ways to perform these operations, called schedules of reinforcement. Each schedule modifies behavior in a distinct way.

There are other interesting concepts, but we shall limit ourselves to these, for they are sufficient to understand the examples that come in the next section. Moreover, most of the concepts discussed above are present explicitly in our specification.

How could a simulator based on such a specification be useful? Once we define an organism, we can perform simulations to determine properties like:

– how frequent reinforcement should be in order to preserve behaviors of interest;
– how much time it takes to teach the organism a new behavior.

In general, simulations could replace some experiments usually done with real animals.

4 Results

As stated above, the specification is too large to be completely described in this paper. Therefore, in this section we do not present the whole specification, but some of its most significant parts, from which useful discussion can be drawn. For this reason, some schemata that are used might not be defined. Sect. 4.1 gives an overview of the specification's general structure, while Sect. 4.2 explores some of its most instructive parts in detail.

4.1 Specification Overview

The formalization's main goal is to allow the construction of a system that simulates the behavior of organisms according to the principles of Behavior Analysis. It is natural, therefore, to build a specification centered around the concept of "organism". The main object of our specification is an isolated organism, which receives stimuli from an environment and produces behavioral responses. It is modelled as a state machine according to the following principles:

– Time is discrete;
– At every instant, the state of the organism may change;
– At every instant, the organism may receive one stimulus;
– At every instant, the organism may produce a new behavioral response.


Changes in the state of the organism occur either spontaneously or as consequences of stimulation. These changes are controlled by several mechanisms, which we have divided into subsystems. Each subsystem is responsible for a particular aspect of behavior and is closely related to major concepts in the psychological theory. Thus, formally, an organism is a composition of several subsystems, as the following schema shows.

Organism
  stimulationSubsystem : StimulationSubsystem
  respondingSubsystem : RespondingSubsystem
  driveSubsystem : DriveSubsystem
  emotionSubsystem : EmotionSubsystem

At every instant, the organism may receive a new stimulus, which is processed by all subsystems in no particular order. How these stimuli are generated or how the organism's behavior changes the environment is outside the specification's scope. Nevertheless, we do provide a simple definition of the simulation process with the following schema.

Simulator
  organism : Organism
  currentInstant : Instant
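Read operationally, the Organism and Simulator schemata suggest a simple discrete-time loop. The sketch below (Python, hypothetical names, not part of the Z specification) shows one way such a simulator could be driven:

    class Organism:
        """Composition of subsystems; each one processes the incoming stimulus."""
        def __init__(self, subsystems):
            self.subsystems = subsystems                # stimulation, responding, drive, emotion

        def step(self, stimulus):
            responses = []
            for subsystem in self.subsystems:           # processed in no particular order
                response = subsystem.process(stimulus)  # may return a behavioral response or None
                if response is not None:
                    responses.append(response)
            return responses

    class Simulator:
        def __init__(self, organism):
            self.organism = organism
            self.current_instant = 0                    # time is discrete

        def run(self, stimuli):
            for stimulus in stimuli:                    # at most one stimulus per instant
                responses = self.organism.step(stimulus)
                self.current_instant += 1
                yield self.current_instant, responses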

4.2 Specification: Main Elements

Let us now proceed to the detailed examination of some parts of the specification. In what follows, we first explore part of the stimulation subsystem, and then give some details of operant behavior, defined in the responding subsystem.

Stimulation. The specification of stimulus processing is particularly suitable for a discussion of how traditional mathematical structures, such as graphs, can be used in formalization processes. The fact that these phenomena can be translated to well-studied formal structures sheds new light on them: it allows us to consider possibilities that could have remained hidden prior to the formalization. We begin by giving the main stimulation subsystem definition.

StimulationSubsystem
  StimulationParameters
  StimulusImplication
  StimulusEquivalence
  currentStimuli : P Stimulus
  stimulus status : Stimulus → StimulusStatus


Consider the several schema imports above. The first, StimulationParameters, merely defines the parameters that are given as input to the simulation. They define what is particular, a priori, to the organism being simulated. We shall not pursue it in detail here. Our interest is in the other two, StimulusImplication and StimulusEquivalence. They carry the fundamental definitions that allow the formalization of the stimulus conditioning operation. As we pointed out earlier, this operation allows organisms to learn how their environment works. Let us first examine it informally and then, upon that, build a formal definition.

The behavior of organisms depends greatly on their power to learn how environmental stimuli are related. Sometimes, it is useful to consider two stimuli that are, in reality, different to be equivalent. For example, if, through experimental procedures, we arrange that both the presence of a red light and of a green light are always followed by the same consequences (e.g., food), why should a hungry organism bother to distinguish between the colors? As far as the organism is concerned, the two lights are equivalent. On the other hand, sometimes the appropriate relation is one that defines causality, not equivalence. In the previous example, we may arrange the procedure so that the red light is always followed by food. In this case, the learning takes the order of stimulation into account: though the red light is followed by food, food is not necessarily followed by a red light. That is, the organism may establish an implication between red light and food.

We now proceed to the formalization of these ideas. Notice that causal laws are certainly reflexive, since a stimulus trivially causes itself. They are also transitive, in the sense that causality can be chained (e.g., stimulus s1 causes s2 which, in turn, causes s3). Finally, in principle no symmetry is needed (e.g., if s1 causes s2, there is no need, at first, for s2 to cause s1). We are now in a position to specify causality in the StimulusImplication schema. It also defines a function called sCorrelation, which accounts for the fact that some implications may be stronger than others.

StimulusImplication
  sCauses : P(Stimulus × Stimulus)
  sCorrelation : Stimulus × Stimulus → Correlation

  ∀ s1, s2, s3 : Stimulus •
    (s1 sCauses s1) ∧
    (((s1 sCauses s2) ∧ (s2 sCauses s3)) ⇒ (s1 sCauses s3))
  ∀ s1, s2 : Stimulus | s1 sCauses s2 •
    ∃ c : Correlation • ((s1, s2) ↦ c) ∈ sCorrelation

Stimulus equivalence relations, in turn, can be defined in terms of stimulus implication. We merely add the symmetry axiom and require the sCorrelation function to have the same value in both directions.


StimulusEquivalence
  StimulusImplication
  equals : P(Stimulus × Stimulus)

  ∀ s1, s2 : Stimulus •
    (s1 equals s2) ⇔ (s1 sCauses s2) ∧ (s2 sCauses s1)
  ∀ s1, s2 : Stimulus | s1 equals s2 •
    sCorrelation(s1, s2) = sCorrelation(s2, s1)

With this, we have achieved a formal specification of the relations among stimuli. But we may continue our analysis, casting this specification in other terms. Notice that stimulus implication may be regarded as a directed graph (Fig. 1(a)), in which vertices represent stimuli and edges are the conditionings between stimuli. Similarly, stimulus equivalence can also be seen as a graph (Fig. 1(b)), but an undirected one. Furthermore, edges in both graphs might carry weights, if the correlation of the conditioning is to be taken into account.

Fig. 1. (a) An example of stimulus implication represented as a directed graph; (b) An example of stimulus equivalence represented as an undirected graph
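To make the graph reading concrete, the sketch below represents stimulus implication as a weighted directed graph and performs a depth-bounded breadth-first search for the primary stimuli reachable from a given stimulus. It is an illustrative Python sketch, not part of the Z specification, and all names and numbers are hypothetical:

    from collections import deque

    # sCauses as a weighted directed graph: edges[s] maps each stimulus caused by s to its correlation.
    edges = {
        "whistle": {"food": 0.8},
        "red_light": {"food": 0.9},
        "green_light": {"food": 0.9},
    }
    primary = {"food"}

    def reachable_primaries(start, max_depth=3):
        """Bounded breadth-first search; max_depth models a possible memory limitation."""
        found, seen = set(), {start}
        queue = deque([(start, 0)])
        while queue:
            stimulus, depth = queue.popleft()
            if stimulus in primary:
                found.add(stimulus)
            if depth < max_depth:
                for nxt in edges.get(stimulus, {}):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, depth + 1))
        return found

    print(reachable_primaries("whistle"))   # {'food'}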

Regarding this stimuli graph, new psychological questions arise. In fact, we can use all our knowledge of Graph Theory and search algorithms to formulate questions, bringing new light to the psychological theory itself. For instance, consider the following:

– When looking for causal relations, which search strategy do organisms employ? Do they execute a depth-first or a breadth-first search?
– How deep can a search go? Is there some sort of memory limitation that prevents it from being exhaustive?

Answers to these questions, of course, are left to psychologists. We must, however, model this lack of knowledge somehow. Fortunately, the Z formal method allows us to do this easily, as follows.


For all operations that deal with stimulus implication and equivalence, we first define a more abstract version, containing only axioms that we are sure hold. Then we provide one or more refinements that add assumptions to it. This allows experimentation with several possibilities and makes it easier to update the specification as we learn more about psychological phenomena. (Notice that if the specification is implemented in an object-oriented language, this approach can be seen in terms of class inheritance.)

As an example, let us consider the schemata that specify how the utility of a stimulus is calculated. Recall from Sect. 3 that stimuli are divided into two classes, namely, primary and conditioned. Primary stimuli have utility values a priori, while conditioned stimuli have their utilities calculated in terms of the primary ones. Moreover, drives and emotions can influence this calculation. The more general version of stimulus utility, StimulusUtility, states that there exists a function that calculates the utility in terms of the stimulus, a set of emotions and a set of drives.

StimulusUtility
  StimulationSubsystem
  EmotionSubsystem
  DriveSubsystem
  sUtility : Stimulus → Utility

  ∃ f : Stimulus × P Emotion × P Drive → Utility •
    ∀ s : Stimulus • sUtility(s) = f (s, activeEmotions, activeDrives)

Clearly, this abstract definition does not relate conditioned to primary stimuli. The reason is that, as far as we can see, any such relation must contain assumptions that we are not sure hold. Thus, the actual relation is given in refinements. A simple one is given by the StimulusUtility Ref 1 schema, which depends on another schema, StimulusUtilityBase. In this refinement, the calculation is performed by locating the best primary stimulus that can be reached through stimulus implication, and then applying emotional and driving filters.

StimulusUtility Ref 1
  StimulusUtilityBase
  StimulusEmotionalRegulator
  StimulusDriveRegulator

  ∀ s : Stimulus •
    sUtility(s) = driveRegulator(s, emotionalRegulator(s, base(s)))



StimulusUtilityBase
  StimulusUtility
  StimulusImplication
  base : Stimulus → Utility

  ∀ s : Stimulus •
    (∃ p : primaryStimuli •
      base(s) = primary utility(p) ∧
      (∀ q : primaryStimuli | s sCauses q •
        primary utility(p) ≥1 primary utility(q) ∧ (s sCauses p)))
    ∨ (∀ p : primaryStimuli • ¬ (s sCauses p) ∧ sUtility(s) = neutral)

In the next section we shall make references to some of the entities presented here in order to show how the different subsystems are related.

Operant Behavior. Operant behavior, as we have seen in Sect. 3, is the most important behavioral class within Behavior Analysis. We shall study it here from two perspectives. First, its formalization is not straightforward, and we shall examine some of the difficulties. Second, operant processing is not simple, but can be elegantly modeled to some extent. Let us begin by defining an operant.

Operant
  StimulusUtility
  antecedents : P(P Stimulus)
  action : Action
  consequence : Stimulus
  consequenceContingency : (P Stimulus) ⇸ Correlation

  sUtility(consequence) ≠ neutral
  ∅ ∈ antecedents
  dom consequenceContingency = antecedents

The above schema states that an operant has an action which leads to a consequence. There are two important considerations to be made here. First, notice that we introduced the concept of action. From the study of Behavior Analysis, we realized that there are some terminological imprecisions: a behavior (i.e., what is actually performed by the organism) and a behavior class (i.e., a set containing behaviors that have some properties) are distinct concepts, but it is easy to confuse them.


Thus, we adopted the notion of action to refer to what would traditionally be called a behavior or even a mechanical property of behavior. The second consideration regards the fact that Behavior Analysis defines operants solely by a stimulus consequence. Thus, in principle, either no action should be defined within an operant, or all possible actions that lead to the consequence should be present. This approach, however, would neglect the fact that each action leads to the consequent stimulus in a different way. For instance, pushing either a red button or a green one might lead an animal to food. But, perhaps, the red button is more efficient and, hence, will be more strongly correlated with the consequence than the other.

The schema also defines a set of sets of stimuli, antecedents. This accounts for the fact that the stimuli currently present in the environment might change the chances of reaching the desired consequence. This is formalized by the function consequenceContingency, which takes antecedent stimuli to the probability of success. Such details show that a formalization process is not just a matter of translation. Sometimes it is necessary to add notions and to infer, from unclear prose, what was actually meant.

We now move on to study some operations. In Z, we say that an operation is total if, and only if, its preconditions cover all possibilities. This concept will guide our analysis from here on. Operants might be either created or modified. Here, we shall focus on operant modification, which can be achieved in four ways. First, a new environmental condition might be learned. This is called a discrimination operation, for it allows the organism to discriminate among several environmental possibilities. Each possibility is defined by a set of discriminative stimuli.

  DiscriminationOp
    OperantOp
    ----------------------------------------------------------------
    discriminativeStimuli? ∉ dom consequenceContingency
    consequence? sCauses consequence
    discriminativeStimuli? ∈ dom consequenceContingency′
    consequenceContingency′(discriminativeStimuli?) >1 min_correlation

In the above schema we import OperantOp, which defines a general operation over an operant but is not necessary for the present discussion and, thus, is omitted.
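
To make the operant structure more concrete, here is a hypothetical Python rendering of the Operant state and of the discrimination operation. It is a sketch under assumed representations (correlations as probabilities, antecedent sets as frozensets, an assumed initial correlation value), not the specification itself.

    from dataclasses import dataclass, field

    MIN_CORRELATION = 0.0  # assumed stand-in for min_correlation


    @dataclass
    class Operant:
        action: str
        consequence: str
        # dom consequenceContingency = antecedents; the empty antecedent set is
        # always present, mirroring ∅ ∈ antecedents.
        consequence_contingency: dict = field(
            default_factory=lambda: {frozenset(): MIN_CORRELATION})

        @property
        def antecedents(self):
            return set(self.consequence_contingency)

        def discriminate(self, discriminative_stimuli, consequence_followed,
                         initial_correlation=0.5):
            """DiscriminationOp: register a newly learned antecedent condition.
            The initial correlation is an assumption; the schema only requires
            it to exceed min_correlation."""
            key = frozenset(discriminative_stimuli)
            if key not in self.consequence_contingency and consequence_followed:
                self.consequence_contingency[key] = initial_correlation

For example, Operant("press lever", "food").discriminate({"light on"}, consequence_followed=True) adds the condition {light on} to the contingency table, with consequence_followed standing in for the predicate consequence? sCauses consequence.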


Second, an already known environmental condition might lead to the operant's consequent stimulus, which strengthens their relation.

  OperantConditioningOp
    OperantOp
    ----------------------------------------------------------------
    discriminativeStimuli? ∈ dom consequenceContingency
    consequence? sCauses consequence
    consequenceContingency′(discriminativeStimuli?)
      ≥1 consequenceContingency(discriminativeStimuli?)

Third, a known environmental state might not lead to the desired consequence, which weakens their relation.

  ExtinctionOp
    OperantOp
    ----------------------------------------------------------------
    discriminativeStimuli? ∈ dom consequenceContingency
    ¬ (consequence? sCauses consequence)
    consequenceContingency′(discriminativeStimuli?)
      ≤1 consequenceContingency(discriminativeStimuli?)

Finally, if the environmental condition is not known and the consequence does not follow, the operant simply remains unchanged.

  NeutralOp
    OperantOp
    ----------------------------------------------------------------
    discriminativeStimuli? ∉ dom consequenceContingency
    ¬ (consequence? sCauses consequence)
    consequenceContingency′(discriminativeStimuli?)
      = consequenceContingency(discriminativeStimuli?)

Notice that these four definitions form a total operation: they cover all possibilities for the input variables discriminativeStimuli? and consequence?:

1. DiscriminationOp accounts for the case in which discriminativeStimuli? ∉ dom consequenceContingency and consequence? sCauses consequence.
2. OperantConditioningOp handles the case in which discriminativeStimuli? ∈ dom consequenceContingency and consequence? sCauses consequence.
3. ExtinctionOp occurs when discriminativeStimuli? ∈ dom consequenceContingency and ¬ (consequence? sCauses consequence).
4. NeutralOp accounts for the remaining case.
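
This totality can be pictured as a single dispatch over the two conditions. The sketch below is illustrative only: the fixed delta update is an assumption, since the schemas merely relate the new correlation to the old one.

    def modify_contingency(contingency, discriminative_stimuli,
                           consequence_followed, delta=0.1, min_correlation=0.0):
        """Total operation over a contingency table (a dict keyed by frozensets,
        as in the previous sketch): exactly one of the four cases applies."""
        key = frozenset(discriminative_stimuli)
        known = key in contingency
        if not known and consequence_followed:        # DiscriminationOp
            contingency[key] = min_correlation + delta
        elif known and consequence_followed:          # OperantConditioningOp
            contingency[key] = min(1.0, contingency[key] + delta)
        elif known and not consequence_followed:      # ExtinctionOp
            contingency[key] = max(0.0, contingency[key] - delta)
        else:                                         # NeutralOp: no change
            pass
        return contingency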


This model can be further refined by adding the notions of reinforcement and punishment. Each of these, in turn, can be either positive or negative. A positive reinforcement accounts for the provision of a pleasant stimulus (e.g., provision of food), while a negative reinforcement stands for the removal of an unpleasant stimulus (e.g., relief from pain through analgesics). Punishment is analogous. Finally, there is the case in which the stimulus is neither pleasant nor painful. Hence, there are five possibilities.

  PositiveReinforcement
    StimulusUtility
    consequence? : Stimulus
    ----------------------------------------------------------------
    sUtility(consequence?) >1 neutral
    stimulus_status(consequence?) = Beginning

  NegativeReinforcement
    StimulusUtility
    consequence? : Stimulus
    ----------------------------------------------------------------
    sUtility(consequence?)
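
By analogy with PositiveReinforcement, the remaining cases follow the prose above: negative reinforcement removes an unpleasant stimulus, punishment mirrors reinforcement, and there is a neutral case. The following sketch of that five-way classification is purely illustrative; the numeric utility scale with 0.0 as neutral and the "beginning"/"ending" status flag (standing in for stimulus_status) are assumptions.

    def classify_consequence(utility, status, neutral=0.0):
        """Classify a consequence stimulus into one of the five possibilities
        described in the text; `status` says whether the stimulus is being
        provided ("beginning") or removed ("ending")."""
        if utility == neutral:
            return "neutral"                  # neither pleasant nor painful
        if status == "beginning":             # stimulus is being provided
            return "positive reinforcement" if utility > neutral else "positive punishment"
        return "negative punishment" if utility > neutral else "negative reinforcement"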