Design Concepts in Programming Languages
Franklyn Turbak and David Gifford with Mark A. Sheldon
On the cover is an inuksuk, a signpost used by the Inuit in the Arctic to provide guidance in vast wilderness. The concise semantics, type rules, effect rules, and compilation transforms in this book have been valuable inuksuit to the authors in the programming language landscape.
“There is a paucity of good graduate-level textbooks on the foundations of programming languages, no more than four or five in the last two decades. Nothing to compare with the profusion of excellent texts in the other core areas of computer science, such as algorithms or operating systems. This new textbook by Franklyn Turbak, David Gifford, and Mark Sheldon—comprehensive, thorough, pedagogically innovative, impeccably written and organized—greatly enriches the area of programming languages and will be an important reference for years to come.” Assaf Kfoury Department of Computer Science, Boston University
“This book is an excellent, systematic exploration of ideas and techniques in programming language theory. The book carefully, but without wasting time on extraneous complications, explains operational and denotational semantic techniques and their application to many aspects of programming language design. It will be of great value for graduate courses and for independent study.” Gary T. Leavens School of Electrical Engineering and Computer Science, University of Central Florida
ISBN 978-0-262-20175-9
The MIT Press, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, http://mitpress.mit.edu
Cover photograph: David Gifford
Franklyn Turbak is Associate Professor in the Computer Science Department at Wellesley College. David Gifford is Professor of Computer Science and Engineering at MIT. Mark A. Sheldon is Visiting Assistant Professor in the Computer Science Department at Wellesley College.
Design Concepts in Programming Languages
Hundreds of programming languages are in use today—scripting languages for Internet commerce, user interface programming tools, spreadsheet macros, page format specification languages, and many others. Designing a programming language is a metaprogramming activity that bears certain similarities to programming in a regular language, with clarity and simplicity even more important than in ordinary programming. This comprehensive text uses a simple and concise framework to teach key ideas in programming language design and implementation. The book’s unique approach is based on a family of syntactically simple pedagogical languages that allow students to explore programming language concepts systematically. It takes as its premise and starting point the idea that when language behaviors become incredibly complex, the description of the behaviors must be incredibly simple. The book presents a set of tools (a mathematical metalanguage, abstract syntax, operational and denotational semantics) and uses it to explore a comprehensive set of programming language design dimensions, including dynamic semantics (naming, state, control, data), static semantics (types, type reconstruction, polymorphism, effects), and pragmatics (compilation, garbage collection). The many examples and exercises offer students opportunities to apply the foundational ideas explained in the text. Specialized topics and code that implements many of the algorithms and compilation methods in the book can be found on the book’s Web site, along with such additional material as a section on concurrency and proofs of the theorems in the text. The book is suitable as a text for an introductory graduate or advanced undergraduate programming languages course; it can also serve as a reference for researchers and practitioners.
Design Concepts in Programming Languages
Franklyn Turbak and David Gifford with Mark A. Sheldon
The MIT Press Cambridge, Massachusetts London, England
© 2008 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special [email protected] or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.

This book was set in LaTeX by the authors, and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Turbak, Franklyn A.
Design concepts in programming languages / Franklyn A. Turbak and David K. Gifford, with Mark A. Sheldon.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-262-20175-9 (hardcover : alk. paper)
1. Programming languages (Electronic computers). I. Gifford, David K., 1954–. II. Sheldon, Mark A. III. Title.
QA76.7.T845 2008
005.1—dc22
2008013841
Brief Contents

Preface xix
Acknowledgments xxi

I Foundations 1
1 Introduction 3
2 Syntax 19
3 Operational Semantics 45
4 Denotational Semantics 113
5 Fixed Points 163

II Dynamic Semantics 205
6 FL: A Functional Language 207
7 Naming 307
8 State 383
9 Control 443
10 Data 539

III Static Semantics 615
11 Simple Types 617
12 Polymorphism and Higher-order Types 701
13 Type Reconstruction 769
14 Abstract Types 839
15 Modules 889
16 Effects Describe Program Behavior 943

IV Pragmatics 1003
17 Compilation 1005
18 Garbage Collection 1119

A A Metalanguage 1147
B Our Pedagogical Languages 1197
References 1199
Index 1227
Contents

Preface xix
Acknowledgments xxi

I Foundations 1

1 Introduction 3
1.1 Programming Languages 3
1.2 Syntax, Semantics, and Pragmatics 4
1.3 Goals 6
1.4 PostFix: A Simple Stack Language 8
1.4.1 Syntax 8
1.4.2 Semantics 9
1.4.3 The Pitfalls of Informal Descriptions 14
1.5 Overview of the Book 15

2 Syntax 19
2.1 Abstract Syntax 20
2.2 Concrete Syntax 22
2.3 S-Expression Grammars Specify ASTs 23
2.3.1 S-Expressions 23
2.3.2 The Structure of S-Expression Grammars 24
2.3.3 Phrase Tags 30
2.3.4 Sequence Patterns 30
2.3.5 Notational Conventions 32
2.3.6 Mathematical Foundation of Syntactic Domains 36
2.4 The Syntax of PostFix 39

3 Operational Semantics 45
3.1 The Operational Semantics Game 45
3.2 Small-step Operational Semantics (SOS) 49
3.2.1 Formal Framework 49
3.2.2 Example: An SOS for PostFix 52
3.2.3 Rewrite Rules 54
3.2.4 Operational Execution 58
3.2.5 Progress Rules 62
3.2.6 Context-based Semantics 71
3.3 Big-step Operational Semantics 75
3.4 Operational Reasoning 79
3.5 Deterministic Behavior of EL 80
3.6 Termination of PostFix Programs 84
3.6.1 Energy 84
3.6.2 The Proof of Termination 86
3.6.3 Structural Induction 88
3.7 Safe PostFix Transformations 89
3.7.1 Observational Equivalence 89
3.7.2 Transform Equivalence 92
3.7.3 Transform Equivalence Implies Observational Equivalence 96
3.8 Extending PostFix 100

4 Denotational Semantics 113
4.1 The Denotational Semantics Game 113
4.2 A Denotational Semantics for EL 117
4.2.1 Step 1: Restricted ELMM 117
4.2.2 Step 2: Full ELMM 120
4.2.3 Step 3: ELM 124
4.2.4 Step 4: EL 127
4.2.5 A Denotational Semantics Is Not a Program 128
4.3 A Denotational Semantics for PostFix 131
4.3.1 A Semantic Algebra for PostFix 131
4.3.2 A Meaning Function for PostFix 134
4.3.3 Semantic Functions for PostFix: the Details 142
4.4 Denotational Reasoning 145
4.4.1 Program Equality 145
4.4.2 Safe Transformations: A Denotational Approach 147
4.4.3 Technical Difficulties 150
4.5 Relating Operational and Denotational Semantics 150
4.5.1 Soundness 151
4.5.2 Adequacy 157
4.5.3 Full Abstraction 159
4.5.4 Operational versus Denotational: A Comparison 161

5 Fixed Points 163
5.1 The Fixed Point Game 163
5.1.1 Recursive Definitions 163
5.1.2 Fixed Points 166
5.1.3 The Iterative Fixed Point Technique 168
5.2 Fixed Point Machinery 174
5.2.1 Partial Orders 174
5.2.2 Complete Partial Orders (CPOs) 182
5.2.3 Pointedness 185
5.2.4 Monotonicity and Continuity 187
5.2.5 The Least Fixed Point Theorem 190
5.2.6 Fixed Point Examples 191
5.2.7 Continuity and Strictness 197
5.3 Reflexive Domains 201
5.4 Summary 203

II Dynamic Semantics 205

6 FL: A Functional Language 207
6.1 Decomposing Language Descriptions 207
6.2 The Structure of FL 208
6.2.1 FLK: The Kernel of the FL Language 209
6.2.2 FL Syntactic Sugar 218
6.2.3 The FL Standard Library 235
6.2.4 Examples 239
6.3 Variables and Substitution 244
6.3.1 Terminology 244
6.3.2 Abstract Syntax DAGs and Stoy Diagrams 248
6.3.3 Alpha-Equivalence 250
6.3.4 Renaming and Variable Capture 251
6.3.5 Substitution 253
6.4 An Operational Semantics for FLK 258
6.4.1 FLK Evaluation 258
6.4.2 FLK Simplification 270
6.5 A Denotational Semantics for FLK 275
6.5.1 Semantic Algebra 275
6.5.2 Valuation Functions 280
6.6 The Lambda Calculus 290
6.6.1 Syntax of the Lambda Calculus 291
6.6.2 Operational Semantics of the Lambda Calculus 291
6.6.3 Denotational Semantics of the Lambda Calculus 296
6.6.4 Representational Games 297

7 Naming 307
7.1 Parameter Passing 309
7.1.1 Call-by-Name vs. Call-by-Value: The Operational View 310
7.1.2 Call-by-Name vs. Call-by-Value: The Denotational View 316
7.1.3 Nonstrict versus Strict Pairs 318
7.1.4 Handling rec in a CBV Language 320
7.1.5 Thunking 324
7.1.6 Call-by-Denotation 328
7.2 Name Control 332
7.2.1 Hierarchical Scoping: Static and Dynamic 334
7.2.2 Multiple Namespaces 347
7.2.3 Nonhierarchical Scope 352
7.3 Object-oriented Programming 362
7.3.1 HOOK: An Object-oriented Kernel 362
7.3.2 HOOPLA 368
7.3.3 Semantics of HOOK 370

8 State 383
8.1 FL Is a Stateless Language 384
8.2 Simulating State in FL 390
8.2.1 Iteration 390
8.2.2 Single-Threaded Data Flow 392
8.2.3 Monadic Style 394
8.2.4 Imperative Programming 397
8.3 Mutable Data: FLIC 397
8.3.1 Mutable Cells 397
8.3.2 Examples of Imperative Programming 400
8.3.3 An Operational Semantics for FLICK 405
8.3.4 A Denotational Semantics for FLICK 411
8.3.5 Call-by-Name versus Call-by-Value Revisited 425
8.3.6 Referential Transparency, Interference, and Purity 427
8.4 Mutable Variables: FLAVAR 429
8.4.1 Mutable Variables 429
8.4.2 FLAVAR 430
8.4.3 Parameter-passing Mechanisms for FLAVAR 432

9 Control 443
9.1 Motivation: Control Contexts and Continuations 443
9.2 Using Procedures to Model Control 446
9.2.1 Representing Continuations as Procedures 446
9.2.2 Continuation-Passing Style (CPS) 449
9.2.3 Multiple-value Returns 450
9.2.4 Nonlocal Exits 455
9.2.5 Coroutines 457
9.2.6 Error Handling 461
9.2.7 Backtracking 465
9.3 Continuation-based Semantics of FLICK 471
9.3.1 A Standard Semantics of FLICK 472
9.3.2 A Computation-based Continuation Semantics of FLICK 482
9.4 Nonlocal Exits 493
9.4.1 label and jump 494
9.4.2 A Denotational Semantics for label and jump 497
9.4.3 An Operational Semantics for label and jump 503
9.4.4 call-with-current-continuation (cwcc) 505
9.5 Iterators: A Simple Coroutining Mechanism 506
9.6 Exception Handling 513
9.6.1 raise, handle, and trap 515
9.6.2 A Standard Semantics for Exceptions 519
9.6.3 A Computation-based Semantics for Exceptions 524
9.6.4 A Desugaring-based Implementation of Exceptions 527
9.6.5 Examples Revisited 530

10 Data 539
10.1 Products 539
10.1.1 Positional Products 541
10.1.2 Named Products 549
10.1.3 Nonstrict Products 551
10.1.4 Mutable Products 561
10.2 Sums 567
10.3 Sum of Products 577
10.4 Data Declarations 583
10.5 Pattern Matching 590
10.5.1 Introduction to Pattern Matching 590
10.5.2 A Desugaring-based Semantics of match 594
10.5.3 Views 605

III Static Semantics 615

11 Simple Types 617
11.1 Static Semantics 617
11.2 What Is a Type? 620
11.3 Dimensions of Types 622
11.3.1 Dynamic versus Static Types 623
11.3.2 Explicit versus Implicit Types 625
11.3.3 Simple versus Expressive Types 627
11.4 μFLEX: A Language with Explicit Types 628
11.4.1 Types 629
11.4.2 Expressions 631
11.4.3 Programs and Syntactic Sugar 634
11.4.4 Free Identifiers and Substitution 636
11.5 Type Checking in μFLEX 640
11.5.1 Introduction to Type Checking 640
11.5.2 Type Environments 643
11.5.3 Type Rules for μFLEX 645
11.5.4 Type Derivations 648
11.5.5 Monomorphism 655
11.6 Type Soundness 661
11.6.1 What Is Type Soundness? 661
11.6.2 An Operational Semantics for μFLEX 662
11.6.3 Type Soundness of μFLEX 667
11.7 Types and Strong Normalization 673
11.8 Full FLEX: Typed Data and Recursive Types 675
11.8.1 Typed Products 675
11.8.2 Type Equivalence 679
11.8.3 Typed Mutable Data 681
11.8.4 Typed Sums 682
11.8.5 Typed Lists 685
11.8.6 Recursive Types 688
11.8.7 Full FLEX Summary 696

12 Polymorphism and Higher-order Types 701
12.1 Subtyping 701
12.1.1 FLEX/S: FLEX with Subtyping 702
12.1.2 Dimensions of Subtyping 713
12.1.3 Subtyping and Inheritance 723
12.2 Polymorphic Types 725
12.2.1 Monomorphic Types Are Not Expressive 725
12.2.2 Universal Polymorphism: FLEX/SP 727
12.2.3 Deconstructible Data Types 738
12.2.4 Bounded Quantification 745
12.2.5 Ad Hoc Polymorphism 748
12.3 Higher-order Types: Descriptions and Kinds 750
12.3.1 Descriptions: FLEX/SPD 750
12.3.2 Kinds and Kind Checking: FLEX/SPDK 758
12.3.3 Discussion 764

13 Type Reconstruction 769
13.1 Introduction 769
13.2 μFLARE: A Language with Implicit Types 772
13.2.1 μFLARE Syntax and Type Erasure 772
13.2.2 Static Semantics of μFLARE 774
13.2.3 Dynamic Semantics and Type Soundness of μFLARE 778
13.3 Type Reconstruction for μFLARE 781
13.3.1 Type Substitutions 781
13.3.2 Unification 783
13.3.3 The Type-Constraint-Set Abstraction 787
13.3.4 A Reconstruction Algorithm for μFLARE 790
13.4 Let Polymorphism 801
13.4.1 Motivation 801
13.4.2 A μFLARE Type System with Let Polymorphism 803
13.4.3 μFLARE Type Reconstruction with Let Polymorphism 808
13.5 Extensions 813
13.5.1 The Full FLARE Language 813
13.5.2 Mutable Variables 820
13.5.3 Products and Sums 821
13.5.4 Sum-of-products Data Types 826

14 Abstract Types 839
14.1 Data Abstraction 839
14.1.1 A Point Abstraction 840
14.1.2 Procedural Abstraction Is Not Enough 841
14.2 Dynamic Locks and Keys 843
14.3 Existential Types 847
14.4 Nonce Types 859
14.5 Dependent Types 869
14.5.1 A Dependent Package System 870
14.5.2 Design Issues with Dependent Types 877

15 Modules 889
15.1 An Overview of Modules and Linking 889
15.2 An Introduction to FLEX/M 891
15.3 Module Examples: Environments and Tables 901
15.4 Static Semantics of FLEX/M Modules 910
15.4.1 Scoping 910
15.4.2 Type Equivalence 911
15.4.3 Subtyping 912
15.4.4 Type Rules 912
15.4.5 Implicit Projection 918
15.4.6 Typed Pattern Matching 921
15.5 Dynamic Semantics of FLEX/M Modules 923
15.6 Loading Modules 925
15.6.1 Type Soundness of load via a Load-Time Check 927
15.6.2 Type Soundness of load via a Compile-Time Check 928
15.6.3 Referential Transparency of load for File-Value Coherence 930
15.7 Discussion 932
15.7.1 Scoping Limitations 932
15.7.2 Lack of Transparent and Translucent Types 933
15.7.3 The Coherence Problem 934
15.7.4 Purity Issues 937

16 Effects Describe Program Behavior 943
16.1 Types, Effects, and Regions: What, How, and Where 943
16.2 A Language with a Simple Effect System 945
16.2.1 Types, Effects, and Regions 945
16.2.2 Type and Effect Rules 951
16.2.3 Reconstructing Types and Effects: Algorithm Z 959
16.2.4 Effect Masking Hides Unobservable Effects 972
16.2.5 Effect-based Purity for Generalization 974
16.3 Using Effects to Analyze Program Behavior 978
16.3.1 Control Transfers 978
16.3.2 Dynamic Variables 983
16.3.3 Exceptions 985
16.3.4 Execution Cost Analysis 988
16.3.5 Storage Deallocation and Lifetime Analysis 991
16.3.6 Control Flow Analysis 995
16.3.7 Concurrent Behavior 996
16.3.8 Mobile Code Security 999

IV Pragmatics 1003

17 Compilation 1005
17.1 Why Do We Study Compilation? 1005
17.2 Tortoise Architecture 1007
17.2.1 Overview of Tortoise 1007
17.2.2 The Compiler Source Language: FLARE/V 1009
17.2.3 Purely Structural Transformations 1012
17.3 Transformation 1: Desugaring 1013
17.4 Transformation 2: Globalization 1014
17.5 Transformation 3: Assignment Conversion 1019
17.6 Transformation 4: Type/Effect Reconstruction 1025
17.6.1 Propagating Type and Effect Information 1026
17.6.2 Effect-based Code Optimization 1026
17.7 Transformation 5: Translation 1030
17.7.1 The Compiler Intermediate Language: FIL 1030
17.7.2 Translating FLARE to FIL 1036
17.8 Transformation 6: Renaming 1038
17.9 Transformation 7: CPS Conversion 1042
17.9.1 The Structure of Tortoise CPS Code 1044
17.9.2 A Simple CPS Transformation 1049
17.9.3 A More Efficient CPS Transformation 1058
17.9.4 CPS-Converting Control Constructs 1070
17.10 Transformation 8: Closure Conversion 1075
17.10.1 Flat Closures 1076
17.10.2 Variations on Flat Closure Conversion 1085
17.10.3 Linked Environments 1090
17.11 Transformation 9: Lifting 1094
17.12 Transformation 10: Register Allocation 1098
17.12.1 The FILreg Language 1098
17.12.2 A Register Allocation Algorithm 1102
17.12.3 The Expansion Phase 1104
17.12.4 The Register Conversion Phase 1104
17.12.5 The Spilling Phase 1112

18 Garbage Collection 1119
18.1 Why Garbage Collection? 1119
18.2 FRM: The FIL Register Machine 1122
18.2.1 The FRM Architecture 1122
18.2.2 FRM Descriptors 1123
18.2.3 FRM Blocks 1127
18.3 A Block Is Dead if It Is Unreachable 1130
18.3.1 Reference Counting 1131
18.3.2 Memory Tracing 1132
18.4 Stop-and-copy GC 1133
18.5 Garbage Collection Variants 1141
18.5.1 Mark-sweep GC 1141
18.5.2 Tag-free GC 1141
18.5.3 Conservative GC 1142
18.5.4 Other Variations 1142
18.6 Static Approaches to Automatic Deallocation 1144

A A Metalanguage 1147
A.1 The Basics 1147
A.1.1 Sets 1148
A.1.2 Boolean Operators and Predicates 1151
A.1.3 Tuples 1152
A.1.4 Relations 1153
A.2 Functions 1155
A.2.1 What Is a Function? 1156
A.2.2 Application 1158
A.2.3 More Function Terminology 1159
A.2.4 Higher-order Functions 1160
A.2.5 Multiple Arguments and Results 1161
A.2.6 Lambda Notation 1165
A.2.7 Recursion 1168
A.2.8 Lambda Notation Is Not Lisp! 1169
A.3 Domains 1171
A.3.1 Motivation 1171
A.3.2 Types 1172
A.3.3 Product Domains 1173
A.3.4 Sum Domains 1176
A.3.5 Sequence Domains 1181
A.3.6 Function Domains 1184
A.4 Metalanguage Summary 1186
A.4.1 The Metalanguage Kernel 1186
A.4.2 The Metalanguage Sugar 1188

B Our Pedagogical Languages 1197
References 1199
Index 1227
Preface

This book is the text for 6.821 Programming Languages, an entry-level, single-semester, graduate-level course at the Massachusetts Institute of Technology. The students who take our course know how to program and are mathematically inclined, but they typically have not had an introduction to programming language design or its mathematical foundations. We assume a reader with similar preparation, and we include an appendix that completely explains the mathematical metalanguage we use. Many of the exercises are taken directly from our problem sets and examination questions, and have been specifically designed to cause students to apply their newfound knowledge to practical (and sometimes impractical!) extensions to the foundational ideas taught in the course.

Our fundamental goal for Programming Languages is to use a simple and concise framework to teach key ideas in programming language design and implementation. We specifically eschewed an approach based on a tour of the great programming languages. Instead, we have adopted a family of syntactically simple pedagogical languages that systematically explore programming language concepts (see Appendix B). Contemporary concerns about safety and security have caused programmers to migrate to languages that embrace many of the key ideas that we explain. Where appropriate, we discuss how the ideas we introduce have been incorporated into contemporary programming languages that are in wide use.

We use an s-expression syntax for programs because this syntactic form is easy to parse and to directly manipulate, key attributes that support our desire to make everything explicit in our descriptions of language semantics and pragmatics. While you may find s-expression syntax unfamiliar at first, it permits the unambiguous and complete articulation of ideas in a simple framework.

Programming languages are a plastic and expressive medium, and we are hopeful that we will communicate our passion for these computational canvases that are an important underpinning for computer science.
Web Supplement

Specialized topics and code that implements many of the algorithms and compilation methods can be found on our accompanying Web site: dcpl.mit.edu
The Web Supplement also includes additional material, such as a section on concurrency and proofs of the theorems stated in the book.
To the Student

The book is full of examples, and a good way to approach the material is to study the examples first. Then review the figures that capture key rules or algorithms. Skip over details that bog you down at first, and return to them later once you have additional context. Using and implementing novel programming language concepts will further enhance your understanding. The Web Supplement contains interpreters for various pedagogical languages used in the book, and there are many implementation-based exercises that will help forge connections between theory and practice.
To the Teacher

We teach the highlights of the material in this book in 24 lectures over a 14-week period. Each lecture is 1.5 hours long, and students also attend a one-hour recitation every week. With this amount of contact time it is not possible to cover all of the detail in the book. The Web Supplement contains an example lecture schedule, reading assignments, and problem sets. In addition, the MIT OpenCourseWare site at ocw.mit.edu contains material from previous versions of 6.821.

This book can be used to teach many different kinds of courses, including an introduction to semantics (Chapters 1–5), essential concepts of programming languages (Chapters 1–13), and types and effects (Chapters 6 and 11–16). We hope you enjoy teaching this material as much as we have!
Acknowledgments

This book owes its existence to many people. We are grateful to the following individuals for their contributions:

• Jonathan Rees profoundly influenced the content of this book when he was a teaching assistant. Many of the mini-languages, examples, exercises, and software implementations, as well as some of the sections of text, had their origins with Jonathan. Jonathan was also the author of an early data type and pattern matching facility used in course software that strongly influenced the facilities described in the book.

• Brian Reistad and Trevor Jim greatly improved the quality of the book. As teaching assistants, they unearthed and fixed innumerable bugs, improved the presentation and content of the material, and created many new exercises. Brian also played a major role in implementing software for testing the mini-languages in the book.

• In addition to his contributions as a teaching assistant, Alex Salcianu also collected and edited homework and exam problems from fifteen years of the course for inclusion in the book.

• Valuable contributions and improvements to this book were made by other teaching assistants: Aaron Adler, Alexandra Andersson, Arnab Bhattacharyya, Michael (Ziggy) Blair, Barbara Cutler, Timothy Danford, Joshua Glazer, Robert Grimm, Alex Hartemink, David Huynh, Adam Kiezun, Eddie Kohler, Gary Leavens, Ravi Nanavati, Jim O’Toole, Dennis Quan, Alex Snoeren, Patrick Sobalvarro, Peter Szilagyi, Bienvenido Velez-Rivera, Earl Waldin, and Qian Wang.

• In Fall 2002 and Fall 2004, Michael Ernst taught 6.821 based on an earlier version of this book, and his detailed comments resulted in many improvements.

• Based on teaching 6.821 at MIT and using the course materials at Hong Kong University and Georgia Tech, Olin Shivers made many excellent suggestions on how to improve the content and presentation of the material.

• While using the course materials at other universities, Gary Leavens, Andrew Myers, Randy Osborne, and Kathy Yelick provided helpful feedback.
• Early versions of the pragmatics system were written by Doug Grundman, with major extensions by Raymie Stata and Brian Reistad.

• Pierre Jouvelot did the lion’s share of the implementation of FX (a language upon which early versions of 6.821 were based) with help from Mark Sheldon and Jim O’Toole.

• David Espinosa introduced us to embedded interpreters and helped us to improve our presentation of dynamic semantics, effects, and compilation.

• Guillermo Rozas taught us many nifty pragmatics tricks. Our pragmatics coverage is heavily influenced by his source-to-source front end to the MIT Scheme compiler.

• Ken Moody provided helpful feedback on the course material, especially on the PostFix Equivalence Theorem.

• Numerous students have improved this book in various ways, from correcting bugs to suggesting major reorganizations. In this regard, we are especially grateful to: Atul Adya, Kavita Bala, Ron Bodkin, Philip Bogle, Miguel Castro, Anna Chefter, Natalya Cohen, Brooke Cowan, Richard Davis, Andre deHon, Michael Frank, Robert Grimm, Yevgeny Gurevich, Viktor Kuncak, Mark Lillibridge, Greg Little, Andrew Myers, Michael Noakes, Heidi Pan, John Pezaris, Matt Power, Roberto Segala, Emily Shen, Mark Torrance, Michael Walfish, Amy Williams, and Carl Witty.

• Tim Chevalier and Jue Wang uncovered numerous typos and inconsistencies in their careful proofreading of book drafts.

• Special thanks to Jeanne Darling, who has been the 6.821 course administrator for over ten years. Her administrative, editing, and technical skills, as well as her can-do spirit and cheerful demeanor, were critical in keeping both the course and the book project afloat.

• We bow before David Jones, whose TeX wizardry is so magical we are sure he has a wand hidden in his sleeve.

• Kudos go to Julie Sussman, PPA, for her excellent work as a technical editor on the book. Julie’s amazing ability to find and fix uncountably many technical bugs, inconsistencies, ambiguities, and poor explanations in every chapter we thought was “done” has improved the quality of the book tremendously. Of course, Julie cannot be held responsible for remaining erorrs, especially them what we introducd after she fixished the editos.
• We are grateful to the MIT Press for their patience with us over the years we worked on this book.

We also have some personal dedications and acknowledgments:

Franklyn: I dedicate this book to my parents, Dr. Albin F. Turbak and Irene J. Turbak, who taught me (1) how to think and (2) never to give up, traits without which this book would not exist. I owe my love of programming languages to Hal Abelson and Jerry Sussman, whose Structure and Interpretation of Computer Programs book and class changed the course of my life, and to Dave Gifford, whose 6.821 class inspired an odyssey of programming language exploration that is still ongoing. My understanding of programming languages matured greatly through my interactions with members of the Church Project, especially Assaf Kfoury, Torben Amtoft, Anindya Banerjee, Alan Bawden, Chiyan Chen, Allyn Dimock, Glenn Holloway, Trevor Jim, Elena Machkasova, Harry Mairson, Bob Muller, Peter Møller Neergaard, Santiago Pericas, Joe Wells, Ian Westmacott, Hongwei Xi, and Dengping Zhu. I am grateful to Wellesley College for providing me with a sabbatical during the 2005–06 academic year, which I devoted largely to work on this book. Finally, I thank my wife, Lisa, and daughters, Ohana and Kalani, who have never known my life without “the book” but have been waiting oh-so-long to find out what it will be like. Their love keeps me going!

Dave: Heidi, Ariella, and Talia — thanks for your support and love; this book is dedicated to you. To my parents, for providing me with opportunities that enabled my successes. Thanks Franklyn, for your labors on this book, and the chance to share your passion for programming languages. Thanks Julie. You are a beacon of quality. Thanks Mark, for all your help on this project. And finally, thanks to all of the 6.821 students. Your enthusiasm, intelligence, and questions provided the wonderful context that motivated this book and made it fun.
xxiv
Acknowledgments
Mark: I am grateful to my coauthors for bringing me into this project. The task was initially to be a few weeks of technical editing but blossomed into a rewarding and educational five-year coauthoring journey. I thank my colleagues and students at Wellesley. My students were patient beyond all reason when told their work hadn’t been graded because I was working on “the book.” I am fortunate to have the love and support of my family: my wife, Ishrat Chaudhuri, my daughters, Raina and Maya, and my parents, Beverly Sheldon and Frank Sheldon. I would also like to thank my dance partner, Mercedes von Deck, my coaches (especially Stephen and Jennifer Hillier and Charlotte Jorgensen), and my dance students.
Part I
Foundations
1 Introduction

Order and simplification are the first steps toward the mastery of a subject — the actual enemy is the unknown.
— Thomas Mann, The Magic Mountain
1.1 Programming Languages
Programming is a lot of fun. As you have no doubt experienced, clarity and simplicity are the keys to good programming. When you have a tangle of code that is difficult to understand, your confidence in its behavior wavers, and the code is no longer any fun to read or update. Designing a new programming language is a kind of metalevel programming activity that is just as much fun as programming in a regular language (if not more so). You will discover that clarity and simplicity are even more important in language design than they are in ordinary programming. Today hundreds of programming languages are in use — whether they be scripting languages for Internet commerce, user interface programming tools, spreadsheet macros, or page format specification languages that when executed can produce formatted documents. Inspired application design often requires a programmer to provide a new programming language or to extend an existing one. This is because flexible and extensible applications need to provide some sort of programming capability to their end users. Elements of programming language design are even found in “ordinary” programming. For instance, consider designing the interface to a collection data structure. What is a good way to encapsulate an iteration idiom over the elements of such a collection? The issues faced in this problem are similar to those in adding a looping construct to a programming language. The goal of this book is to teach you the great ideas in programming languages in a simple framework that strips them of complexity. You will learn several ways to specify the meaning of programming language constructs and will see that small changes in these specifications can have dramatic consequences for program behavior. You will explore many dimensions of the programming
language design space, study decisions to be made along each dimension, and consider how decisions from different dimensions can interact. We will teach you about a wide variety of neat tricks for extending programming languages with interesting features like undoable state changes, exitable loops, and pattern matching. Our approach for teaching you this material is based on the premise that when language behaviors become incredibly complex, the descriptions of the behaviors must be incredibly simple. It is the only hope.
1.2 Syntax, Semantics, and Pragmatics
Programming languages are traditionally viewed in terms of three facets:

1. Syntax — the form of programming languages.
2. Semantics — the meaning of programming languages.
3. Pragmatics — the implementation of programming languages.

Here we briefly describe these facets.
Syntax

Syntax focuses on the concrete notations used to encode programming language phrases. Consider a phrase that indicates the sum of the product of v and w and the quotient of y and z. Such a phrase can be written in many different notations — as a traditional mathematical expression:

v·w + y/z
or as a Lisp parenthesized prefix expression: (+ (* v w) (/ y z))
or as a sequence of keystrokes on a postfix calculator:

v  enter  w  enter  ×  y  enter  z  enter  ÷  +

or as a layout of cells and formulas in a spreadsheet:

       A          B          C          D
  1    v =        w =        y =        z =
  2
  3    v*w =      y/z =      ans =
  4    A2 * B2    C2 / D2    A4 + B4
or as a graphical tree:

+
├── *
│   ├── v
│   └── w
└── /
    ├── y
    └── z
Although these concrete notations are superficially different, they all designate the same abstract phrase structure (the sum of a product and a quotient). The syntax of a programming language specifies which concrete notations (strings of characters, lines on a page) in the language are legal and which tree-shaped abstract phrase structure is denoted by each legal notation.
Semantics

Semantics specifies the mapping between the structure of a programming language phrase and what the phrase means. Such phrases have no inherent meaning: their meaning is determined only in the context of a system for interpreting their structure. For example, consider the following expression tree:

*
├── +
│   ├── 1
│   └── 11
└── 10
Suppose we interpret the nodes labeled 1, 10, and 11 as the usual decimal notation for numbers, and the nodes labeled + and * as the sum and product of the values of their subnodes. Then the root of the tree stands for (1 + 11) · 10 = 120. But there are many other possible meanings for this tree. If * stands for exponentiation rather than multiplication, the meaning of the tree could be 12¹⁰. If the numerals are in binary notation rather than decimal notation, the tree could stand for (in decimal notation) (1 + 3) · 2 = 8. Alternatively, suppose that odd integers stand for the truth value true, even integers stand for the truth value false, and + and * stand for, respectively, the logical disjunction (∨) and conjunction (∧) operators on truth values; then the meaning of the tree is false. Perhaps the tree does not indicate an evaluation at all, and only stands for a property intrinsic to the tree, such as its height (3), its number of nodes (5), or its shape (perhaps it describes a simple corporate hierarchy). Or maybe the tree is an arbitrary encoding for a particular object of interest, such as a person or a book.
This example illustrates how a single program phrase can have many possible meanings. Semantics describes the relationship between the abstract structure of a phrase and its meaning.
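The point that meaning comes only from the interpreting system can be made concrete with a small sketch in Python. This is our own illustration (the names tree, eval_arith, and eval_bool are ours, not the book's): the same tree structure is evaluated under two of the interpretations described above.

    tree = ('*', ('+', 1, 11), 10)     # the expression tree shown above

    def eval_arith(t):
        # decimal numerals; + and * as ordinary sum and product
        if isinstance(t, int):
            return t
        op, l, r = t
        return eval_arith(l) + eval_arith(r) if op == '+' else eval_arith(l) * eval_arith(r)

    def eval_bool(t):
        # odd = true, even = false; + as disjunction, * as conjunction
        if isinstance(t, int):
            return t % 2 == 1
        op, l, r = t
        return eval_bool(l) or eval_bool(r) if op == '+' else eval_bool(l) and eval_bool(r)

    # eval_arith(tree) is 120, while eval_bool(tree) is False:
    # one structure, two meanings.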
Pragmatics

Whereas semantics deals with what a phrase means, pragmatics focuses on the details of how that meaning is computed. Of particular interest is the effective use of various resources, such as time, space, and access to shared physical devices (storage devices, network connections, video monitors, printers, speakers, etc.). As a simple example of pragmatics, consider the evaluation of the following expression tree (under the first semantic interpretation described above):

/
├── +
│   ├── a
│   └── b
└── -
    ├── +
    │   ├── a
    │   └── b
    └── *
        ├── 2
        └── 3
Suppose that a and b stand for particular numeric values. Because the phrase (+ a b) appears twice, a naive evaluation strategy will compute the same sum twice. An alternative strategy is to compute the sum once, save the result, and use the saved result the next time the phrase is encountered. The alternative strategy does not change the meaning of the program, but does change its use of resources; it reduces the number of additions performed, but may require extra storage for the saved result. Is the alternative strategy better? The answer depends on the details of the evaluation model and the relative importance of time and space. Another potential improvement in the example involves the phrase (* 2 3), which always stands for the number 6. If the sample expression is to be evaluated many times (for different values of a and b), it may be worthwhile to replace (* 2 3) by 6 to avoid unnecessary multiplications. Again, this is a purely pragmatic concern that does not change the meaning of the expression.
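Here is a sketch, in Python, of the alternative strategy just described; it is our own illustration (the name evaluate and the tuple encoding are ours). It caches each compound subphrase, so the repeated (+ a b) is computed once and then reused.

    def evaluate(t, env, cache):
        if isinstance(t, int):
            return t
        if isinstance(t, str):
            return env[t]              # a variable such as a or b
        if t not in cache:             # first encounter: compute and save
            op, l, r = t
            f = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
                 '*': lambda a, b: a * b, '/': lambda a, b: a // b}[op]
            cache[t] = f(evaluate(l, env, cache), evaluate(r, env, cache))
        return cache[t]                # later encounters reuse the saved result

    sum_ab = ('+', 'a', 'b')
    expr = ('/', sum_ab, ('-', sum_ab, ('*', 2, 3)))
    # evaluate(expr, {'a': 10, 'b': 2}, {}) performs the addition once,
    # trading an addition for the storage that holds the saved result.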
1.3 Goals
The goals of this book are to explore the semantics of a comprehensive set of programming language design idioms, show how they can be combined into complete
practical programming languages, and discuss the interplay between semantics and pragmatics. Because syntactic issues are so well covered in standard compiler texts, we won't say much about syntax except for establishing a few syntactic conventions at the outset. We will introduce a number of tools for describing the semantics of programming languages, and will use these tools to build intuitions about programming language features and study many of the dimensions along which languages can vary. Our coverage of pragmatics is mainly at a high level. We will study some simple programming language implementation techniques and program improvement strategies rather than focus on squeezing the last ounce of performance out of a particular computer architecture.

We will discuss programming language features in the context of several mini-languages. Each of these is a simple programming language that captures the essential features of a class of existing programming languages. In many cases, the mini-languages are so pared down that they are hardly suitable for serious programming activities. Nevertheless, these languages embody all of the key ideas in programming languages. Their simplicity saves us from getting bogged down in needless complexity in our explorations of semantics and pragmatics. And like good modular building blocks, the components of the mini-languages are designed to be “snapped together” to create practical languages.

Issues of semantics and pragmatics are important for reasoning about properties of programming languages and about particular programs in these languages. We will also discuss them in the context of two fundamental strategies for programming language implementation: interpretation and translation. In the interpretation approach, a program written in a source language S is directly executed by an S-interpreter, which is a program written in an implementation language. In the translation approach, an S program is translated to a program in the target language T, which can be executed by a T-interpreter. The translation itself is performed by a translator program written in an implementation language. A translator is also called a compiler, especially when it translates from a high-level language to a low-level one. We will use mini-languages for our source and target languages. For our implementation language, we will use the mathematical metalanguage described in Appendix A. However, we strongly encourage readers to build working interpreters and translators for the mini-languages in their favorite real-world programming languages. Metaprogramming — writing programs that manipulate other programs — is perhaps the most exciting form of programming!
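A toy contrast between the two strategies, sketched in Python, may help fix the terms; this is our own illustration (the source language of single binary operations, and the names interpret and translate, are ours).

    def interpret(phrase, env):            # execute the S program directly
        op, x, y = phrase                  # e.g. ('+', 'a', 'b')
        return env[x] + env[y] if op == '+' else env[x] * env[y]

    def translate(phrase):                 # translate to postfix target code T
        op, x, y = phrase
        return [x, y, 'add' if op == '+' else 'mul']

    # interpret(('+', 'a', 'b'), {'a': 1, 'b': 2}) returns 3 immediately, while
    # translate(('+', 'a', 'b')) returns ['a', 'b', 'add'] for a T-interpreter
    # to execute later.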
1.4 PostFix: A Simple Stack Language
We will introduce the tools for syntax, semantics, and pragmatics in the context of a mini-language called PostFix. PostFix is a simple stack-based language inspired by the PostScript graphics language, the Forth programming language, and Hewlett-Packard calculators. Here we give an informal introduction to PostFix in order to build some intuitions about the language. In subsequent chapters, we will introduce tools that allow us to study PostFix in more depth.
1.4.1 Syntax
The basic syntactic unit of a PostFix program is the command. Commands are of the following form:

• Any integer numeral. E.g., 17, 0, -3.

• One of the following special command tokens: add, div, eq, exec, gt, lt, mul, nget, pop, rem, sel, sub, swap.

• An executable sequence — a single command that serves as a subroutine. It is written as a parenthesized list of subcommands separated by whitespace (any contiguous sequence of characters that leave no mark on the page, such as spaces, tabs, and newlines). E.g., (7 add 3 swap) or (2 (5 mul) exec add). Since executable sequences contain other commands (including other executable sequences), they can be arbitrarily nested. An executable sequence counts as a single command despite its hierarchical structure.

A PostFix program is a parenthesized sequence consisting of (1) the token postfix followed by (2) a natural number (i.e., nonnegative integer) indicating the number of program parameters followed by (3) zero or more PostFix commands. Here are some sample PostFix programs:

(postfix 0 4 7 sub)
(postfix 2 add 2 div)
(postfix 4 4 nget 5 nget mul mul swap 4 nget mul add add)
(postfix 1 ((3 nget swap exec) (2 mul swap exec) swap) (5 sub) swap exec exec)
In PostFix, as in all the languages we’ll be studying, all parentheses are required and none are optional. Moving parentheses around changes the structure of the program and most likely changes its behavior. Thus, while the following
PostFix executable sequences use the same numerals and command tokens in the same order, they are distinguished by their parenthesization, which, as we shall see below, makes them behave differently.

((1) (2 3 4) swap exec)
((1 2) (3 4) swap exec)
((1 2) (3 4 swap) exec)
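To make the role of the parentheses concrete, here is a sketch, in Python, of reading PostFix concrete syntax into nested lists. It is our own illustration (the names parse and read are ours): every opening parenthesis starts a fresh list, so moving a parenthesis changes the resulting tree.

    def parse(text):
        tokens = text.replace('(', ' ( ').replace(')', ' ) ').split()
        def read(pos):
            if tokens[pos] == '(':
                lst, pos = [], pos + 1
                while tokens[pos] != ')':
                    item, pos = read(pos)
                    lst.append(item)
                return lst, pos + 1              # skip the closing ')'
            tok = tokens[pos]
            # numerals become Python ints; command tokens stay as strings
            return (int(tok) if tok.lstrip('-').isdigit() else tok), pos + 1
        tree, end = read(0)
        assert end == len(tokens), 'extra tokens after program'
        return tree

    # parse('((1) (2 3 4) swap exec)') and parse('((1 2) (3 4) swap exec)')
    # yield different nested lists, which is why the sequences above behave
    # differently.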
1.4.2 Semantics
The meaning of a PostFix program is determined by executing its commands in left-to-right order. Each command manipulates an implicit stack of values that initially contains the integer arguments of the program (where the first argument is at the top of the stack and the last argument is at the bottom). A value on the stack is either (1) an integer numeral or (2) an executable sequence. The result of a program is the integer value at the top of the stack after its command sequence has been completely executed. A program signals an error if (1) the final stack is empty, (2) the value at the top of the final stack is not an integer, or (3) an inappropriate stack of values is encountered when one of its commands is executed.

The behavior of PostFix commands is summarized in Figure 1.1. Each command is specified in terms of how it manipulates the implicit stack. We use the notation P −args→ v to mean that executing the PostFix program P on the integer argument sequence args returns the value v. The notation P −args→ error means that executing the PostFix program P on the arguments args signals an error. Errors are caused by inappropriate stack values or an insufficient number of stack values. In practice, it is desirable for an implementation to indicate the type of error. We will use comments (delimited by braces) to explain errors and other situations.

To illustrate the meanings of various commands, we show the results of some simple program executions. For example, numerals are pushed onto the stack, while pop and swap are the usual stack operations.

(postfix 0 1 2 3) −[]→ 3 {Only the top stack value is returned.}
(postfix 0 1 2 3 pop) −[]→ 2
(postfix 0 1 2 swap 3 pop) −[]→ 1
(postfix 0 1 swap) −[]→ error {Not enough values to swap.}
(postfix 0 1 pop pop) −[]→ error {Empty stack on second pop.}
Program arguments are pushed onto the stack (from last to first) before the execution of the program commands.
N: Push the numeral N onto the stack.

sub: Call the top stack value v1 and the next-to-top stack value v2. Pop these two values off the stack and push the result of v2 − v1 onto the stack. If there are fewer than two values on the stack or the top two values aren't both numerals, signal an error. The other binary arithmetic operators — add (addition), mul (multiplication), div (integer division*), and rem (remainder of integer division) — behave similarly. Both div and rem signal an error if v1 is zero.

lt: Call the top stack value v1 and the next-to-top stack value v2. Pop these two values off the stack. If v2 < v1, then push a 1 (a true value) on the stack, otherwise push a 0 (false). The other binary comparison operators — eq (equals) and gt (greater than) — behave similarly. If there are fewer than two values on the stack or the top two values aren't both numerals, signal an error.

pop: Pop the top element off the stack and discard it. Signal an error if the stack is empty.

swap: Swap the top two elements of the stack. Signal an error if the stack has fewer than two values.

sel: Call the top three stack values (from top down) v1, v2, and v3. Pop these three values off the stack. If v3 is the numeral 0, push v1 onto the stack; if v3 is a nonzero numeral, push v2 onto the stack. Signal an error if the stack does not contain three values, or if v3 is not a numeral.

nget: Call the top stack value v_index and the remaining stack values (from top down) v1, v2, ..., vn. Pop v_index off the stack. If v_index is a numeral i such that 1 ≤ i ≤ n and v_i is a numeral, push v_i onto the stack. Signal an error if the stack does not contain at least one value, if v_index is not a numeral, if i is not in the range [1..n], or if v_i is not a numeral.

(C1 ... Cn): Push the executable sequence (C1 ... Cn) as a single value onto the stack. Executable sequences are used in conjunction with exec.

exec: Pop the executable sequence from the top of the stack, and prepend its component commands onto the sequence of currently executing commands. Signal an error if the stack is empty or the top stack value isn't an executable sequence.

* The integer division of n and d returns the integer quotient q such that n = qd + r, where r (the remainder) is such that 0 ≤ r < |d| if n ≥ 0 and −|d| < r ≤ 0 if n < 0.
Figure 1.1  English semantics of PostFix commands.

(postfix 2) −[3,4]→ 3 {Initial stack has 3 on top with 4 below.}
(postfix 2 swap) −[3,4]→ 4
(postfix 3 pop swap) −[3,4,5]→ 5
It is an error if the actual number of arguments does not match the number of parameters specified in the program.

(postfix 2 swap) −[3]→ error {Wrong number of arguments.}
(postfix 1 pop) −[4,5]→ error {Wrong number of arguments.}
Note that program arguments must be integers — they cannot be executable sequences. Numerical operations are expressed in postfix notation, in which each operator comes after the commands that compute its operands. add, sub, mul, and div are binary integer operators. lt, eq, and gt are binary integer predicates returning either 1 (true) or 0 (false).

(postfix 1 4 sub) −[3]→ -1
(postfix 1 4 add 5 mul 6 sub 7 div) −[3]→ 4
(postfix 5 add mul sub swap div) −[7,6,5,4,3]→ -20
(postfix 3 4000 swap pop add) −[300,20,1]→ 4020
(postfix 2 add 2 div) −[3,7]→ 5 {An averaging program.}
(postfix 1 3 div) −[17]→ 5
(postfix 1 3 rem) −[17]→ 2
(postfix 1 4 lt) −[3]→ 1
(postfix 1 4 lt) −[5]→ 0
(postfix 1 4 lt 10 add) −[3]→ 11
(postfix 1 4 mul add) −[3]→ error {Not enough numbers to add.}
(postfix 2 4 sub div) −[4,5]→ error {Divide by zero.}
In all the above examples, each stack value is used at most once. Sometimes it is desirable to use a number two or more times or to access a number that is not near the top of the stack. The nget command is useful in these situations; it puts at the top of the stack a copy of a number located on the stack at a specified index. The index is 1-based, from the top of the stack down, not counting the index value itself.

(postfix 2 1 nget) −[4,5]→ 4 {4 is at index 1, 5 at index 2.}
(postfix 2 2 nget) −[4,5]→ 5
It is an error to use an index that is out of bounds or to access a nonnumeric stack value (i.e., an executable sequence) with nget.

(postfix 2 3 nget) −[4,5]→ error {Index 3 is too large.}
(postfix 2 0 nget) −[4,5]→ error {Index 0 is too small.}
(postfix 1 (2 mul) 1 nget) −[3]→ error {Value at index 1 is not a number but an executable sequence.}
The nget command is particularly useful for numerical programs, where it is common to reference arbitrary parameter values and use them multiple times.

(postfix 1 1 nget mul) −[5]→ 25 {A squaring program.}
(postfix 4 4 nget 5 nget mul mul swap 4 nget mul add add) −[3,4,5,2]→ 25
  {Given a, b, c, x, calculates ax² + bx + c.}
As illustrated in the last example, the index of a given value increases every time a new value is pushed onto the stack. The final stack in this example contains (from top down) 25 and 2, showing that the program may end with more than one value on the stack.

Executable sequences are compound commands like (2 mul) that are pushed onto the stack as a single value. They can be executed later by the exec command. Executable sequences act like subroutines in other languages; execution of an executable sequence is similar to a subroutine call, except that transmission of arguments and results is accomplished via the stack.

(postfix 1 (2 mul) exec) −[7]→ 14 {(2 mul) is a doubling subroutine.}
(postfix 0 (0 swap sub) 7 swap exec) −[]→ -7 {(0 swap sub) is a negation subroutine.}
(postfix 0 (2 mul)) −[]→ error {Final top of stack is not an integer.}
(postfix 0 3 (2 mul) gt) −[]→ error {Executable sequence where number expected.}
(postfix 0 3 exec) −[]→ error {Number where executable sequence expected.}
(postfix 0 (7 swap exec) (0 swap sub) swap exec) −[]→ -7
(postfix 2 (mul sub) (1 nget mul) 4 nget swap exec swap exec) −[−10,2]→ 42
  {Given a and b, calculates b − a·b².}
The last two examples illustrate that evaluations involving executable sequences can be rather contorted. The sel command selects between two values based on a test value, where zero is treated as false and any nonzero integer is treated as true. It can be used in conjunction with exec to conditionally execute one of two executable sequences.

(postfix 1 2 3 sel) −[1]→ 2
(postfix 1 2 3 sel) −[0]→ 3
(postfix 1 2 3 sel) −[17]→ 2 {Any nonzero number is “true.”}
(postfix 0 (2 mul) 3 4 sel) −[]→ error {Test not a number.}
(postfix 4 lt (add) (mul) sel exec) −[3,4,5,6]→ 30
(postfix 4 lt (add) (mul) sel exec) −[4,3,5,6]→ 11
(postfix 1 1 nget 0 lt (0 swap sub) () sel exec) −[−7]→ 7 {An absolute value program.}
(postfix 1 1 nget 0 lt (0 swap sub) () sel exec) −[6]→ 6
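Readers who want to experiment with the examples above can start from a sketch like the following. It is our own Python illustration of the informal semantics of Figure 1.1, not the book's implementation: the names run and idiv are ours, programs are assumed already parsed into nested lists (e.g., ['postfix', 2, 'add', 2, 'div']), and several error checks are elided for brevity.

    def idiv(n, d):
        # Integer division truncating toward zero, per the footnote of Figure 1.1.
        q = abs(n) // abs(d)
        return q if (n < 0) == (d < 0) else -q

    ARITH = {'add': lambda a, b: a + b, 'sub': lambda a, b: a - b,
             'mul': lambda a, b: a * b, 'div': idiv,
             'rem': lambda a, b: a - idiv(a, b) * b,
             'lt': lambda a, b: int(a < b), 'eq': lambda a, b: int(a == b),
             'gt': lambda a, b: int(a > b)}

    def run(program, args):
        tag, nparams, *body = program
        assert tag == 'postfix' and nparams == len(args), 'wrong number of arguments'
        stack = list(reversed(args))   # first argument on top (end of the list)
        todo = list(body)              # commands awaiting execution
        while todo:
            cmd = todo.pop(0)
            if isinstance(cmd, (int, list)):
                stack.append(cmd)      # numerals and executable sequences are values
            elif cmd in ARITH:
                v1, v2 = stack.pop(), stack.pop()
                stack.append(ARITH[cmd](v2, v1))   # computes v2 op v1
            elif cmd == 'pop':
                stack.pop()
            elif cmd == 'swap':
                stack[-1], stack[-2] = stack[-2], stack[-1]
            elif cmd == 'sel':
                v1, v2, v3 = stack.pop(), stack.pop(), stack.pop()
                stack.append(v1 if v3 == 0 else v2)
            elif cmd == 'nget':
                i = stack.pop()
                assert 1 <= i <= len(stack) and isinstance(stack[-i], int), 'bad index'
                stack.append(stack[-i])            # 1-based index from the top down
            elif cmd == 'exec':
                todo = stack.pop() + todo          # prepend the sequence's commands
            else:
                raise ValueError('unknown command: %r' % (cmd,))
        assert stack and isinstance(stack[-1], int), 'final top of stack is not an integer'
        return stack[-1]

    # run(['postfix', 2, 'add', 2, 'div'], [3, 7]) returns 5, matching the
    # averaging example, and run(['postfix', 1, [2, 'mul'], 'exec'], [7]) returns 14.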
Exercise 1.1 Determine the value of the following PostFix programs on an empty stack.
a. (postfix 0 10 (swap 2 mul sub) 1 swap exec)
b. (postfix 0 (5 (2 mul) exec) 3 swap)
c. (postfix 0 (() exec) exec)
d. (postfix 0 2 3 1 add mul sel)
e. (postfix 0 2 3 1 (add) (mul) sel)
f. (postfix 0 2 3 1 (add) (mul) sel exec)
g. (postfix 0 0 (2 3 add) 4 sel exec)
h. (postfix 0 1 (2 3 add) 4 sel exec)
i. (postfix 0 (5 6 lt) (2 3 add) 4 sel exec)
j. (postfix 0 (swap exec swap exec) (1 sub) swap (2 mul) swap 3 swap exec)
Exercise 1.2
a. What function of its argument does the following PostFix program calculate?
(postfix 1 ((3 nget swap exec) (2 mul swap exec) swap) (5 sub) swap exec exec)
b. Write a simpler PostFix program that performs the same calculation.

Exercise 1.3 Recall that executable sequences are effectively subroutines that, when invoked (by the exec command), take their arguments from the top of the stack. Write executable sequences that compute the following logical operations. Recall that 0 stands for false and all other numerals are treated as true.
a. not: return the logical negation of a single argument.
b. and: given two numeric arguments, return 1 if their logical conjunction is true, and 0 otherwise.
c. short-circuit-and: return 0 if the first argument is false; otherwise return the second argument.
d. Demonstrate the difference between and and short-circuit-and by writing a PostFix program with zero arguments that has a different result if and is replaced by short-circuit-and.

Exercise 1.4
a. Without nget, is it possible to write a PostFix program that squares its single argument? If so, write it; if not, explain.
b. Is it possible to write a PostFix program that takes three integers and returns the smallest of the three? If so, write it; if not, explain.
c. Is it possible to write a PostFix program that calculates the factorial of its single argument (assume it's nonnegative)? If so, write it; if not, explain.
1.4.3 The Pitfalls of Informal Descriptions
The “by-example” and English descriptions of PostFix given above are typical of the way that programming languages are described in manuals, textbooks, courses, and conversations. That is, a syntax for the language is presented, and the semantics of each of the language constructs is specified using English prose and examples. The utility of this method for specifying semantics is apparent from the fact that the vast majority of programmers learn to read and write programs via this approach. But there are many situations in which informal descriptions of programming languages are inadequate. Suppose that we want to improve a program by transforming complex phrases into phrases that are simpler and more efficient. How can we be sure that the transformation process preserves the meaning of the program? Or suppose that we want to prove that the language as a whole has a particular property. For instance, it turns out that every PostFix program is guaranteed to terminate (i.e., a PostFix program cannot enter an infinite loop). How would we go about proving this property based on the informal description? Natural language does not provide any rigorous framework for reasoning about programs or programming languages. Without the aid of some formal reasoning tools, we can only give hand-waving arguments that are not likely to be very convincing. Or suppose that we wish to extend PostFix with features that make it easier to use. For example, it would be nice to name values, to collect values into arrays, to query the user for input, and to loop over sequences of values. With each new feature, the specification of the language becomes more complex, and it becomes more difficult to reason about the interaction between various features. We’d like techniques that help to highlight which features are orthogonal and which can interact in subtle ways. Or suppose that a software vendor wants to develop PostFix into a product that runs on several different machines. The vendor wants any given PostFix program to have exactly the same behavior on all of the supported machines. But how do the development teams for the different machines guarantee that they’re all implementing the “same” language? If there are any ambiguities in the PostFix specification that they’re implementing, different development
teams might resolve the ambiguity in incompatible ways. What’s needed in this case is an unambiguous specification of the language as well as a means of proving that an implementation meets that specification. The problem with informal descriptions of a programming language is that they’re neither concise nor precise enough for these kinds of situations. English is often verbose, and even relatively simple ideas can be unduly complicated to explain. Moreover, it’s easy for the writer of an informal specification to underspecify a language by forgetting to cover all the special cases (e.g., error situations in PostFix). It isn’t that covering all the special cases is impossible; it’s just that the natural-language framework doesn’t help much in pointing out what the special cases are. It is possible to overspecify a language in English as well. Consider the PostFix programming model introduced above. The current state of a program is captured in two entities: the stack and the current command sequence. To programmers and implementers alike, this might imply that a language implementation must have explicit stack and command sequence elements in it. Although these would indeed appear in a straightforward implementation, they are not in any way required; there are alternative models and implementations for PostFix (e.g., see Exercise 3.12 on page 70). It would be desirable to have a more abstract definition of what constitutes a legal PostFix implementation so that a would-be implementer could be sure that an implementation was faithful to the language definition regardless of the representations and algorithms employed.
1.5 Overview of the Book
The remainder of Part I introduces a number of tools that address the inadequacies outlined above and that form an essential foundation for the study of programming language design. Chapter 2 presents s-expression grammars, a simple specification for syntax that we will use to describe the structure of all of the mini-languages we will explore. Then, using PostFix and a simple expression language as our objects of study, we introduce two approaches to formal semantics:

• An operational semantics (Chapter 3) explains the meaning of programming language constructs in terms of the step-by-step process of an abstract machine.

• A denotational semantics (Chapter 4) explains the meaning of programming language constructs in terms of the meaning of their subparts.
These approaches support the unambiguous specification of programming languages and provide a framework in which to reason about properties of programs and languages. Our discussion of tools concludes in Chapter 5 with a presentation of a technique for determining the meaning of recursive specifications. Throughout the book, and especially in these early chapters, we formalize concepts in terms of a mathematical metalanguage described in Appendix A. Readers are encouraged to familiarize themselves with this language by skimming this appendix early on and later referring to it in more detail on an “as needed” basis.

Part II focuses on dynamic semantics, the meaning of programming language constructs and the run-time behavior of programs. In Chapter 6, we introduce FL, a mini-language we use as a basis for investigating dimensions of programming language design. By extending FL in various ways, we then explore programming language features along key dimensions: naming (Chapter 7), state (Chapter 8), control (Chapter 9), and data (Chapter 10). Along the way, we will encounter several programming paradigms, high-level approaches for viewing computation: function-oriented programming, imperative programming, and object-oriented programming.

In Part III, we shift our focus to static semantics, properties of programs that can be determined without executing them. In Chapter 11, we introduce the notion of type — a description of what an expression computes — and develop a simple type-checking system for a dialect of FL such that “well-typed” programs cannot encounter certain kinds of run-time errors. In Chapter 12, we study some more advanced features of typed languages: subtyping, universal polymorphism, bounded quantification, and kind systems. A major drawback to many of our typed mini-languages is that programmers are required to annotate programs with significant amounts of explicit type information. In some languages, many of these annotations can be eliminated via type reconstruction, a technique we study in Chapter 13. Types can be used as a mechanism for enforcing data abstraction, a notion that we explore in Chapter 14. In Chapter 15, we show how many of the dynamic and static semantics features we have studied can be combined to yield a mini-language in which program modules with both value and type components can be independently type-checked and then linked together in a type-safe way. We wrap up our discussion of static semantics in Chapter 16 with a study of effect systems, which describe how expressions compute rather than what they compute.

The book culminates, in Part IV, in a pragmatics segment that illustrates how concepts from dynamic and static semantics play an important role in the implementation of a programming language. Chapter 17 presents a compiler that translates from a typed dialect of FL to a low-level language that resembles
assembly code. The compiler is organized as a sequence of meaning-preserving translation steps that construct explicit representations for the naming, state, control, and data aspects of programs. In order to automatically reclaim memory in a type-safe way, the run-time system for executing the low-level code generated by the compiler uses garbage collection, a topic that is explored in Chapter 18.

While we will emphasize formal tools throughout this book, we do not imply that formal tools are a panacea or that formal approaches are superior to informal ones in an absolute sense. In fact, informal explanations of language features are usually the simplest way to learn about a language. In addition, it’s very easy for formal approaches to get out of control, to the point where they are overly obscure, or require too much mathematical machinery to be of any practical use on a day-to-day basis. For this reason, we won’t cover material as a dry sequence of definitions, theorems, and proofs. Instead, our goal is to show that the concepts underlying the formal approaches are indispensable for understanding particular programming languages as well as the dimensions of language design. The tools, techniques, and features introduced in this book should be in any serious computer scientist’s bag of tricks.
2
Syntax

since feeling is first
who pays any attention
to the syntax of things
will never wholly kiss you;
...
for life’s not a paragraph
And death i think is no parenthesis
— e. e. cummings, “since feeling is first”

In the area of programming languages, syntax refers to the form of programs — how they are constructed from symbolic parts. A number of theoretical and practical tools — including grammars, lexical analyzers, and parsers — have been developed to aid in the study of syntax. By and large we will downplay syntactic issues and tools. Instead, we will emphasize the semantics of programs; we will study the meaning of language constructs rather than their form.

We are not claiming that syntactic issues and tools are unimportant in the analysis, design, and implementation of programming languages. In actual programming language implementations, syntactic issues are very important and a number of standard tools (like Lex and Yacc) are available for addressing them. But we do believe that syntax has traditionally garnered much more than its fair share of attention, largely because its problems were more amenable to solution with familiar tools. This state of affairs is reminiscent of the popular tale of the person who searches all night long under a street lamp for a lost item not because the item was lost there but because the light was better. Luckily, many investigators strayed away from the street lamp of parsing theory in order to explore the much dimmer area of semantics. Along the way, they developed many new tools for understanding semantics, some of which we will focus on in later chapters.

Despite our emphasis on semantics, however, we can’t ignore syntax completely. Programs must be expressed in some form, preferably one that elucidates the fundamental structure of the program and is easy to read, write, and reason
about. In this chapter, we introduce a set of syntactic conventions for describing our mini-languages.
2.1
Abstract Syntax
We will motivate various syntactic issues in the context of EL, a mini-language of expressions. EL expressions have a tree-like structure that is more typical of program phrases than the mostly linear structure of PostFix command sequences. EL describes functions that map any number of numerical inputs to a single numerical output. Such a language might be useful on a calculator, say, for automating the evaluation of commonly used mathematical formulas. Figure 2.1 describes (in English) the abstract structure of a legal EL program. EL programs contain numerical expressions, where a numerical expression can be constructed out of various kinds of components. Some of the components, like numerals, references to input values, and various kinds of operators, are primitive — they cannot be broken down into subparts.1 Other components are compound — they are constructed out of constituent components. The components have names; e.g., the subparts of an arithmetic operation are the rator (short for “operator”) and two rands (short for “operands”), while the subexpressions of the conditional expression are the test expression, the then expression, and the else expression. There are three major classes of phrases in an EL program: whole programs that designate calculations on a given number of inputs, numerical expressions that designate numbers, and boolean expressions that designate truth values (i.e., true or false). The structural description in Figure 2.1 constrains the ways in which these expressions may be “wired together.” For instance, the test component of a conditional must be a boolean expression, while the then and else components must be numerical expressions. A specification of the allowed wiring patterns for the syntactic entities of a language is called a grammar. Figure 2.1 is said to be an abstract grammar because it specifies the logical structure of the syntax but does not give any indication how individual expressions in the language are actually written. Parsing a program phrase with an abstract grammar results in a value called an abstract syntax tree (AST). As we will see in Section 2.3, abstract syntax trees are easy to inspect and disassemble, making them ideal substrates for defining the meaning of program phrases in terms of their parts. Consider an EL program that returns zero if its first input is between 1 and 10 (exclusive) and otherwise returns the product of the second and third inputs. 1
Numerals can be broken down into digits, but we will ignore this detail.
A legal EL program is a pair of (1) a numargs numeral specifying the number of parameters and (2) a body that is a numerical expression, where a numerical expression is one of:

• an intval — an integer literal num;
• an input — a reference to one of the program inputs specified by an index numeral;
• an arithmetic operation — an application of a rator, in this case a binary arithmetic operator, to two numerical rand expressions, where an arithmetic operator is one of:
    • addition,
    • subtraction,
    • multiplication,
    • division,
    • remainder;
• a conditional — a choice between numerical then and else expressions determined by a boolean test expression, where a boolean expression is one of:
    • a boolval — a boolean literal bool;
    • a relational operation — an application of a rator, in this case a binary relational operator, to two numerical rand expressions, where a relational operator is one of:
        • less-than,
        • equal-to,
        • greater-than;
    • a logical operation — an application of a rator, in this case a binary logical operator, to two boolean rand expressions, where a logical operator is one of:
        • and,
        • or.

Figure 2.1   An abstract grammar for EL programs.
The abstract syntax tree for this program appears in Figure 2.2. Each node of the tree except the root corresponds to a numerical or boolean expression. The leaves of the tree stand for primitive phrases, while the intermediate nodes represent compound phrases. The labeled edges from a parent node to its children show the relationship between a compound phrase and its components. The AST is defined purely in terms of these relationships; the particular way that the nodes and edges of a tree are arranged on the page is immaterial.
[Figure 2.2 appears here. It depicts the abstract syntax tree for this program: a Program node (numargs 3) whose body edge leads to a Conditional node; the Conditional’s test is a Logical Operation (rator and) whose rand1 and rand2 are the Relational Operations (> (arg 1) 1) and (< (arg 1) 10); its then part is the IntVal 0; and its else part is the Arithmetic Operation (* (arg 2) (arg 3)).]

Figure 2.2   Abstract syntax tree for the sample EL program.

2.2
Concrete Syntax

Abstract grammars like the one in Figure 2.1 specify the logical structure of programs but say nothing about how program phrases are actually written down; for that, a language needs a concrete syntax.2 Here, for example, are three plausible concrete notations for the program whose AST appears in Figure 2.2:

if $1 > 1 && $1 < 10 then 0 else $2 * $3 endif

(cond ((and (> (arg 1) 1) (< (arg 1) 10)) 0) (else (* (arg 2) (arg 3))))

1 input 1 gt 1 input 10 lt and {0} {2 input 3 input mul} choose

2 It is also possible to represent programs more pictorially, and visual programming languages are an active area of research. But textual representations enjoy certain advantages over visual ones: they tend to be more compact than visual representations; the technology for processing them and communicating them is well established; and, most important, they can effectively make use of our familiarity with natural language.
The above forms differ along a variety of dimensions:

• Keywords and operation names. The keywords if, cond, and choose all indicate a conditional expression, while multiplication is represented by the names
* and mul. Accessing the ith input to the program is written in three different ways: $i, (arg i), and i input.

• Operand order. The example forms use infix, prefix, and postfix operations, respectively.

• Means of grouping. Grouping can be determined by precedence (&& has a lower precedence than > and < in the first example), keywords (then, else, and endif delimit the test, then, and else parts of the first conditional), or explicit matched delimiter pairs (such as the parentheses and braces in the last two examples).

These are only some of the possible dimensions. Many more are imaginable. For instance, numbers could be written in many different numeral formats such as decimal, binary, or octal numerals, scientific notation, or even Roman numerals!
2.3
S-Expression Grammars Specify ASTs
The examples in Section 2.2 illustrate that the nature of concrete syntax necessitates making representational choices that are arbitrary with respect to the abstract syntactic structure. While we will dispense with many of the complexities of concrete syntax, we still need some concrete notation for representing abstract syntax trees. Such a representation should be simple, yet permit us to precisely describe abstract syntax trees and operations on such trees. Throughout this book, we need to operate on abstract syntax trees to determine the meaning of a phrase, the type of a phrase, the translation of a phrase, and so on. To perform such operations, we need a far more compact representation for abstract syntax trees than the English description in Figure 2.1 or the graphical one in Figure 2.2.

We have chosen to represent abstract syntax trees using s-expression grammars. An s-expression grammar unites Lisp’s fully parenthesized prefix notation with traditional grammar notations to describe the structure of abstract syntax trees via parenthesized sequences of symbols and metavariables. Not only are these grammars very flexible for defining unambiguous programming language syntax, but it is easy to construct programs that process s-expression notation. This facilitates writing interpreters and translators for the mini-languages we will study.
2.3.1
S-Expressions
[Figure 2.3 appears here, drawing ((this is) an ((example) (s-expression tree))) as a tree: the root has three children, namely an interior node with leaves this and is, the leaf an, and an interior node whose two children are an interior node with the single leaf example and an interior node with leaves s-expression and tree.]

Figure 2.3   Viewing ((this is) an ((example) (s-expression tree))) as a tree.

An s-expression (short for symbolic expression) is a notation for representing trees by parenthesized linear text strings. The leaves of the trees are symbolic tokens, where (to first approximation) a symbolic token is any sequence of characters that does not contain a left parenthesis (‘(’), a right parenthesis (‘)’), or a whitespace character. Examples of symbolic tokens include x, foo, this-is-a-token, 17, 6.821, and 4/3*pi*r^2. We always write s-expressions in teletype font. An intermediate node in a tree is represented by a pair of parentheses surrounding the s-expressions that represent the subtrees. Thus, the s-expression

((this is) an ((example) (s-expression tree)))
designates the structure depicted in Figure 2.3. Whitespace is necessary for separating tokens that appear next to each other, but can be used liberally to enhance the readability of the structure. Thus, the above s-expression could also be written as ((this is) an ((example) (s-expression tree)))
without changing the structure of the tree.
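Although the book gives no implementation here, the claim that s-expressions are easy to process mechanically is simple to demonstrate. The following is a minimal Python sketch (our own; the names tokenize, parse_tokens, and parse_sexp are invented for illustration) that reads an s-expression string into nested Python lists, with symbolic tokens represented as strings:

def tokenize(s):
    # Pad parentheses with spaces, then split on whitespace.
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse_tokens(tokens):
    # Parse one s-expression from the front of the token list;
    # return the parsed tree and the remaining tokens.
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        subtrees = []
        while rest and rest[0] != ")":
            subtree, rest = parse_tokens(rest)
            subtrees.append(subtree)
        if not rest:
            raise SyntaxError("missing right parenthesis")
        return subtrees, rest[1:]          # drop the closing ")"
    if head == ")":
        raise SyntaxError("unexpected right parenthesis")
    return head, rest                      # a symbolic token is a leaf

def parse_sexp(s):
    tree, leftover = parse_tokens(tokenize(s))
    if leftover:
        raise SyntaxError("extra tokens after s-expression")
    return tree

For example, parse_sexp("((this is) an ((example) (s-expression tree)))") yields the nested list [['this', 'is'], 'an', [['example'], ['s-expression', 'tree']]], which mirrors the tree in Figure 2.3.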
2.3.2
The Structure of S-Expression Grammars
An s-expression grammar combines the domain notation of Appendix A with s-expressions to specify the syntactic structure of a language. It has two parts:

1. A list of syntactic domains, one for each kind of phrase.

2. A set of production rules that define the structure of compound phrases.

Figure 2.4 presents a sample s-expression grammar for EL.
Syntactic Domains
P ∈ Prog
NE ∈ NumExp
BE ∈ BoolExp
N ∈ IntLit = {. . . , -2, -1, 0, 1, 2, . . .}
B ∈ BoolLit = {true, false}
A ∈ ArithmeticOperator = {+, -, *, /, %}
R ∈ RelationalOperator = {<, =, >}
L ∈ LogicalOperator = {and, or}

Production Rules
P ::= (el Nnumargs NE body )                 [Program]

NE ::= Nnum                                  [IntVal]
  | (arg Nindex )                            [Input]
  | (Arator NE rand1 NE rand2 )              [ArithmeticOperation]
  | (if BE test NE then NE else )            [Conditional]

BE ::= Bbool                                 [BoolVal]
  | (Rrator NE rand1 NE rand2 )              [RelationalOperation]
  | (Lrator BE rand1 BE rand2 )              [LogicalOperation]

Figure 2.4   An s-expression grammar for EL.
A syntactic domain is a collection of program phrases. Primitive syntactic domains are collections of phrases with no substructure. The primitive syntactic domains of EL are IntLit, BoolLit, ArithmeticOperator, RelationalOperator, and LogicalOperator. Primitive syntactic domains are specified by an enumeration of their elements or by an informal description with examples. For instance, the details of what constitutes a numeral in EL are left to the reader’s intuition. Compound syntactic domains are collections of phrases built out of other phrases. Because compound syntactic domains are defined by a grammar’s production rules, the list of syntactic domains does not explicitly indicate their structure. All syntactic domains are annotated with domain variables (such as NE , BE , and N ) that range over their elements; these play an important role in the production rules.

The production rules specify the structure of compound domains. There is one rule for each compound domain. A production rule has the form

domain-variable ::= pattern    [phrase-type]
                  | pattern    [phrase-type]
                  ...
                  | pattern    [phrase-type]
where

• domain-variable is the domain variable for the compound syntactic domain being defined,

• pattern is an s-expression pattern (defined below), and

• phrase-type is a mnemonic name for the subclass of phrases in the domain that match the pattern. The phrase types correspond to the labels of intermediate nodes in an AST.

Each line of the rule is called a production; it specifies a collection of phrases that are considered to belong to the compound syntactic domain being defined. The second production rule in Figure 2.4, for instance, has four productions, specifying that a NumExp can be an integer literal, an indexed input, an arithmetic operation, or a conditional.

S-expression grammars are specialized versions of context-free grammars, the standard way to define programming language syntax. Domain variables play the role of nonterminals in such grammars. Our grammars are context-free because each production specifies the expansion of a single nonterminal in a way that does not depend on the context in which that nonterminal appears. The terminals of an s-expression grammar are tokens written in teletype font, such as parentheses, keywords, and literals. For certain elementary domains, we gloss over the details of how their elements are constructed from more basic parts, and instead provide a set-based description. For example, we use the description {. . . , -2, -1, 0, 1, 2, . . .} to define integer literals rather than using productions to specify how they can be constructed from digits and an optional minus sign.

An s-expression pattern appearing in a production stands for all s-expressions that have the form of the pattern. An s-expression pattern may include symbolic tokens (such as el, arg, if) to differentiate it from other kinds of s-expression patterns. Domain variables may appear as tokens in s-expression patterns. For example, the pattern (if BE test NE then NE else ) contains a symbolic token (if) and the domain variables BE test , NE then , and NE else . Such a pattern specifies the structure of a compound phrase — a phrase that is built from other phrases. Subscripts on the domain variables indicate their role in the phrase. This helps to distinguish positions within a phrase that have the same domain variable — e.g., the then and else parts of a conditional, which are both numerical expressions. This subscript appears as an edge label in the AST node corresponding to the pattern, while the phrase type of the production appears as the node label. So the if pattern denotes an AST node pattern of the form:
      Conditional
     /     |     \
  test   then   else
   BE     NE     NE
An s-expression pattern PT is said to match an s-expression SX if PT ’s domain variables d1 , . . ., dn can be replaced by matching s-expressions SX 1 , . . ., SX n to yield SX . Each SX i must be an element of the domain over which di ranges. A compound syntactic domain contains exactly those s-expressions that match the patterns of its productions in an s-expression grammar. For example, Figure 2.5 shows the steps by which the NumExp production (if BE test NE then NE else )
matches the s-expression (if (= (arg 1) 3) (arg 2) 4)
Matching is a recursive process: BE test matches (= (arg 1) 3), NE then matches (arg 2), and NE else matches 4. The recursion bottoms out at primitive syntactic domain elements (in this case, elements of the domain IntLit). Figure 2.5 shows how an AST for the sample if expression is constructed as the recursive matching process backs out of the recursion. Note that the pattern (if BE test NE then NE else ) would not match any of the s-expressions (if 1 2 3), (if (arg 2) 2 3), or (if (+ (arg 1) 1) 2 3), because none of the test expressions 1, (arg 2), or (+ (arg 1) 1) match any of the patterns in the productions for BoolExp.

More formally, the rules for matching s-expression patterns to s-expressions are as follows:

• A pattern (PT 1 . . . PT n ) matches an s-expression (SX 1 . . . SX n ) if each subpattern PT i matches the corresponding subexpression SX i .

• A symbolic token T as a pattern matches only itself.

• A domain variable for a primitive syntactic domain D matches an s-expression SX if SX is an element of D.

• A domain variable for a compound syntactic domain D matches an s-expression SX if one of the patterns in the rule for D matches SX .

If SX is an s-expression, we shall use the notation SX D to designate the domain element in D that SX designates.
s-expression                    domain   production                        AST
(arg 1)                         NE       (arg Nindex )                     (Input: index 1)
3                               NE       Nnum                              (IntVal: num 3)
(= (arg 1) 3)                   BE       (Rrator NE rand1 NE rand2 )       (RelationalOperation: rator =, rand1 (Input 1), rand2 (IntVal 3))
(arg 2)                         NE       (arg Nindex )                     (Input: index 2)
4                               NE       Nnum                              (IntVal: num 4)
(if (= (arg 1) 3) (arg 2) 4)    NE       (if BE test NE then NE else )     (Conditional: test (RelationalOperation = (Input 1) (IntVal 3)), then (Input 2), else (IntVal 4))

Figure 2.5   The steps by which (if (= (arg 1) 3) (arg 2) 4) is determined to be a member of the syntactic domain NumExp. In each row, an s-expression matches a domain by a production to yield an abstract syntax tree.
P ∈ Prog ::= (el Nnumargs NE body )          [Program]

NE ∈ NumExp ::= Nnum                         [IntVal]
  | (arg Nindex )                            [Input]
  | (Arator NE rand1 NE rand2 )              [ArithmeticOperation]
  | (if BE test NE then NE else )            [Conditional]

BE ∈ BoolExp ::= Bbool                       [BoolVal]
  | (Rrator NE rand1 NE rand2 )              [RelationalOperation]
  | (Lrator BE rand1 BE rand2 )              [LogicalOperation]

N ∈ IntLit = {. . . , -2, -1, 0, 1, 2, . . .}
B ∈ BoolLit = {true, false}
A ∈ ArithmeticOperator = {+, -, *, /, %}
R ∈ RelationalOperator = {<, =, >}
L ∈ LogicalOperator = {and, or}

Figure 2.6   A more concise rendering of the s-expression grammar for EL.
When D is a compound domain, SX D corresponds to an abstract syntax tree that indicates how SX matches one of the rule patterns for the domain. For example, (if (= (arg 1) 3) (arg 2) 4)NumExp
can be viewed as the abstract syntax tree depicted in Figure 2.5 on page 28. Each node of the AST indicates the production that successfully matches the corresponding s-expression, and each edge indicates a domain variable that appeared in the production pattern. In the notation SX D , domain subscript D serves to disambiguate cases where SX belongs to more than one syntactic domain. For example, 1IntLit is 1 as a primitive numeral, while 1NumExp is 1 as a numerical expression. The subscript will be omitted when the domain is clear from context. Using the s-expression grammar specified in Figure 2.4, the abstract syntax tree in Figure 2.2 can be expressed as: (el 3 (if (and (> (arg 1) 1) (< (arg 1) 10)) 0 (* (arg 2) (arg 3))))
To make s-expression grammars more concise, we will often combine the specification of a compound syntactic domain with its production rules. Figure 2.6 shows the EL s-expression grammar written in this more concise style.
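To make the recursive matching process concrete, here is a hedged Python sketch (ours, not the book’s) of a recognizer for the NumExp and BoolExp domains of Figure 2.4, operating on the nested lists produced by a parser like the parse_sexp sketch in Section 2.3.1; each if-branch corresponds to one production:

ARITH_OPS = {"+", "-", "*", "/", "%"}
REL_OPS = {"<", "=", ">"}
LOGIC_OPS = {"and", "or"}

def is_intlit(sx):
    # An integer literal: optional minus sign followed by digits.
    if not isinstance(sx, str):
        return False
    body = sx[1:] if sx.startswith("-") else sx
    return body.isdigit()

def is_numexp(sx):
    if is_intlit(sx):                                    # [IntVal]
        return True
    if isinstance(sx, list) and len(sx) == 2 and sx[0] == "arg":
        return is_intlit(sx[1])                          # [Input]
    if isinstance(sx, list) and len(sx) == 3 and sx[0] in ARITH_OPS:
        return is_numexp(sx[1]) and is_numexp(sx[2])     # [ArithmeticOperation]
    if isinstance(sx, list) and len(sx) == 4 and sx[0] == "if":
        return (is_boolexp(sx[1]) and is_numexp(sx[2])   # [Conditional]
                and is_numexp(sx[3]))
    return False

def is_boolexp(sx):
    if sx in ("true", "false"):                          # [BoolVal]
        return True
    if isinstance(sx, list) and len(sx) == 3:
        if sx[0] in REL_OPS:                             # [RelationalOperation]
            return is_numexp(sx[1]) and is_numexp(sx[2])
        if sx[0] in LOGIC_OPS:                           # [LogicalOperation]
            return is_boolexp(sx[1]) and is_boolexp(sx[2])
    return False

On this encoding, is_numexp(parse_sexp("(if (= (arg 1) 3) (arg 2) 4)")) returns True, while is_numexp(parse_sexp("(if 1 2 3)")) returns False, matching the discussion of Figure 2.5.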
Exercise 2.1
a. Write an EL program that takes three integers and returns the largest one.
b. Draw an AST for your program.
2.3.3
Phrase Tags
S-expression grammars for our mini-languages will generally follow the Lisp-style convention that compound phrases begin with a phrase tag that unambiguously indicates the phrase type. In EL, if is an example of a phrase tag. The fact that all compound phrases are delimited by explicit parentheses eliminates the need for syntactic keywords in the middle of or at the end of phrases (e.g., then, else, and endif in a conditional). Because phrase tags can be cumbersome, we will often omit them when no ambiguity results. Figure 2.7 shows an alternative syntax for EL in which every production pattern is marked with a distinct phrase tag. In this alternative syntax, the addition of 1 and 2 would be written (arith + (num 1) (num 2)) — quite a bit more verbose than (+ 1 2)! But most of the phrase tags can be removed without introducing ambiguity. Because numerals are clearly distinguished from other s-expressions, there is no need for the num tag. Likewise, we can dispense with the bool tag. Since the arithmetic operators are disjoint from the other operators, the arith tag is superfluous as are the rel and log tags. The result of these optimizations is the original EL syntax in Figure 2.4.
2.3.4
Sequence Patterns
As defined above, each component of an s-expression pattern matches only a single s-expression. But sometimes it is desirable for a pattern component to match a sequence of s-expressions. For example, suppose we want to extend the + operator of EL to accept an arbitrary number of numeric operands, making (+ 1 2 3 4) and (+ 2 (+ 3 4 5) (+ 6 7)) legal numerical expressions in EL. Using the simple patterns introduced above, this extension requires an infinite number of productions:

NE ::= ...
  | (+)                                      [Addition-0]
  | (+ NE rand1 )                            [Addition-1]
  | (+ NE rand1 NE rand2 )                   [Addition-2]
  | (+ NE rand1 NE rand2 NE rand3 )          [Addition-3]
  | ...
P ::= (el Nnumargs NE body )                 [Program]

NE ::= (num Nnum )                           [IntVal]
  | (arg Nindex )                            [Input]
  | (arith Arator NE rand1 NE rand2 )        [ArithmeticOperation]
  | (if BE test NE then NE else )            [Conditional]

BE ::= (bool Bbool )                         [BoolVal]
  | (rel Rrator NE rand1 NE rand2 )          [RelationalOperation]
  | (log Lrator BE rand1 BE rand2 )          [LogicalOperation]

Figure 2.7   An alternative syntax for EL in which every production pattern has a phrase tag.
Here we introduce a concise way of handling this kind of syntactic flexibility within s-expression grammars. We extend s-expression patterns so that any pattern can be annotated with a postfix ∗ character. Such a pattern is called a sequence pattern. A sequence pattern PT ∗ matches any consecutive sequence of zero or more s-expressions SX 1 . . . SX n such that each SX i matches the pattern PT . For instance, the extended addition expression can be specified concisely by the pattern (+ NE ∗rand ). Here are some phrases that match this new pattern, along with the sequence matched by NE ∗rand in each case:3 (+ 6 8 2 1) (+ 7 (+ 5 8 4) (+ 9 6)) (+ 3) (+)
NE ∗rand NE ∗rand NE ∗rand NE ∗rand
= = = =
[6, 8, 2, 1]NumExp [7, (+ 5 8 4), (+ 9 6)]NumExp [3]NumExp [ ]NumExp
In graphical depictions of ASTs, a sequence node will be drawn as a solid circle whose components (indexed starting at 1) branch out from the node. E.g., Figure 2.8 shows the AST for (+ 7 (+ 5 8 4) (+ 9 6)) in EL with extended addition expressions. Note that a sequence pattern can match any number of elements, including zero or one. To specify that an addition should have a minimum of two operands, we could use the following production pattern: (+ NE rand1 NE rand2 NE ∗rest )
A postfix + is similar to ∗ , except that the pattern matches only a sequence with at least one element. Thus, the pattern (+ NE +rand ) is an alternative way
of expressing the essence of the pattern (+ NE rand NE ∗rest ). However, the two patterns are subtly different: (+ NE + rand ) denotes an AST node with a single component that is a sequence of numerical expressions, while (+ NE rand NE ∗rest ) denotes an AST node with two components — a numerical expression (its rand) and a sequence of numerical expressions (its rest). A postfix ? indicates a sequence of either zero or one elements of a domain. It is used to specify optional syntactic elements. For example, (- E1 E2? ) describes the syntax for a - operator that designates subtraction (in the two-element case) or unary negation (in the one-element case). A postfix *, + , or ? can be attached to any s-expression pattern, not just a domain variable. For example, in the s-expression pattern (cond (BE test NE then )∗ (else NE default )? )
the subpattern (BE test NE then )∗ matches any sequence of parenthesized clauses containing a boolean expression followed by a numerical expression, and the subpattern (else NE default )? matches an optional else clause.

To avoid ambiguity, s-expression grammars are not allowed to use s-expression patterns in which multiple sequence patterns enable a single s-expression to match a pattern in more than one way. As an example of a disallowed pattern, consider (op NE ∗rand1 NE ∗rand2 ), which could match the s-expression (op 1 2) in three different ways:

• NE ∗rand1 = [1, 2]NumExp and NE ∗rand2 = [ ]NumExp
• NE ∗rand1 = [1]NumExp and NE ∗rand2 = [2]NumExp
• NE ∗rand1 = [ ]NumExp and NE ∗rand2 = [1, 2]NumExp

A disallowed pattern can always be transformed into a legal pattern by inserting explicit parentheses to demarcate components. For instance, the following are all unambiguous legal patterns:

(op (NE ∗rand1 ) (NE ∗rand2 ))
(op (NE ∗rand1 ) NE ∗rand2 )
(op NE ∗rand1 (NE ∗rand2 ))
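As a small illustration of how a sequence pattern can be checked in code, here is a hedged Python sketch (ours) for a cut-down grammar containing only integer literals and the extended addition pattern (+ NE ∗rand ); the helper is_intlit is the one from the recognizer sketch above:

def is_numexp_ext(sx):
    # NE ::= N | (+ NE*rand)
    # The sequence pattern NE*rand matches zero or more operands,
    # so (+) and (+ 3) are accepted along with longer forms.
    if is_intlit(sx):
        return True
    return (isinstance(sx, list) and len(sx) >= 1 and sx[0] == "+"
            and all(is_numexp_ext(rand) for rand in sx[1:]))

For instance, is_numexp_ext(parse_sexp("(+ 7 (+ 5 8 4) (+ 9 6))")) and is_numexp_ext(parse_sexp("(+)")) both return True.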
2.3.5
Notational Conventions
In addition to the s-expression patterns described above, we will employ a few other notational conventions for syntax.
[Figure 2.8 appears here. It depicts the following AST: an Addition node whose rand edge leads to a sequence node with three indexed components: component 1 is (IntVal 7); component 2 is an Addition node whose rand sequence is (IntVal 5), (IntVal 8), (IntVal 4); and component 3 is an Addition node whose rand sequence is (IntVal 9), (IntVal 6).]

Figure 2.8   AST notation for (+ 7 (+ 5 8 4) (+ 9 6)) in EL with extended addition expressions.
Domain Variables   In addition to being used in s-expression patterns, domain variables can appear inside s-expressions when they denote particular s-expressions. For example, if NE 1 is the s-expression (+ 1 2) and NE 2 is the s-expression (- 3 4), then (* NE 1 NE 2 ) is the same syntactic entity as (* (+ 1 2) (- 3 4)).

Ellipsis Notation   If SX is an s-expression pattern denoting an element of a syntactic domain D, then the ellipsis notation SX j . . . SX k specifies a sequence with (k − j + 1) elements from D*. For example, (+ NE 1 . . . NE 5 ) designates an EL extended addition expression with 5 operands, and

(cond (BE 1 NE 1 ) . . . (BE m NE m ) (else NE default ))
stands for an expression containing m pairs of the form (BE i NE i ). The pattern (+ N1 . . . Ni−1 NE i . . . NE n )
designates an EL extended addition expression with n operands in which the first i − 1 operands are numeric literals (a specific kind of numeric expression) and the remaining operands are arbitrary numeric expressions. Note that ellipsis notation can denote sequences with zero elements or one element: SX j . . . SX k denotes a sequence with one element if k = j and a sequence with zero elements if k = (j − 1).
Index Notation   To abbreviate ellipsis notation further, we will sometimes employ the indexed notation SX^k_{i=j} to stand for SX_j . . . SX_k, where SX_i refers to a particular element of this sequence. Here are the ellipsis notation examples from above expressed with index notation:

(+ NE^5_{i=1})
(cond (BE_j NE_j)^m_{j=1} (else NE default ))
(+ N^{i-1}_{j=1} NE^n_{k=i})
Note that SX^k_{i=j} denotes a sequence with one element if k = j and a sequence with zero elements if k = (j − 1).

Sequence Notation   Sequence notation, including the infix notations for the cons (“ . ”) and append (“ @ ”) sequence functions (see Section A.3.5), can be intermixed with s-expression notation to designate sequence elements of compound syntactic domains. For example, all of the following are alternative ways of writing the same extended EL addition expression:

(+ 1 2 3)
(+ [1, 2, 3])
(+ [1, 2] @ [3])
(+ 1 . [2, 3])
Similarly, if NE 1 = 1, NE ∗2 = [2, (+ 3 4)], and NE ∗3 = [(* 5 6), (- 7 8)], then (+ NE 1 . NE ∗2 ) designates the same syntactic entity as (+ 1 2 (+ 3 4))
and (+ NE ∗2 @ NE ∗3 ) designates the same syntactic entity as (+ 2 (+ 3 4) (* 5 6) (- 7 8))
The sequence notation is legal only in positions where a production for a compound syntactic domain contains a sequence pattern. For example, the following notations are illegal because if expressions do not contain any component sequences:

(if [(< (arg 1) 1), 2, 3])
(if [(< (arg 1) 1), 2] @ [3])
(if (< (arg 1) 1) . [2, 3])
nheight : NumExp → Nat
nheight[[N ]] = 0
nheight[[(arg N )]] = 0
nheight[[(A NE 1 NE 2 )]] = (1 +Nat (max nheight[[NE 1 ]] nheight[[NE 2 ]]))
nheight[[(if BE test NE then NE else )]] =
  (1 +Nat (max bheight[[BE test ]] (max nheight[[NE then ]] nheight[[NE else ]])))

bheight : BoolExp → Nat
bheight[[B ]] = 0
bheight[[(R NE 1 NE 2 )]] = (1 +Nat (max nheight[[NE 1 ]] nheight[[NE 2 ]]))
bheight[[(L BE 1 BE 2 )]] = (1 +Nat (max bheight[[BE 1 ]] bheight[[BE 2 ]]))

Figure 2.9   Two examples illustrating the form of function definitions on syntactic domains.
Similarly, the notation (+ 1 [2, 3]) is not legal for an EL extended addition expression, because the production pattern (+ NE ∗rand ) requires a single sequence component, not two components (a numerical expression and a sequence of numerical expressions). If the production pattern were instead (+ NE rand NE ∗rest ), then the expression (+ 1 [2, 3]) would match the pattern, but (+ [1, 2, 3]), (+ [1, 2] @ [3]), and (+ 1 . [2, 3]) would not. However, according to our conventions, (+ 1 2 3) would match either of these production patterns. Sequence notation can be used in s-expression patterns as well. For example, the pattern (+ NE rand1 . NE ∗rest ) matches any extended addition expression with at least one operand, while the pattern (+ [4, 7] @ NE ∗rest ) matches any extended addition expression whose first two operands are 4 and 7. Syntactic Functions We will follow a convention (standard in the semantics literature) that functions on compound syntactic domains are defined by a series of clauses, one for each production. Figure 2.9 illustrates this style of definition for two functions on EL expressions: nheight specifies the height of a numerical expression, while bheight specifies the height of a boolean expression. Each clause consists of two parts: a head that specifies an s-expression pattern from a production; and a body defining the meaning of the function for s-expressions that match the head. The double brackets, [[ ]], are often used in syntactic functions to demarcate a syntactic operand. They help to visually distinguish phrases in the programing language being processed from phrases in the metalanguage defining
the function. These brackets may be viewed as part of the name of the syntactic function. In function applications involving bracket notation, the function is assumed to bind tightly with the syntactic argument. For instance, the application max nheight[[NE 1 ]] nheight[[NE 2 ]] is parsed as if it were written (max (nheight[[NE 1 ]]) (nheight[[NE 2 ]])).
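As a concrete (and hedged) rendering of this clause-per-production style, the following Python sketch (ours) implements nheight and bheight from Figure 2.9 over the nested-list AST encoding of the earlier sketches; each branch plays the role of one clause, and is_intlit and the operator sets are those defined for the recognizer sketch:

def nheight(ne):
    if is_intlit(ne):                 # nheight[[N]] = 0
        return 0
    if ne[0] == "arg":                # nheight[[(arg N)]] = 0
        return 0
    if ne[0] in ARITH_OPS:            # clause for (A NE1 NE2)
        return 1 + max(nheight(ne[1]), nheight(ne[2]))
    if ne[0] == "if":                 # clause for (if BEtest NEthen NEelse)
        return 1 + max(bheight(ne[1]),
                       max(nheight(ne[2]), nheight(ne[3])))
    raise ValueError("not a NumExp: %r" % (ne,))

def bheight(be):
    if be in ("true", "false"):       # bheight[[B]] = 0
        return 0
    if be[0] in REL_OPS:              # clause for (R NE1 NE2)
        return 1 + max(nheight(be[1]), nheight(be[2]))
    if be[0] in LOGIC_OPS:            # clause for (L BE1 BE2)
        return 1 + max(bheight(be[1]), bheight(be[2]))
    raise ValueError("not a BoolExp: %r" % (be,))

For example, nheight(parse_sexp("(if (= (arg 1) 3) (arg 2) 4)")) returns 2.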
2.3.6
Mathematical Foundation of Syntactic Domains
Exactly what kinds of entities are defined by s-expression grammars? The answer to this question is important, because we will spend the rest of this book manipulating such entities. Intuitively, each compound syntactic domain in an s-expression grammar is defined to be the set of trees whose structure is determined by the productions for that domain. But can we define these trees in a more formal way? Yes! Using the domain concepts introduced in Section A.3, we can precisely define the mathematical structures specified by an s-expression grammar via what we will call the sum-of-products interpretation. An s-expression grammar defines a (potentially mutually recursive) collection of syntactic domains. In the sum-of-products interpretation we define:

• the primitive syntactic domains mentioned in the s-expression grammar, each simply containing the elements specified for that domain;

• a new domain for each production, which we name with the phrase type of that production and define to be the product of the domains associated with the domain-variable occurrences in the production pattern;

• the compound syntactic domains mentioned in the s-expression grammar, each defined as a sum of domains, one for each production for that domain.

Note these special cases:

• The domain for a production containing exactly one domain-variable occurrence turns out to be a synonym for the domain associated with that domain variable.

• A compound domain with just one production turns out to be a synonym for the domain associated with that production.

• A production containing no domain-variable occurrences represents the Unit domain.
Prog = Program
Program = IntLit × NumExp
NumExp = IntVal + Input + ArithmeticOperation + Conditional
IntLit = {. . . , -2, -1, 0, 1, 2, . . .}
IntVal = IntLit
Input = IntLit
ArithmeticOperation = ArithmeticOperator × NumExp × NumExp
ArithmeticOperator = {+, -, *, /, %}
Conditional = BoolExp × NumExp × NumExp
BoolExp = BoolVal + RelationalOperation + LogicalOperation
BoolLit = {true, false}
BoolVal = BoolLit
RelationalOperation = RelationalOperator × NumExp × NumExp
RelationalOperator = {<, =, >}
LogicalOperation = LogicalOperator × BoolExp × BoolExp
LogicalOperator = {and, or}

Figure 2.10   Syntactic domains for sum-of-products interpretation of the s-expression grammar for EL.
Any occurrence of a sequence pattern PT ∗ in a production represents a sequence domain whose elements are described by the pattern PT . For example, Figure 2.10 shows the complete domain definitions implied by the s-expression grammar for EL in Figure 2.4. Recall that the Prog domain is defined by the single production pattern (el Nnumargs NE body ), with phrase type Program. So Prog is a synonym for Program, a product domain of IntLit (the domain associated with the domain variable N ) and NumExp (the domain associated with the domain variable NE ). In the s-expression grammar, the NumExp domain is defined by the following four productions:

NE ::= Nnum                                  [IntVal]
  | (arg Nindex )                            [Input]
  | (Arator NE rand1 NE rand2 )              [ArithmeticOperation]
  | (if BE test NE then NE else )            [Conditional]
So NumExp is interpreted as a sum of four domains:

1. the IntVal domain, a synonym for IntLit, representing an integer literal;

2. the Input domain, a synonym for IntLit, representing the index of a reference to a program input;
3. the ArithmeticOperation domain, a product of ArithmeticOperator, NumExp, and NumExp; and

4. the Conditional domain, a product of BoolExp, NumExp, and NumExp.

A sketch of this interpretation in executable form follows.
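Here is a hedged Python sketch (ours; the field names and encoding are invented, though the class names follow Figure 2.10) in which each production becomes a dataclass (a product) and each compound domain becomes a Union (a sum):

from dataclasses import dataclass
from typing import Union

@dataclass
class IntVal:                  # IntVal = IntLit
    num: int

@dataclass
class Input:                   # Input = IntLit (an argument index)
    index: int

@dataclass
class ArithmeticOperation:     # ArithmeticOperator x NumExp x NumExp
    rator: str
    rand1: "NumExp"
    rand2: "NumExp"

@dataclass
class Conditional:             # BoolExp x NumExp x NumExp
    test: "BoolExp"
    then: "NumExp"
    else_: "NumExp"

@dataclass
class BoolVal:                 # BoolVal = BoolLit
    value: bool

@dataclass
class RelationalOperation:     # RelationalOperator x NumExp x NumExp
    rator: str
    rand1: "NumExp"
    rand2: "NumExp"

@dataclass
class LogicalOperation:        # LogicalOperator x BoolExp x BoolExp
    rator: str
    rand1: "BoolExp"
    rand2: "BoolExp"

# Each compound syntactic domain is the sum of its production domains.
NumExp = Union[IntVal, Input, ArithmeticOperation, Conditional]
BoolExp = Union[BoolVal, RelationalOperation, LogicalOperation]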
Redexes
R ∈ ElmRedex ::= (arg Nindex )               [Input]
  | (A N1 N2 )                               [ArithmeticOperation]

Reduction Relation (⇝)
(arg Nindex ), N ∗ ⇝ NNindex
  where (compare > Nindex 0) ∧ ¬(compare > Nindex Nsize )
(A N1 N2 ), N ∗ ⇝ Nans
  where Nans = (calculate A N1 N2 )

Evaluation Contexts
E ∈ ElmEvalContext ::= □                     [Hole]
  | (A E NE )                                [EvalLeft]
  | (A N E )                                 [EvalRight]

Transition Relation (⇒)
E{R}, N ∗ ⇒ E{R′}, N ∗   where R, N ∗ ⇝ R′

Figure 3.14   A context-based specification of the ELM transition relation.
Redexes
R ∈ PostFixRedex ::= [V , pop]               [Pop]
  | [V1 , V2 , swap]                         [Swap]
  | [N1 , N2 , A]                            [ArithmeticOperation]
  | . . . other redexes left as an exercise . . .

Reduction Relation (⇝)
[V , pop] ⇝ [ ]
[V1 , V2 , swap] ⇝ [V2 , V1 ]
[N1 , N2 , A] ⇝ [Nans ]   where Nans = (calculate A N1 N2 )
. . . other reduction rules left as an exercise . . .

Evaluation Contexts
EQ ∈ PostfixEvalSequenceContext ::= V ∗ @ □ @ Q

Transition Relation (⇒)
EQ{R} ⇒ EQ{R′}, where R ⇝ R′

Figure 3.15   A context-based specification of the transition relation for a subset of PostFix.
NE −−NE Nans
--------------------------------------- [prog]
(elmm NE ) −−P Nans

N −−NE N                                [num]

NE 1 −−NE N1 ; NE 2 −−NE N2
--------------------------------------- [arithop]
(A NE 1 NE 2 ) −−NE Nans
  where Nans = (calculate A N1 N2 )

Figure 3.16   Big-step operational semantics for ELMM.

3.3
Big-step Operational Semantics
A small-step operational semantics is a framework for describing program execution as an iterative sequence of small computational steps. But this is not always the most natural way to view execution. We often want to evaluate a phrase by recursively evaluating its subphrases and then combining the results. This is the key idea of denotational semantics, which we shall study in Chapter 4. However, this idea also underlies an alternative form of operational semantics, called big-step operational semantics (BOS) (also known as natural semantics). Here we briefly introduce big-step semantics in the context of a few examples.

Let’s begin by defining a BOS for the simple expression language ELMM, in which programs are numerical expressions that are either numerals or arithmetic operations. A BOS typically has an evaluation relation for each nontrivial syntactic domain that directly specifies a result for a given program phrase or configuration. The BOS in Figure 3.16 defines two evaluation relations:

1. −−NE ∈ NumExp × IntLit specifies the evaluation of an ELMM numerical expression; and

2. −−P ∈ Prog × IntLit specifies the evaluation of an ELMM program.

There are two rules specifying −−NE . The [num] rule says that numerals evaluate to themselves. The [arithop] rule says that evaluating an arithmetic operation (A NE 1 NE 2 ) yields the result (Nans ) of applying the operator to the results (N1 and N2 ) of evaluating the operands. The single [prog] rule specifying −−P just says that the result of an ELMM program is the result of evaluating its numerical expression.

As with SOS transitions, each instantiation of a BOS evaluation rule is justified by a proof tree, which we shall call an evaluation tree. Below is the proof tree for the evaluation of the program (elmm (* (- 7 4) (+ 5 6))), rendered here as a linearized derivation in which each conclusion follows its antecedents:
7 −−NE 7                                 [num]
4 −−NE 4                                 [num]
(- 7 4) −−NE 3                           [arithop]
5 −−NE 5                                 [num]
6 −−NE 6                                 [num]
(+ 5 6) −−NE 11                          [arithop]
(* (- 7 4) (+ 5 6)) −−NE 33              [arithop]
(elmm (* (- 7 4) (+ 5 6))) −−P 33        [prog]
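The recursive shape of these big-step rules maps directly onto a recursive interpreter. Here is a hedged Python sketch (ours) of an ELMM evaluator over the nested-list encoding of the earlier sketches; eval_ne mirrors the [num] and [arithop] rules, and eval_prog mirrors [prog]:

def calculate(op, n1, n2):
    # The arithmetic metafunction invoked by the [arithop] rule.
    if op == "+": return n1 + n2
    if op == "-": return n1 - n2
    if op == "*": return n1 * n2
    if op == "/": return n1 // n2    # raises ZeroDivisionError on (/ N 0)
    if op == "%": return n1 % n2
    raise ValueError("unknown operator: " + op)

def eval_ne(ne):
    if isinstance(ne, str):          # [num]: a numeral evaluates to itself
        return int(ne)
    op, ne1, ne2 = ne                # [arithop]: evaluate rands, then combine
    return calculate(op, eval_ne(ne1), eval_ne(ne2))

def eval_prog(prog):
    tag, body = prog                 # [prog]: a program's result is its body's
    assert tag == "elmm"
    return eval_ne(body)

As expected from the proof tree above, eval_prog(parse_sexp("(elmm (* (- 7 4) (+ 5 6)))")) returns 33.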
Unlike the proof tree for an SOS transition, which justifies a single computational step, the proof tree for a BOS transition justifies the entire evaluation! This is the sense in which the steps of a BOS are “big”; they tell how to go from a phrase to an answer (or something close to an answer). In the case of ELMM, the leaves of the proof tree are always trivial evaluations of numerals to themselves.

With BOS evaluations there is no notion of a stuck state. In the ELMM BOS, there is no proof tree for an expression like (* (/ 7 0) (+ 5 6)) that contains an error. However, we can extend the BOS to include an explicit error token as a possible result and modify the rules to generate and propagate such a token. Since all ELMM programs terminate, a BOS with this extension completely specifies the behavior of a program. But in general, the top-level evaluation rule for a program only partially specifies its behavior, since there is no tree (not even an infinite one) asserting that a program loops. What would the answer A of such a program be in the relation P −−P A?

The ELMM BOS rules also do not specify the order in which operands are evaluated, but this is irrelevant since there is no way in ELMM to detect whether one operation is performed before another. The ELMM BOS rules happen to specify a function, which implies that ELMM evaluation is deterministic. In general, a BOS may specify a relation, so it can describe nondeterministic evaluation as well.

In ELMM, the evaluation relation maps a code phrase to its result. In general, the LHS (and RHS) of an evaluation relation can be more complex, containing state components in addition to a code component. This is illustrated in the BOS for ELM, which extends ELMM with an indexed input construct (Figure 3.17). Here, the two evaluation relations have different domains than before: they include an integer numeral sequence to model the program arguments.

1. −−NE ∈ (NumExp × IntLit*) × IntLit specifies the evaluation of an ELM numerical expression; and

2. −−P ∈ (Prog × IntLit*) × IntLit specifies the evaluation of an ELM program.

Each of these relations can be read as “evaluating a program phrase relative to the program arguments to yield a result.” As a notational convenience, we abbreviate
X, N ∗args −−X Nans as X −−[N ∗args ]−→X Nans , where X ranges over P and NE .

NE −−[N1 ,...,Nn ]−→NE Nans
------------------------------------------------ [prog]
(elm Nnumargs NE ) −−[N1 ,...,Nn ]−→P Nans
  where (compare = Nnumargs n)
  and n stands for the IntLit N that denotes n ∈ Int

N −−N ∗−→NE N                                    [num]

NE 1 −−N ∗−→NE N1 ; NE 2 −−N ∗−→NE N2
------------------------------------------------ [arithop]
(A NE 1 NE 2 ) −−N ∗−→NE Nans
  where Nans = (calculate A N1 N2 )

(arg Nindex ) −−[N1 ,...,Nn ]−→NE NNindex        [input]
  where (compare > Nindex 0) ∧ ¬(compare > Nindex n)

Figure 3.17   Big-step operational semantics for ELM.

The [prog] rule is as in ELMM, except that it checks that the number of arguments is as expected and passes them to the body for its evaluation. These arguments are ignored by the [num] and [arithop] rules, but are used by the [input] rule to return the specified argument. Here is a sample ELM proof tree showing the evaluation of the program (elm 2 (* (arg 1) (+ 1 (arg 2)))) on the two arguments 7 and 5, again with each conclusion following its antecedents:

(arg 1) −−[7,5]−→NE 7                            [input]
1 −−[7,5]−→NE 1                                  [num]
(arg 2) −−[7,5]−→NE 5                            [input]
(+ 1 (arg 2)) −−[7,5]−→NE 6                      [arithop]
(* (arg 1) (+ 1 (arg 2))) −−[7,5]−→NE 42         [arithop]
(elm 2 (* (arg 1) (+ 1 (arg 2)))) −−[7,5]−→P 42  [prog]
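In code, the extra argument-sequence component of the ELM relations is simply an additional parameter threaded through the recursion. A hedged Python sketch (ours), reusing calculate from the ELMM sketch above:

def eval_elm_ne(ne, args):
    if isinstance(ne, str):              # [num]: the arguments are ignored
        return int(ne)
    if ne[0] == "arg":                   # [input]: return the indexed argument
        index = int(ne[1])
        if not (1 <= index <= len(args)):
            raise IndexError("argument index out of range")
        return args[index - 1]           # ELM arguments are 1-indexed
    op, ne1, ne2 = ne                    # [arithop]: thread args to both rands
    return calculate(op, eval_elm_ne(ne1, args), eval_elm_ne(ne2, args))

def eval_elm_prog(prog, args):
    tag, numargs, body = prog            # [prog]: check the argument count
    assert tag == "elm" and int(numargs) == len(args)
    return eval_elm_ne(body, args)

Matching the proof tree above, eval_elm_prog(parse_sexp("(elm 2 (* (arg 1) (+ 1 (arg 2))))"), [7, 5]) returns 42.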
Can we describe PostFix execution in terms of a BOS? Yes: via the evaluation relations −−P (for programs) and −−Q (for command sequences) in Figure 3.18. The −−Q relation ∈ (CommandSeq × Stack) × Stack treats command sequences as “stack transformers” that map an input stack to an output stack. We abbreviate Q, S −−Q S′ as Q −−S−→Q S′. The [non-exec] rule “cheats” by using the SOS transition relation ⇒ to specify how a non-exec command C transforms the stack S to S′. Then −−Q specifies how the rest of the commands transform S′ into S″. The [exec] rule is more interesting because it uses −−Q in both antecedents. The executable sequence commands Qexec transform S to S′, while the remaining commands Qrest transform S′ to S″.
Q, [N1 ,...,Nn ] −−Q Nans . S
------------------------------------------------ [prog]
(postfix Nnumargs Q), [N1 ,...,Nn ] −−P Nans
  where (compare = Nnumargs n)
  and n stands for the IntLit N that denotes n ∈ Int

C . Q, S ⇒ Q, S′ ; Q, S′ −−Q S″
------------------------------------------------ [non-exec]
C . Q, S −−Q S″
  where C ≠ exec

Qexec , S −−Q S′ ; Qrest , S′ −−Q S″
------------------------------------------------ [exec]
exec . Qrest , (Qexec ) . S −−Q S″

Figure 3.18   Big-step operational semantics for PostFix.
The [exec] and [non-exec] rules illustrate how evaluation order (in this case, executing Qexec before Qrest or C before Q) can be specified in a BOS by “threading” a state component (in this case, the stack) through an evaluation. It is convenient to define −−Q so that it returns a stack, but stacks are not the final answer we desire. The [prog] rule ∈ (Prog × IntLit*) × Stack takes care of creating the initial stack from the arguments and extracting the top integer numeral (if it exists) from the final stack.

How do small-step and big-step semantics stack up against each other? Each has its advantages and limitations. A big-step semantics is often more concise than a small-step semantics, and one of its proof trees can summarize the entire execution of a program. The recursive nature of a big-step semantics also corresponds more closely to the structure of interpreters for high-level languages than a small-step semantics does. On the other hand, the iterative step-by-step nature of a small-step semantics corresponds more closely to the way low-level languages are implemented, and it is often a better framework for reasoning about computational resources, errors, and termination. Furthermore, infinite loops are easy to model in a small-step semantics but not in a big-step semantics.

We will use small-step semantics as our default form of operational semantics throughout the rest of this book. This is not because the big-step semantics approach is not useful (it is), but because we will tend to use denotational semantics rather than big-step operational semantics for language specifications that compose the meanings of whole phrases from subphrases.

Exercise 3.15 Construct a BOS evaluation tree that shows the evaluation of

(postfix 2 (2 (3 mul add) exec) 1 swap exec sub)
on arguments 4 and 5.
Exercise 3.16 Extend the BOS in Figure 3.16 to handle the full EL language. You will need a new evaluation relation, −−BE , to handle boolean expressions.

Exercise 3.17 Modify each of the BOS specifications in Figures 3.16–3.18 to generate and propagate an error token that models signaling an error. Be careful to handle all error situations.
3.4
Operational Reasoning
The suitability of a programming language for a given purpose largely depends on many high-level properties of the language. Important global properties of a programming language include:

• universality: the language can express all computable programs;

• determinism: the set of possible outcomes from executing a program on any particular inputs is a singleton;

• strong normalization: all programs are guaranteed to terminate on all inputs (i.e., it is not possible to express an infinite loop);

• static checkability: a class of program errors can be found by static analysis without resorting to execution;

• referential transparency: different occurrences of an expression within the same context always have the same meaning.

Languages often exhibit equivalence properties that allow safe transformations: systematic substitutions of one program phrase for another that are guaranteed not to change the behavior of the program. Finally, properties of particular programs are often of interest. For instance, we might want to show that a given program terminates, that it uses only bounded resources, or that it is equivalent to some other program. For these sorts of purposes, an important characteristic of a language is how easy it is to prove properties of particular programs written in a language.

A language exhibiting a desired list of properties may not always exist. For example, no language can be both universal and terminating, because a universal language must be able to express infinite loops. (But it is often possible to carve a terminating sublanguage out of a universal language.)

The properties of a programming language are important to language designers, implementers, and programmers alike. The features included in a language strongly depend on what properties the designers want the language to have. For
example, designers of a language in which all programs are intended to terminate cannot include general looping constructs, while designers of a universal language must include features that allow nontermination. Compiler writers extensively use safe transformations to automatically improve the efficiency of programs. The properties of a language influence which language a programmer chooses for a task as well as what style of code the programmer writes. An important benefit of a formal semantics is that it provides a framework that facilitates proving properties both about the entire language and about particular programs written in the language. Without a formal semantics, our understanding of such properties would be limited to intuitions and informal (and possibly incorrect) arguments. A formal semantics is a shared language for convincing both ourselves and others that some intuition that we have about a program or a language is really true. It can also help us develop new intuitions. It is useful not only to the extent that it helps us construct proofs but also to the extent that it helps us find holes in our arguments. After all, some of the things we think we can prove simply aren’t true. The process of constructing a proof can give us important insight into why they aren’t true. In the next three sections, we use operational semantics to reason about EL and PostFix. In Section 3.5, we discuss the deterministic behavior of EL under various conditions. Then we show in Section 3.6 that all PostFix programs are guaranteed to terminate. In Section 3.7, we consider conditions under which we can transform one PostFix command sequence to another without changing the behavior of a program.
3.5
Deterministic Behavior of EL
A programming language is deterministic if there is exactly one possible outcome for any pair of program and inputs. In Section 3.2.1, we saw that a deterministic SOS transition relation implies that programs behave deterministically. In Section 3.2.3, we argued that the PostFix transition relation is deterministic, so PostFix is a deterministic language. We can similarly argue that EL is deterministic. We will give the argument for the sublanguage ELMM, but it can be extended to full EL. There are only three SOS rewrite rules for ELMM (Figure 3.7 on page 65): [arithop], [prog-left], and [prog-right]. For a given ELMM numerical expression NE , we argue that there is at most one proof tree using these three rules that justifies a transition for NE . The proof is by structural induction on the AST for NE .
• (Base cases) If NE is a numeral, it matches no rules, so there is no transition. If NE has the form (A N1 N2 ), it can match only the [arithop] rule, since there are no transitions involving numerals.

• (Induction cases) NE must have the form (A NE 1 NE 2 ), where at least one of NE 1 and NE 2 is not a numeral. If NE 1 is not a numeral, then NE can match only the [prog-left] rule, and only in the case where there is a proof tree justifying the transition NE 1 ⇒ NE 1 ′. By induction, there is at most one such proof tree, so there is at most one proof tree for a transition of NE . If NE 1 is a numeral, then NE 2 must not be a numeral, in which case NE can match only the [prog-right] rule, and similar reasoning applies.

Alternatively, we can prove the determinism of the ELMM transition relation using the context semantics in Figure 3.13. In this case, we need to show that each ELMM numerical expression can be parsed into an evaluation context and redex in at most one way. Such a proof is essentially the same as the one given above, so we omit it.

The ELMM SOS specifies that operations are performed in left-to-right order. Why does the order of evaluation matter? It turns out that it doesn’t: there is no way in ELMM to detect the order in which operations are performed! Intuitively, either the evaluation is successful, in which case all operations are performed anyway, leading to the same answer, or a division or remainder by zero is encountered somewhere along the way, in which case the evaluation is unsuccessful. Note that if we could distinguish between different kinds of errors, the story would be different. For instance, if divide-by-zero gave a different error from remainder-by-zero, then evaluating the expression (+ (/ 1 0) (% 2 0)) would indicate which of the two subexpressions was evaluated first. The issue of evaluation order is important to implementers, because they sometimes can make programs execute more efficiently by reordering operations.

How can we formally show that evaluation order in ELMM does not matter? We begin by replacing the [prog-right] rule in the SOS by the following [prog-right′] rule to yield a modified ELMM transition relation ⇒′.
NE 2 ⇒′ NE 2 ′
------------------------------------ [prog-right′]
(A NE 1 NE 2 ) ⇒′ (A NE 1 NE 2 ′)
Now operands can be evaluated in either order, so the transition relation is no longer deterministic. For example, the expression (* (- 7 4) (+ 5 6)) now has two transitions:

(* (- 7 4) (+ 5 6)) ⇒′ (* 3 (+ 5 6))
(* (- 7 4) (+ 5 6)) ⇒′ (* (- 7 4) 11)
Nevertheless, we would like to argue that the behavior of programs is still deterministic even though the transition relation is not. A handy property for this purpose is called confluence. Informally, confluence says that if two transition paths from a configuration diverge, there must be a way to bring them back together. The formal definition is as follows:

Definition 3.1 (Confluence) A relation → ∈ X × X is confluent if and only if for every x1 , x2 , x3 ∈ X such that x1 →* x2 and x1 →* x3 , there exists an x4 such that x2 →* x4 and x3 →* x4 .

Confluence is usually displayed via the following diagram, in which solid lines are the given relations and the dashed lines are assumed to exist when the property holds.

          x1
        *    *
      x2      x3
        *    *
          x4

Because of the shape of the diagram, →* is said to satisfy the diamond property. Saying that a relation is Church-Rosser (CR for short) is the same as saying it is confluent.

Suppose that a transition relation ⇒ is confluent. Then if an initial configuration cfi has transition paths to two final configurations cff1 and cff2 , these are necessarily the same configuration! Why? By confluence, there must be a configuration cf such that cff1 ⇒* cf and cff2 ⇒* cf. But cff1 and cff2 are elements of Irreducible, so the only transition paths leaving them have length 0. This means cff1 = cf = cff2 . Thus, a confluent transition relation guarantees a unique final configuration. Indeed, it guarantees a unique irreducible configuration: it is not possible to get stuck on one path and reach a final configuration on another.

Confluence by itself does not guarantee a single outcome. It is still possible for a confluent transition relation to have some infinite paths, in which case there is a second outcome (∞). This possibility must be ruled out to prove deterministic behavior. In the case of ELMM (and even EL) it is easy to prove there are no loops (see Exercise 3.27 on page 89).

We can now show that ELMM has deterministic behavior under ⇒′ by arguing that ⇒′ is confluent. We will actually show a stronger property, known as one-step confluence, in which the transitive closure stars in the diamond diagram are removed; confluence easily follows from one-step confluence.
3.5 Deterministic Behavior of EL
83
Suppose that NE 1 ⇒ NE 2 and NE 1 ⇒ NE 3 . Any such ELMM transition is justified by a linear derivation (like the one depicted in Figure 3.8 on page 65) whose single leaf is an instance of the [arithop] rule. As in context-based semantics, we will call the LHS of the basic arithmetic transition justified by this [arithop] rule a redex. Call the redex reduced in NE 1 ⇒ NE 2 the “red” redex and the one reduced in NE 1 ⇒ NE 3 the “blue” redex. Either these are the same redex, in which case NE 2 = NE 3 trivially joins the paths, or the redexes are disjoint, i.e., one does not occur as a subexpression of another. (A redex has the form (A N1 N2 ), and the integer numerals N1 and N2 cannot contain another redex.) In the latter case, there must be an expression NE 4 that is a copy of NE 1 in which both the red and blue redexes have been reduced. Then NE 2 ⇒ NE 4 by reducing the blue redex and NE 3 ⇒ NE 4 by reducing the red redex. So NE 4 joins the diverging transitions. We have shown that ELMM has deterministic behavior even when its operations are performed in a nondeterministic order. A similar approach can be used to show that ELM and EL have the same property (see Exercise 3.20). Confluence in these languages is fairly straightforward. It becomes much trickier in languages where redexes overlap or performing one redex can copy another. We emphasize that confluence is a sufficient but not necessary condition for a nondeterministic transition relation to give rise to deterministic behavior. That is, confluence implies deterministic behavior, but deterministic behavior can exist without confluence. In general, many distinct final configurations might map to the same outcome. Exercise 3.18 Suppose that in addition to replacing [prog-right] with [prog-right ] in the ELMM SOS, we add the rule [prog-both] introduced on page 65 to the SOS. a. In this modified SOS, how many different transition paths lead from the expression (/ (+ 25 75) (* (- 7 4) (+ 5 6))) to the result 3? b. Does the modified SOS still have deterministic behavior? Explain your answer. Exercise 3.19 Consider extending ELMM with a construct (either NE 1 NE 2 ) that returns the result of evaluating either NE 1 or NE 2 . a. What are the possible behaviors of the following program? (elmm (* (- (either 1 2) (either 3 4)) (either 5 6)))
b. The informal specification of either given above is ambiguous. For example, must the expression (+ (either 1 (/ 2 0)) (either (% 3 0) 4)) return the result 5, or can it get stuck? The semantics of either can be defined either way. Give a formal specification for each interpretation of either that is consistent with the informal description.
84
Chapter 3 Operational Semantics
Exercise 3.20 a. Show that the two transition relations (one for NumExp, one for BoolExp) in an EL SOS can be deterministic. b. Suppose that both transition relations in an EL SOS allow operations to be performed in any order, so that they are nondeterministic. Argue that the behavior of EL programs is still deterministic.
3.6
Termination of PostFix Programs
The strong normalization property of PostFix is expressed by the following theorem: Theorem 3.2 (PostFix Termination) All PostFix programs are guaranteed to terminate. That is, executing a PostFix program on any inputs always either returns a numeral or signals an error. This theorem is based on the following intuition: existing commands are consumed by execution, but no new commands are ever created, so the commands must eventually “run out.” This intuition is essentially correct, but an intuition does not a proof make. After all, PostFix is complex enough to harbor a subtlety that invalidates the intuition. The nget command allows the duplication of numerals — is this problematic with regard to termination? Executable sequences are moved to the stack, but their contents can later be prepended to the command sequence. How can we be certain that this shuffling between command sequence and stack doesn’t go on forever? And how do we deal with the fact that executable sequences can be arbitrarily nested? In fact, the termination theorem can fail to hold if PostFix is extended with new commands, such as a dup command that duplicates the top stack value (see Section 3.8 for details). These questions indicate the need for a more convincing argument that termination is guaranteed. This is the kind of situation in which formal semantics comes in handy. Below we present a proof for termination based on the SOS for PostFix.
3.6.1
Energy
Associate with each PostFix configuration a natural number called its energy (so called to suggest the potential energy of a dynamical system). By considering each rewrite rule of the semantics in turn, we will prove that the energy strictly decreases with each transition. The energy of an initial configuration must then
3.6.1 Energy
85
be an upper bound on the length of any path of transitions leading from the initial configuration. Since the initial energy is finite, there can be no unbounded transition sequences from the initial configuration, so the execution of a program must terminate. The energy of a configuration is defined by the following energy functions: Econfig [[Q, S ]] = Eseq [[Q]] + Estack [[S ]] Eseq [[[ ]Command ]] = 0 Eseq [[C . Q]] = 1 + Ecom [[C ]] + Eseq [[Q]] Estack [[[ ]Value ]] Estack [[V . S ]] Ecom [[(Q)]] Ecom [[C ]]
= 0 = Ecom [[V ]] + Estack [[S ]] = Eseq [[Q]] = 1, C not an executable sequence.
(3.1) (3.2) (3.3) (3.4) (3.5) (3.6) (3.7)
These definitions embody the following intuitions: • The energy of a configuration, sequence, or stack is greater than or equal to the sum of the energy of its components. • Executing a command consumes at least one unit of energy (the 1 that appears in 3.3). This is true even for commands that are transferred from the code component to the stack component (i.e., numerals and executable sequences); such commands are worth one more unit of energy in the command sequence than on the stack.8 • Since the commands in an executable sequence may eventually be executed, an executable sequence on the stack must have at least as much energy as its component command sequence. This is the essence of 3.6, where Ecom [[(Q)]] is interpreted as the energy of a command sequence on the stack (by 3.5). The following lemmas are handy for reasoning about the energy of sequences: Ecom [[C ]] ≥ 0 Eseq [[Q1 @ Q2 ]] = Eseq [[Q1 ]] + Eseq [[Q2 ]]
(3.8) (3.9)
These can be derived from the energy definitions above. Their derivations are left as an exercise. Equipped with the energy definitions and lemmas 3.8 and 3.9, we are ready to prove the PostFix Termination Theorem. 8
The invocation Ecom [[V ]] that appears in 3.5 may seem questionable because Ecom [[]] should be called on elements of Command, not elements of Value. But since every stack value is also a command, the invocation is well defined.
86
Chapter 3 Operational Semantics
3.6.2
The Proof of Termination
Proof: We show that every transition reduces the energy of a configuration. Recall that every transition in an SOS has a proof in terms of the rewrite rules. In the case of PostFix, where all the rules are axioms, the proof is trivial: every PostFix transition is justified by one rewrite axiom. To prove a property about PostFix transitions, we just need to show that it holds for each rewrite axiom in the SOS. Here’s the case analysis for the energy reduction property: • [num]: N . Q, S ⇒ Q, N . S Econfig [[N . Q, S ]] by 3.1 = Eseq [[N . Q]] + Estack [[S ]] = 1 + Ecom [[N ]] + Eseq [[Q]] + Estack [[S ]] by 3.3 by 3.5 = 1 + Eseq [[Q]] + Estack [[N . S ]] by 3.1 = 1 + Econfig [[Q, N . S ]]
The LHS has one more unit of energy than the RHS, so moving a numeral to the stack reduces the configuration energy by one unit. • [seq]: (Qexec ) . Qrest , S ⇒ Qrest , (Qexec ) . S Moving an executable sequence to the stack also consumes one energy unit by exactly the same argument as for [num]. • [pop]: pop . Q, Vtop . S ⇒ Q, S Popping Vtop off a stack takes at least two energy units: Econfig [[pop . Q, Vtop . S ]] = Eseq [[pop . Q]] + Estack [[Vtop . S ]] = 1 + Ecom [[pop]] + Eseq [[Q]] + Ecom [[Vtop ]] + Estack [[S ]] = 2 + Ecom [[Vtop ]] + Eseq [[Q]] + Estack [[S ]] ≥ 2 + Econfig [[Q, S ]]
by by by by
3.1 3.3 and 3.5 3.7 3.1 and 3.8
• [swap]: swap . Q, V1 . V2 . S ⇒ Q, V2 . V1 . S Swapping the top two elements of a stack consumes two energy units: Econfig [[swap . Q, V1 . V2 . S ]] = Eseq [[swap . Q]] + Estack [[V1 . V2 . S ]] = 1 + Ecom [[swap]] + Eseq [[Q]] + Ecom [[V1 ]] + Ecom [[V2 ]] + Estack [[S ]] = 2 + Eseq [[Q]] + Estack [[V2 . V1 . S ]] = 2 + Econfig [[Q, V2 . V1 . S ]]
by 3.1 by 3.3 and 3.5 by 3.7 and 3.5 by 3.1
3.6.2 The Proof of Termination
87
• [execute]: exec . Qrest , (Qexec ) . S ⇒ Qexec @ Qrest , S Executing the exec command consumes two energy units: Econfig [[exec . Qrest , (Qexec ) . S ]] = Eseq [[exec . Qrest ]] + Estack [[(Qexec ) . S ]] = 1 + Ecom [[exec]] + Eseq [[Qrest ]] + Ecom [[(Qexec )]] + Estack [[S ]] = 2 + Eseq [[Qexec ]] + Eseq [[Qrest ]] + Estack [[S ]] = 2 + Eseq [[Qexec @ Qrest ]] + Estack [[S ]] = 2 + Econfig [[Qexec @ Qrest , S ]]
by 3.1 by by by by
3.3 and 3.5 3.6 and 3.7 3.9 3.1
• [nget], [arithop], [relop-true], [relop-false], [sel-true], [sel-false]: These cases are similar to those above and are left as exercises for the reader. 3 The approach of defining a natural number function that decreases on every iteration of a process is a common technique for proving termination. However, inventing the function can sometimes be tricky. In the case of PostFix, we have to get the relative weights of components just right to handle movements between the program and stack. The termination proof presented above is rather complex. The difficulty is not inherent to PostFix, but is due to the particular way we have chosen to formulate its semantics. There are alternative formulations in which the termination proof is simpler (see Exercise 3.25 on page 89). Exercise 3.21 Show that lemmas 3.8 and 3.9 hold. Exercise 3.22 Complete the proof of the PostFix termination theorem by showing that the following axioms reduce configuration energy: [nget], [arithop], [relop-true], [relop-false], [sel-true], [sel-false]. Exercise 3.23 Bud “eagle-eye” Lojack notices that Definitions 3.2 and 3.4 do not appear as the justification for any steps in the PostFix Termination Theorem. He reasons that these definitions are arbitrary, so he could just as well use the following definitions instead: Eseq [[[]]] Estack [[[]]]
= =
17 23
( 3.2 ) ( 3.4 )
Is Bud correct? Explain your answer. Exercise 3.24 Prove the termination property of PostFix based on the SOS for PostFix2 from Exercise 3.7. a. Define an appropriate energy function on configurations in the alternative SOS. b. Show that each transition in the alternative SOS reduces energy.
88
3.6.3
Chapter 3 Operational Semantics
Structural Induction
The above proof is based on a PostFix SOS that uses only axioms. But what if the SOS contained progress rules, like [exec-prog] from Figure 3.10 in Section 3.2.5? How do we prove a property like reduction in configuration energy when progress rules are involved? Here’s where we can take advantage of the fact that every transition of an SOS must be justified by a finite proof tree based on the rewrite rules. Recall that there are two types of nodes in the proof tree: the leaves, which correspond to axioms, and the intermediate nodes, which correspond to progress rules. Suppose we can show that • the property holds at each leaf — i.e., it is true for (the consequent of) every axiom; and • the property holds at each intermediate node — i.e., for every progress rule, if the property holds for all of the antecedents, then it also holds for the consequent. Then, by induction on the height of its proof tree, the property must hold for each transition specified by the rewrite rules. This method for proving a property based on the structure of a tree (in this case the proof tree of a transition relation) is called structural induction. As an example of a proof by structural induction, we consider how the previous proof of the termination property for PostFix would be modified for an SOS that uses the [exec-done] and [exec-prog] rules in place of the [exec] rule. It is straightforward to show that the [exec-done] axiom reduces configuration energy; this is left as an exercise for the reader. To show that the [exec-prog] rule satisfies the property, we must show that if its single antecedent transition reduces configuration energy, then its consequent transition reduces configuration energy as well. Recall that the [exec-prog] rule has the form: Qexec , S ⇒ Qexec , S exec . Qrest , (Qexec ) . S ⇒ exec . Qrest , (Qexec ) . S
We assume that the antecedent transition, Qexec , S ⇒ Qexec , S
reduces configuration energy, so that the following inequality holds: Econfig [[Qexec , S ]] > Econfig [[Qexec , S ]]
[exec-prog]
3.7 Safe PostFix Transformations
89
Then we show that the consequent transition also reduces configuration energy: Econfig [[exec . Qrest , (Qexec ) . S ]] = Eseq [[exec . Qrest ]] + Estack [[(Qexec ) . S ]] = Eseq [[exec . Qrest ]] + Ecom [[(Qexec )]] + Estack [[S ]] = Eseq [[exec . Qrest ]] + Eseq [[Qexec ]] + Estack [[S ]] = Eseq [[exec . Qrest ]] + Econfig [[Qexec , S ]] , S ]] > Eseq [[exec . Qrest ]] + Econfig [[Qexec = Eseq [[exec . Qrest ]] + Eseq [[Qexec ]] + Estack [[S ]] )]] + Estack [[S ]] = Eseq [[exec . Qrest ]] + Ecom [[(Qexec = Eseq [[exec . Qrest ]] + Estack [[(Qexec ) . S ]] ) . S ]] = Econfig [[exec . Qrest , (Qexec
by by by by by by by by by
3.1 3.5 3.6 3.1 assumption 3.1 3.6 3.5 3.1
The > appearing in the derivation sequence guarantees that the energy specified by the first line is strictly greater than the energy specified by the last line. This completes the proof that the [exec-prog] rule reduces configuration energy. Together with the proofs that the axioms reduce configuration energy, this provides an alternative proof of PostFix’s termination property. Exercise 3.25 Prove the termination property of PostFix based on the alternative PostFix SOS suggested in Exercise 3.12 on page 70: a. Define an appropriate energy function on configurations in the alternative SOS. b. Show that each transition in the alternative SOS reduces energy. c. The termination proof for the alternative semantics should be more straightforward than the termination proofs in the text and in Exercise 3.24. What characteristic(s) of the alternative SOS simplify the proof? Does this mean the alternative SOS is a “better” one? Exercise 3.26 Prove that the rewrite rules [exec-prog] and [exec-done] presented in the text specify the same behavior as the [execute] rule. That is, show that for any configuration cf of the form exec . Q, S , both sets of rules eventually rewrite cf into either (1) a stuck state or (2) the same configuration. Exercise 3.27 As in PostFix, every program in the EL language terminates. Prove this fact based on an operational semantics for EL (see Exercise 3.10 on page 67).
3.7 3.7.1
Safe PostFix Transformations Observational Equivalence
One of the most important aspects of reasoning about programs is knowing when it is safe to replace one program phrase by another. Two phrases are said to be
90
Chapter 3 Operational Semantics
observationally equivalent (or behaviorally equivalent) if an instance of one can be replaced by the other in any program without changing the behavior of the program. Observational equivalence is important because it is the basis for a wide range of program transformation techniques. It is often possible to improve a pragmatic aspect of a program by replacing a phrase by one that is equivalent but more efficient. For example, we expect that the PostFix sequence [1, add, 2, add] can always be replaced by [3, add] without changing the behavior of the surrounding program. The latter may be more desirable in practice because it performs fewer additions. A series of simple transformations can sometimes lead to dramatic improvements in performance. Consider the following three transformations on PostFix command sequences, just three of the many safe PostFix transformations: Before [V1 , V2 , swap] [(Q), exec] [N1 , N2 , A]
After [V2 , V1 ] Q [Nans ] where Nans = (calculate A N1 N2 )
Name [swap-trans] [exec-trans] [arith-trans]
Applying these to our example of a PostFix command sequence yields the following sequence of simplifications: ((2 (3 mul add) exec) 1 swap exec sub) simp − − −− → ((2 3 mul add) 1 swap exec sub) simp − −−− → ((6 add) 1 swap exec sub) simp − − −− → (1 (6 add) exec sub) simp − − −− → (1 6 add sub) simp − −−− → (7 sub)
[exec-trans] [arith-trans] [swap-trans] [exec-trans] [arith-trans]
Thus, the original command sequence is a “subtract 7” subroutine. The transformations essentially perform operations at compile time that otherwise would be performed at run time. It is often tricky to determine whether two phrases are observationally equivalent. For example, at first glance it might seem that the PostFix sequence [swap, swap] can always be replaced by the empty sequence [ ]. While this transformation is valid in many situations, these two sequences are not observationally equivalent because they behave differently when the stack contains fewer than two elements. For instance, the PostFix program (postfix 0 1) returns 1 as a final answer, but the program (postfix 0 1 swap swap) generates an error. Two phrases are observationally equivalent only if they are interchangeable in all programs.
3.7.1 Observational Equivalence
91
P ∈ PostfixProgContext ::= (postfix Nnumargs Q) [ProgramContext] Q ∈ PostfixSequenceContext ::= | | | Figure 3.19
2 Q @Q Q@Q [(Q)]
[Hole] [Prefix] [Suffix] [Nesting]
Definition of PostFix contexts.
Observational equivalence can be formalized in terms of the notions of behavior and context presented earlier. Recall that the behavior of a program (see Section 3.2.1) is specified by a function beh that maps a program and its inputs to a set of possible outcomes: beh : (Prog × Inputs) → P(Outcome)
The behavior is deterministic when the resulting set is guaranteed to be a singleton. A program context is a program with a hole in it (see Section 3.2.6). Definition 3.3 (Observational Equivalence) Suppose that P ranges over program contexts and H ranges over the kinds of phrases that fill the holes in program contexts. Then H1 and H2 are defined to be observationally equivalent (written H1 =obs H2 ) if and only if for all program contexts P and all inputs I , beh P{H1 }, I = beh P{H2 }, I . We will consider PostFix as an example. An appropriate notion of program contexts for PostFix is defined in Figure 3.19. A command sequence context Q is one that can be filled with a sequence of commands to yield another sequence of commands. For example, if Q = [(2 mul), 3] @ 2 @ [exec], then Q{[4, add, swap]} = [(2 mul), 3, 4, add, swap, exec]. The [Prefix] and [Suffix] productions allow the hole to be surrounded by arbitrary command sequences, while the [Nesting] production allows the hole to be nested within an executable sequence command. (The notation [(Q)] designates a sequence containing a single element. That element is an executable sequence that contains a single hole.) Because of the presence of @ , the grammar for PostfixSequenceContext is ambiguous, but that will not affect our presentation, since filling the hole for any parsing of a sequence context yields exactly the same sequence. The possible outcomes of a program must be carefully defined to lead to a satisfactory notion of observational equivalence. The outcomes for PostFix defined in Section 3.2.1 are fine, but small changes can sometimes lead to surprising results. For example, suppose we allow PostFix programs to return the top value
92
Chapter 3 Operational Semantics
of a nonempty stack, even if the top value is an executable sequence. If we can observe the structure of a returned executable sequence, then this change invalidates all nontrivial program transformations! To see why, take any two sequences we expect to be equivalent (say, [1, add, 2, add] and [3, add]) and plug them into the context (postfix 0 (2)). In the modified semantics, the two outcomes are the executable sequences (1 add 2 add) and (3 add), which are clearly not the same, and so the two sequences are not observationally equivalent. The problem is that the modified SOS makes distinctions between executable sequence outcomes that are too fine-grained for our purposes. We can fix the problem by instead adopting a coarser-grained notion of behavior in which there is no observable difference between outcomes that are executable sequences. For example, the outcome in this case could be the token executable, indicating that the outcome is an executable sequence without divulging which particular executable sequence it is. With this change, all the expected program transformations become valid again.
3.7.2
Transform Equivalence
It is possible to show the observational equivalence of two particular PostFix command sequences according to the definition on page 91. However, we will follow another route. First, we will develop an easier-to-prove notion of equivalence for PostFix sequences called transform equivalence. Then, after giving an example of transform equivalence, we will prove a theorem that transform equivalence implies observational equivalence for PostFix programs. This approach has the advantage that the structural induction proof on contexts needed to show observational equivalence need be proved only once (for the theorem) rather than for every pair of PostFix command sequences. Transform equivalence is based on the intuition that PostFix command sequences can be viewed as a means of transforming one stack to another. Informally, transform equivalence is defined as follows: Definition 3.4 (Transform Equivalence) Two PostFix command sequences are transform equivalent if they always transform equivalent input stacks to equivalent output stacks. This definition is informal in that it doesn’t say how command sequences can be viewed as transformers or pin down what it means for two stacks to be equivalent. We will now flesh out these notions. Our approach to transform equivalence depends on a notion of the last stack reached when all commands are executed in a PostFix program. We model
3.7.2 Transform Equivalence
93
the possibility of executions stuck at a command by introducing a StackAnswer domain that contains the usual PostFix stacks (Figure 3.3 on page 53) along with a distinguished error stack element SAerror : ErrorStack = {errorStack} SA ∈ StackAnswer = Stack + ErrorStack SAerror : StackAnswer = (ErrorStack StackAnswer errorStack)
We now define a lastStack function that returns the last stack reached for a given initial command sequence and stack when all commands are executed: lastStack : CommandSeq → Stack → StackAnswer ∗ (Stack StackAnswer S ) if Q, S ⇒ [ ], S (lastStack Q S ) = otherwise SAerror
The lastStack function is well defined because PostFix is deterministic. The longest transition path starting with an initial configuration Q, S ends in a unique configuration that either has an empty command sequence or doesn’t. Because it handles the nonempty command sequence case by returning SAerror , lastStack is also a total function. For example, (lastStack [add, mul] [4, 3, 2, 1]) = [14, 1] and (lastStack [add, exec] [4, 3, 2, 1]) = SAerror . It easily follows from the definition of lastStack that if Q, S ⇒ Q , S then (lastStack Q S ) = (lastStack Q S ). Note that a stack returned by lastStack may be empty or have an empty command sequence at the top, so it may not be an element of FinalStack (defined in Figure 3.3 on page 53). The simplest notion of “stack equivalence” is that two stacks are equivalent if they are identical sequences of values. But this notion has problems similar to those discussed above with regard to outcomes in the context of observational equivalence. For example, suppose we are able to show that (1 add 2 add) and (3 add) are transform equivalent. Then we’d also like the transform equivalence of ((1 add 2 add)) and ((3 add)) to follow as a corollary. But given identical input stacks, these two sequences do not yield identical output stacks — the top values of the output stacks are different executable sequences! To finesse this problem, we need a notion of stack equivalence that treats two executable sequence elements as being the same if they are transform equivalent. The recursive nature of these notions prompts us to define four mutually recursive equivalence relations that formalize this approach: one between command sequences (transform equivalence), one between stack answers (stackanswer equivalence), one between stacks (stack equivalence), and one between stack elements (value equivalence).
94
Chapter 3 Operational Semantics
1. Command sequences Q1 and Q2 are said to be transform equivalent (written Q1 ∼Q Q2 ) if, for all stack-equivalent stacks S1 and S2 , it is the case that (lastStack Q1 S1 ) is stack-answer equivalent to (lastStack Q2 S2 ). 2. Stack answers SA1 and SA2 are said to be stack-answer equivalent (written SA1 ∼SA SA2 ) if • both SA1 and SA2 are the distinguished error stack, SAerror ; or • SA1 = (Stack StackAnswer S1 ), SA2 = (Stack StackAnswer S2 ), and S1 is stack equivalent to S2 . 3. Stacks S1 and S2 are stack equivalent (written S1 ∼S S2 ) if they are equallength sequences of values that are elementwise value equivalent. I.e., S1 = [V1 , . . . , Vn ], S2 = [V1 , . . . , Vn ], and Vi ∼V Vi for all i such that 1 ≤ i ≤ n. Equivalently, S1 and S2 are stack equivalent if • both S1 and S2 are the empty stack; or • S1 = V1 . S1 , S2 = V2 . S2 , V1 ∼V V2 , and S1 ∼S S2 . 4. Stack elements V1 and V2 are value equivalent (written V1 ∼V V2 ) if V1 and V2 are the same integer numeral (i.e., V1 = N = V2 ) or if V1 and V2 are executable sequences whose contents are transform equivalent (i.e., V1 = (Q1 ), V2 = (Q2 ), and Q1 ∼Q Q2 ). Despite the mutually recursive nature of these definitions, we claim that all four are well-defined equivalence relations as long as we choose the largest relations satisfying the descriptions. Two PostFix command sequences can be proved transform equivalent by case analysis on the structure of input stacks. This is much easier than the case analysis on the structure of contexts that is implied by observational equivalence. Since (as we shall show below) observational equivalence follows from transform equivalence, transform equivalence is a practical technique for demonstrating observational equivalence. As a simple example of transform equivalence, we show that [1, add, 2, add] ∼Q [3, add]. Consider two stacks S1 and S2 such that S1 ∼S S2 . We proceed by case analysis on the structure of the stacks: 1. S1 and S2 are both [ ], in which case (lastStack [3, add] [ ]) = (lastStack [add] [3]) = SAerror = (lastStack [add, 2, add] [1]) = (lastStack [1, add, 2, add] [ ])
3.7.2 Transform Equivalence
95
2. S1 and S2 are nonempty sequences whose heads are the same numeric literal and whose tails are stack equivalent. I.e., S1 = N . S1 , S2 = N . S2 , and S1 ∼S S2 . We use the abbreviation N1 +N2 for (calculate + N1 N2 ). (lastStack [3, add] N . S1 ) = (lastStack [add] 3 . N .S1 ) = lastStack [ ] N +3 . S1 = (Stack StackAnswer N +3 . S1 ) ∼SA (Stack StackAnswer N+3 . S2 ) = lastStack [ ] N +3 . S2 = lastStack [add] 2 . N +1 . S2 = lastStack [2, add] N +1 . S2 = (lastStack [add, 2, add] 1 . N . S2 ) = (lastStack [1, add, 2, add] N . S2 )
3. S1 and S2 are nonempty sequences whose heads are transform-equivalent executable sequences and whose tails are stack equivalent. I.e., S1 = Q1 . S1 , S2 = Q2 . S2 , Q1 ∼Q Q2 , and S1 ∼S S2 . (lastStack [3, add] Q1 . S1 ) = (lastStack [add] 3 . Q1 . S1 ) = SAerror = (lastStack [add, 2, add] 1 . Q2 . S2 ) = (lastStack [1, add, 2, add] Q2 . S2 )
In all three cases, (lastStack [1, add, 2, add] S1 ) ∼SA (lastStack [3, add] S2 )
so the transform equivalence of the sequences follows by definition of ∼Q . We emphasize that stacks can be equivalent without being identical. For instance, given the result of the above example, it is easy to construct two stacks that are stack equivalent but not identical: [(1 add 2 add), 5] ∼S [(3 add), 5]
Intuitively, these stacks are equivalent because they cannot be distinguished by any PostFix command sequence. Any such sequence must either ignore both sequence elements (e.g., [pop]), attempt an illegal operation on both sequence elements (e.g., [mul]), or execute both sequence elements on equivalent stacks (via exec). But because the sequence elements are transform equivalent, executing them cannot distinguish them.
96
Chapter 3 Operational Semantics
3.7.3
Transform Equivalence Implies Observational Equivalence
We wrap up the discussion of observational equivalence by showing that transform equivalence of PostFix command sequences implies observational equivalence. This result is useful because it is generally easier to show that two command sequences are transform equivalent than to construct a proof based directly on the definition of observational equivalence. The fact that transform equivalence implies observational equivalence can be explained informally as follows. Every PostFix program context has a top-level command-sequence context with two parts: the commands performed before the hole and the commands performed after the hole. The commands before the hole transform the initial stack into Spre . Suppose the hole is filled by one of two executable sequences, Q1 and Q2 , that are transform equivalent. Then the stacks Spost1 and Spost2 that result from executing these sequences, respectively, on Spre must be stack equivalent. The commands performed after the hole must transform Spost1 and Spost2 into stack-equivalent stacks Sfinal1 and Sfinal2 . Since behavior depends only on the equivalence class of the final stack, it is impossible to construct a context that distinguishes Q1 and Q2 . Therefore, they are observationally equivalent. We will need the following lemma for the formal argument: Lemma 3.5 For any command-sequence context Q, Q1 ∼Q Q2 implies Q{Q1 } ∼Q Q{Q2 }. Proof of Lemma 3.5: We will employ the following properties of transform equivalence, which are left as exercises for the reader: Q1 ∼Q Q1 and Q2 ∼Q Q2 Q1 ∼Q Q2
implies implies
Q1 @ Q2 ∼Q Q1 @ Q2 [(Q1 )] ∼Q [(Q2 )]
(3.10) (3.11)
Property 3.11 is tricky to read; it says that if Q1 and Q2 are transform equivalent, then the singleton command sequences containing the exectuable sequences made up of the commands of Q1 and Q2 are also transform equivalent. We proceed by structural induction on the grammar of the PostfixSequenceContext domain (Figure 3.19 on page 91): • (Base case) For sequence contexts of the form 2, Q1 ∼Q Q2 trivially implies 2{Q1 } ∼Q 2{Q2 }.
3.7.3 Transform Equivalence Implies Observational Equivalence
97
• (Induction cases) For each of the compound sequence contexts — Q @ Q, Q @ Q, [(Q)] — assume that Q1 ∼Q Q2 implies Q{Q1 } ∼Q Q{Q2 } for any Q. • For sequence contexts of the form Q @ Q, Q1 ∼Q Q2 implies Q{Q1 } ∼Q Q{Q2 } implies Q @ (Q{Q1 }) ∼Q Q @ (Q{Q2 }) implies (Q @ Q){Q1 } ∼Q (Q @ Q){Q2 }
by assumption by reflexivity of ∼Q and 3.10 by definition of Q
• Sequence contexts of the form Q @ Q are handled similarly to those of the form Q @ Q. • For sequence contexts of the form [(Q)], Q1 ∼Q Q2 implies Q{Q1 } ∼Q Q{Q2 } implies [(Q{Q1 })] ∼Q [(Q{Q2 })] implies [(Q)]{Q1 } ∼Q [(Q)]{Q2 }
by assumption by 3.11 by definition of Q
3
Now we are ready to present a formal proof that transform equivalence implies observational equivalence. Theorem 3.6 (PostFix Transform Equivalence) Q1 ∼Q Q2 implies Q1 =obs Q2 . Proof of Theorem 3.6: Assume that Q1 ∼Q Q2 . By the definition of Q1 =obs Q2 , we need to show that for any PostFix program context of the ∗ form (postfix Nn Q) and any integer numeral argument sequence Nargs ∗ ∗ beh det (postfix Nn Q{Q1 }), Nargs = beh det (postfix Nn Q{Q2 }), Nargs
Here we use beh det (defined for a generic SOS on page 51) because we know that PostFix has a deterministic behavior function. By Lemma 3.5, Q1 ∼Q Q2 implies Q{Q1 } ∼Q Q{Q2 }. Let Sinit be a stack ∗ . Then by the definition of ∼ , we have consisting of the elements of Nargs Q (lastStack Q{Q1 } Sinit ) ∼SA (lastStack Q{Q2 } Sinit )
By the definition of lastStack and ∼SA , there are two cases: ∗
∗
1. Q{Q1 }, Sinit ⇒ cf1 and Q{Q2 }, Sinit ⇒ cf2 , where both cf1 and cf2 are irreducible PostFix configurations with a nonempty command sequence component. In this case, both executions are stuck, so ∗ beh det (postfix Nn Q{Q1 }), Nargs = stuck ∗ = beh det (postfix Nn Q{Q2 }), Nargs
98
Chapter 3 Operational Semantics ∗
∗
2. Q{Q1 }, Sinit ⇒ [ ], S1 , Q{Q2 }, Sinit ⇒ [ ], S2 , and S1 ∼S S2 . In this case, there are two subcases: (a) S1 and S2 are both nonempty stacks with the same integer numeral N on top. In this subcase, ∗ beh det (postfix Nn Q{Q1 }), Nargs = (IntLit Outcome N ) ∗ = beh det (postfix Nn Q{Q2 }), Nargs
(b) S1 and S2 either (1) are both the empty stack or (2) are both nonempty stacks with executable sequences on top. In this subcase, ∗ beh det (postfix Nn Q{Q1 }), Nargs = stuck ∗ = beh det (postfix Nn Q{Q2 }), Nargs
3
Exercise 3.28 For each of the following purported observational equivalences, either prove that the observational equivalence is valid (via transform equivalence), or give a counterexample to show that it is not. a. [N , pop] =obs [ ] b. [add, N , add] =obs [N , add, add] c. [N1 , N2 , A] =obs [Nans ], where Nans = (calculate A N1 N2 ) d. [(Q), exec] =obs Q e. [(Q), (Q), sel, exec] =obs pop . Q f. [N1 , (N2 (Qa ) (Qb ) sel exec), (N2 (Qc ) (Qd ) sel exec), sel, exec] =obs [N2 , (N1 (Qa ) (Qc ) sel exec), (N1 (Qb ) (Qd ) sel exec), sel, exec] g. [C1 , C2 , swap] =obs [C2 , C1 ] h. [swap, swap, swap] =obs [swap] Exercise 3.29 Prove Lemmas 3.10 and 3.11, which are used to show that transform equivalence implies operational equivalence. Exercise 3.30 Transform equivalence (∼Q ) is defined in terms of lastStack, where lastStack is defined on page 93. Below we consider two alternative definitions of lastStack. lastStack1 : CommandSeq 8 → Stack → StackAnswer ∗ < (Stack StackAnswer S ) if Q, S ⇒ [ ], S (lastStack1 Q S ) = and S ∈ FinalStack : SAerror otherwise lastStack2 : CommandSeq → Stack → StackAnswer ∗ (lastStack2 Q S ) = (Stack StackAnswer S ) if Q, S ⇒ Q , S ⇒ (Recall that cf ⇒ means that configuration cf is irreducible.)
3.7.3 Transform Equivalence Implies Observational Equivalence
99
a. Give an example of two sequences that are transform equivalent using the original definition of lastStack but not using lastStack1 . b. Show that property (3.10) does not hold if transform equivalence is defined using lastStack2 . Exercise 3.31 a. Modify the PostFix semantics in Figure 3.3 so that the outcome of a PostFix program whose final configuration has an executable sequence at the top is the token executable. b. In your modified semantics, show that transform equivalence still implies observational equivalence. Exercise 3.32 Prove the following composition theorem for observationally equivalent PostFix sequences: Q1 =obs Q1 and Q2 =obs Q2 implies Q1 @ Q2 =obs Q1 @ Q2
Exercise 3.33 Which of the following transformations on EL numerical expressions are safe? Explain your answers. Be sure to consider stuck expressions like (/ 1 0). simp − −− →3 a. (+ 1 2) − simp b. (+ 0 NE ) − − −− → NE simp c. (* 0 NE ) − − −− →0 simp d. (+ 1 (+ 2 NE )) − − −− → (+ 3 NE ) simp e. (+ NE NE ) − − −− → (* 2 NE ) simp f. (if (= N N ) NE 1 NE 2 ) − − −− → NE 1 simp g. (if (= NE 1 NE 1 ) NE 2 NE 3 ) − − −− → NE 2 simp h. (if BE NE NE ) − − −− → NE
Exercise 3.34 Develop a notion of transform equivalence for EL that is powerful enough to formally prove that the transformations in Exercise 3.33 that you think are safe are really safe. You will need to design appropriate contexts for EL programs, numerical expressions, and boolean expressions. Exercise 3.35 Given that transform equivalence implies observational equivalence in PostFix, it is natural to wonder whether the converse is true. That is, does the following implication hold? Q1 =obs Q2 implies Q1 ∼Q Q2
If so, prove it; if not, explain why.
100
Chapter 3 Operational Semantics
Exercise 3.36 Consider the following TP function, which translates an ELMM program to a PostFix program: TP : ProgELMM → ProgPostFix TP [[(elmm NE body )]] = (postfix 0 TN E [[NE body ]]) TN E : NumExp → CommandSeq TN E [[N ]] = [N ] TN E [[(A NE 1 NE 2 )]] = TN E [[NE 1 ]] @ TN E [[NE 2 ]] @ [TA [[A]]] TA : ArithmeticOperatorELMM → ArithmeticOperatorPostFix TA [[+]] = add TA [[-]] = sub, etc.
a. What is TP [[(elmm (/ (+ 25 75) (* (- 7 4) (+ 5 6))))]]? b. Intuitively, TP maps an ELMM program to a PostFix program with the same behavior. Develop a proof that formalizes this intuition. As part of your proof, show that the following diagram commutes: ELM M CELM M1
CELM M2
TN E
TN E P ostF ix
CP ostF ix1
CP ostF ix2
The nodes CELM M1 and CELM M2 represent ELMM configurations, and the nodes CP ostF ix1 and CP ostF ix2 represent PostFix configurations of the form introduced in Exercise 3.12 on page 70. The horizontal arrows are transitions in the respective systems, while the vertical arrows are applications of TN E . It may help to think in terms of a context-based semantics. c. Extend the translator to translate (1) ELM programs and (2) EL programs. In each case, prove that the program resulting from your translation has the same behavior as the original program.
3.8
Extending PostFix
We close this chapter on operational semantics by illustrating that slight perturbations to a language can have extensive repercussions for the properties of the language. You have probably noticed that PostFix has a very limited expressive power. The fact that all programs terminate gives us a hint why. Any language in which all programs terminate can’t be universal, because any universal language must allow nonterminating computations to be expressed. Even if we don’t care about
3.8 Extending PostFix
101
universality (maybe we just want a good calculator language), PostFix suffers from numerous drawbacks. For example, nget allows us to “name” numerals by their position relative to the top of the stack, but these positions change as values are pushed and popped, leading to programs that are challenging to read and write. It would be preferable to give unchanging names to values. Furthermore, nget accesses only numerals, and there are situations where we need to access executable sequences and use them more than once. We could address these problems by allowing executable sequences to be copied from any position on the stack and by introducing a general way to name any value; these extensions are explored in exercises. For now, we will consider extending PostFix with a command that just copies the top value on a stack. Since the top value might be an executable sequence, this at least gives us a way to copy executable sequences — something we could not do before. Consider a new command, dup, which duplicates the value at the top of the stack. After execution of this command, the top two values of the stack will be the same. The rewrite rule for dup is given below: dup . Q, V . S ⇒ Q, V . V . S
[dup]
As a simple example of using dup, consider the executable sequence (dup mul), which behaves as a squaring subroutine: [12]
(postfix 1 (dup mul) exec) − −− → 144 (postfix 2 (dup mul) dup 3 nget swap exec swap 4 nget swap exec add) [5,12] −−− → 169 −
The introduction of dup clearly enhances the expressive power of PostFix. But adding this innocent little command has a tremendous consequence for the language: it destroys the termination property! Consider the program (postfix 0 (dup exec) dup exec). Executing this program on zero arguments yields the following transition sequence: ((dup exec) dup exec), [ ] ⇒ (dup exec), [(dup exec)] ⇒ (exec), [(dup exec), (dup exec)] ⇒ (dup exec), [(dup exec)] ⇒ ...
Because the rewrite process returns to a previously visited configuration, it is clear that the execution of this program never terminates. It is not difficult to see why dup invalidates the termination proof from Section 3.6. The problem is that dup can increase the energy of a configuration in
102
Chapter 3 Operational Semantics
the case where the top element of the stack is an executable sequence. Because dup effectively creates new commands in this situation, the number of commands executed can be unbounded. It turns out that extending PostFix with dup not only invalidates the termination property, but also results in a language that is universal!9 (See Exercise 3.48 on page 112.) That is, any computable function can be expressed in PostFix+{dup}. This simple example underscores that minor changes to a language can have major consequences. Without careful thought, it is never safe to assume that adding or removing a simple feature or tweaking a rewrite rule will change a language in only minor ways. We conclude this chapter with numerous exercises that explore various extensions to the PostFix language. Exercise 3.37 Extend the PostFix SOS so that it handles the following commands: pair: Let v1 be the top value on the stack and v2 be the next-to-top value. Pop both values off the stack and push onto the stack a pair object v2 , v1 . fst: If the top stack value is a pair vfst , vsnd , then replace it with vfst (the first value in the pair). Otherwise signal an error. snd: If the top stack value is a pair vfst , vsnd , then replace it with vsnd (the second value in the pair). Otherwise signal an error. Exercise 3.38 Extend the PostFix SOS so that it handles the following commands: get: Call the top stack value vindex and the remaining stack values (from top down) v1 , v2 , . . ., vn . Pop vindex off the stack. If vindex is a numeral i such that 1 ≤ i ≤ n, push vi onto the stack. Signal an error if the stack does not contain at least one value, if vindex is not a numeral, or if i is not in the range [1..n]. (get is like nget except that it can copy any value, not just a numeral.) put: Call the top stack value vindex , the next-to-top stack value vval , the remaining stack values (from top down) v1 , v2 , . . ., vn . Pop vindex and vval off the stack. If vindex is a numeral i such that 1 ≤ i ≤ n, change the slot holding vi on the stack to hold vval . Signal an error if the stack does not contain at least two values, if vindex is not a numeral, or if i is not in the range [1..n]. Exercise 3.39 Write the following programs in PostFix+{dup}. You may also use the pair commands from Exercise 3.37 and/or the get/put commands from Exercise 3.38 in 9 We are indebted to Carl Witty and Michael Frank for showing us that PostFix+{dup} is universal.
3.8 Extending PostFix
103
your solution, but they are not necessary — for an extra challenge, program purely in PostFix+{dup}. a. A program that takes a single argument (call it n) and returns the factorial of n. The factorial function f of an integer is defined so that (f 0) = 1 and (f n) = (n ×Int (f (n −Int 1))) for n ≥ 1. b. A program that takes a single argument (call it n) and returns the nth Fibonacci number. The Fibonacci function f of an integer is defined so that (f 0) = 0, (f 1) = 1, and (f n) = ((f (n −Int 1)) +Int (f (n −Int 2))) for n ≥ 2. Exercise 3.40 Abby Stracksen wishes to extend PostFix with a simple means of iteration. She suggests adding a command of the form (for N (Q)). Abby describes the behavior of her command with the following rewrite axioms: (for N (Qfor )) . Qrest , S ⇒ N . Qfor @ [(for Ndec (Qfor ))] @ Qrest , S where (Ndec = (calculate sub N 1)) ∧ (compare gt N 0) (for N (Qfor )) . Qrest , S ⇒ Qrest , S where ¬(compare gt N 0)
[for-once]
[for-done]
Abby calls her extended language PostLoop. a. Give an informal specification of Abby’s for command that would be appropriate for a reference manual. b. Using Abby’s for semantics, what are the results of executing the following PostLoop programs when called on zero arguments? i.
(postloop 0 1 (for 5 (mul)))
ii.
(postloop 0 1 (for 5 (2 mul)))
iii.
(postloop 0 1 (for 5 (add mul)))
iv.
(postloop 0 0 (for 17 (pop 2 add)))
v.
(postloop 0 0 (for 6 (pop (for 7 (pop 1 add)))))
c. Extending PostFix with the for command does not change its termination property. Show this by extending the termination proof described in Section 3.6.2 in the following way: i.
Define the energy of the for command.
ii.
Show that the transitions in the [for-once] and [for-done] rules decrease configuration energy.
d. Bud Lojack has developed a repeat command of the form (repeat N (Q)) that is similar to Abby’s for command. Bud defines the semantics of his command by the following rewrite rules:
104
Chapter 3 Operational Semantics (repeat N (Qrpt )) . Qrest , S ⇒ N . (repeat Ndec (Qrpt )) . Qrpt @ Qrest , S where (Ndec = (calculate sub N 1)) ∧ (compare gt N 0) (repeat N (Qrpt )) . Qrest , S ⇒ Qrest , S where ¬(compare gt N 0)
[repeat-once]
[repeat-done]
Does Bud’s repeat command have the same behavior as Abby’s for command? That is, does the following observational equivalence hold? [(repeat N (Q))] =obs [(for N (Q))]
Justify your answer. Exercise 3.41 Alyssa P. Hacker has created PostSafe, an extension to PostFix with a new command called sdup: safe dup. The sdup command is a restricted form of dup that does not violate the termination property of PostFix. The informal semantics for sdup is as follows: if the top of the stack is a number or a command sequence that doesn’t contain sdup, duplicate it; otherwise, signal an error. As a new graduate student in Alyssa’s ARGH (Advanced Research Group for Hacking), you are assigned to give an operational semantics for sdup, and a proof that all PostSafe programs terminate. Alyssa set up several intermediate steps to make your life easier. a. Write the operational semantics rules that describe the behavior of sdup. Model the errors through stuck states. You can use the auxiliary function contains sdup : CommandSeq → Bool
that takes a sequence of commands and checks whether it contains sdup or not. b. Consider the product domain P = N × N (recall that N is the set of natural numbers, starting with 0). On this domain, Alyssa defined the ordering
∀ni=0 . (Ti ≈ Ti ) n (T i=1 ) F T0 ) ≈ (->
F ≈e F (T ni=1 ) F T0 )
All other type-equivalence rules for FLARE/E are straightforward. Figure 16.1
FLARE/E types, effects, and regions.
[→-≈]
950
Chapter 16
Effects Describe Program Behavior
The dup! procedure takes a cell c containing a list, modifies it to contain a list with the first element duplicated, and returns the new list. For maximum utility, the dup! procedure should have a polymorphic type that abstracts over (1) the type ?t of the elements in the list in the cell c and (2) the region ?r of the cell c. Here is a type schema for dup! with the desired degree of polymorphism: (generic (?t ?r) (-> ((cellof (listof ?t) ?r)) (maxeff (read ?r) (write ?r)) (listof ?t)))
{?t is type of list elements} {?r is region of cell} {type of the argument c} {latent effect of dup!} {type of result of dup!}
The apply-twice procedure is polymorphic in the input type of f, the output type of f, and the latent effect of f: {?t1 is input type of f} {?t2 is output type of f} {?e is latent effect of f} (-> ((-> (?t1) ?e ?t2) {type of f} ?t1) {type of x} ?e {latent effect of apply-twice, inherited from f} ?t2)) {type of result of apply-twice}
(generic (?t1 ?t2 ?e)
In this case, the latent effect ?e of the argument procedure f is inherited by apply-twice. If we assume that the bools cell is allocated in region r4 and the ints cell is allocated in region r5, then we have the following instantiations for the generic-bound variables in the two applications of apply-twice: Variable ?t ?r ?t1 ?t2 ?e
(apply-twice dup! bools) bool r4 (cellof (listof bool) r4) (listof bool) (maxeff (read r4) (write r4))
(apply-twice dup! ints) int r5 (cellof (listof int) r5) (listof int) (maxeff (read r5) (write r5))
So the type and effect of Etwice are: Etwice : (pairof (listof bool) (listof int)) ! (maxeff (init r4) (read r4) (write r4) (init r5) (read r5) (write r5))
Types, effects, and regions together are descriptions — they describe program expressions. We saw descriptions earlier, in Section 12.3.1, where they were used to specify the structure of type constructors. Here, effects and regions are new kinds of descriptions for describing program behavior. In FLARE/E, a de-
16.2.2 Type and Effect Rules
951
scription identifier δ can name any description and supersedes FLARE’s type identifier τ , which can name only types. This allows us to treat descriptions uniformly in type schemas (as illustrated above) and allows us to define notations for substitution and unification uniformly with types, effects, and regions. This uniformity simplifies our presentation, but can lead to ill-formed descriptions (e.g., a type appearing in a position where an effect is expected, or vice versa). Such ill-formed descriptions can be avoided by using a simple kind system as discussed in Section 12.3.2 (see Exercise 16.3).
16.2.2
Type and Effect Rules
An effect system is a set of rules for assigning effects to program expressions. Figures 16.2 and 16.3 present a type and effect system that assigns both a type and an effect to every FLARE/E expression. The system is based on type/effect judgments of the form TE E : T ! F
This is pronounced “expression E has type T and effect F in type environment TE .” As in FLARE, the type environments in this system map identifiers to type schemas. The type/effect rules in Figure 16.3 are similar to the type rules for the full FLARE language presented in Figures 13.3 (page 775), 13.19 (page 805), and 13.24 (page 815), except that they determine the effects of expressions in addition to their types. Literals, variable references, errors, and abstractions are all pure because their evaluation does not touch the store and so can have no store effect. Variable references would not be pure if FLARE/E included mutable variables (set!). The [genvar] rule allows substitution of arbitrary descriptions (types, effects, regions) for the generic-bound description variables in a type schema. Substituting the wrong kind of description (e.g., substituting a type for a description variable used as an effect) would lead to an ill-formed type expression. But this is not problematic, because descriptions in the [genvar] rule must be “guessed” correctly to show that an expression is well typed. The formal parameters of generic can be annotated with kind information to guarantee that all types resulting from this substitution are well formed (see Exercise 16.3). The rules for all compound expressions (except abstractions) use maxeff to combine the effects of all subexpressions and include them in the effect of the whole expression. The [→-intro] and [→-elim] rules communicate effect information from the point of procedure definition to the point of procedure application. The [→-intro] rule includes the effect of an abstraction’s body as the latent ef-
952
Chapter 16
Effects Describe Program Behavior
Domains TE ∈ TypeEnvironment = Ident TypeSchema Other type and effect domains are defined in Figure 16.1. Type Functions egen : Type → TypeEnvironment → TypeSchema (egen T TE ) = (generic (δ ni=1 ) T ), where {δ1 , . . . , δn } = FrDescIds ty [[T ]] − (FrDescIds tyenv TE ) egenPureSP : Type → TypeEnvironment → Exp → TypeSchema ⎧ ⎨(egen Tdefn TE ) if pure E , where pure is defined in Figure 13.25 on page 816 (egenPureSP Tdefn TE E ) = ⎩ Tdefn otherwise Figure 16.2
Type/effect rules for FLARE/E, Part 1.
fect in the procedure type of the abstraction, and the [→-elim] rule includes the latent effect of a procedure type in the effect of a procedure application. Latent effects are also propagated by the [prim] rule to handle the fact that the types of cell operators must now carry nontrivial latent effects. Operator types in the primitive type environment TE prim must now carry latent effects, which are pure except for the cell operators. For example: cell : (generic ^ : (generic := : (generic + : (-> (int cons : (generic
(?t ?r) (-> (?t) (init ?r) (cellof ?t ?r))) (?t ?r) (-> ((cellof ?t ?r)) (read ?r) ?t)) (?t ?r) (-> ((cellof ?t ?r) ?t) (write ?r) unit)) int) pure int) (?t) (-> (?t (listof ?t)) pure (listof ?t)))
The [letSP ] and [letrecSP ] rules for FLARE/E are similar to the [letLP ] and [letrecLP ] rules for FLARE (Figure 13.24 on page 815). One difference is that egenPureSP is defined in terms of egen, which generalizes over all free description variables (not just type variables) in the type T that do not appear in the type environment TE . We assume that the FrDescIds ty function returns the free type, effect, and region variables in a type and the FrDescIds tyenv function returns all of the free type, effect, and region variables in a type environment. The definitions of these functions are left as an exercise (Exercise 16.2). The SP subscript, which stands for “syntactic purity,” emphasizes that these rules and functions use the same syntactic test for expression purity that is used in FLARE. This seems crazy — why not use the effect system itself to deter-
16.2.2 Type and Effect Rules
953
Type/Effect Rules TE #u : unit ! pure [unit] TE N : int ! pure [int] TE B : bool ! pure [bool] TE (sym Y ) : symb ! pure [symb]
TE (error Y ) : T ! pure [error]
TE I : T ! pure where TE (I ) = T
[var]
TE I : ([Di /δi ]ni=1 )Tbody ! pure where TE (I ) = (generic (δ ni=1 ) Tbody ) [genvar] TE Ethen : T ! Fthen TE Eelse : T ! Felse TE Etest : bool ! Ftest TE (if Etest Ethen Eelse ) : T ! (maxeff Ftest Fthen Felse )
[if ]
n
TE [Ii : Ti ]i=1 Ebody : Tbody ! Fbody n TE (abs (I i=1 ) Ebody ) : (-> (T ni=1 ) Fbody Tbody ) ! pure
[→-intro]
n TE E0 : (-> (T i=1 ) Flatent Tres ) ! F0 ∀ni=1 . (TE Ei : Ti ! Fi ) n TE (E0 E i=1 ) : Tres ! (maxeff Flatent F ni=0 )
[→-elim]
TE prim O : (-> (T ni=1 ) Flatent Tres ) ! pure ∀ni=1 . (TE Ei : Ti ! Fi ) [prim] TE (prim O E ni=1 ) : Tres ! (maxeff Flatent F ni=1 ) ∀ni=1 . (TE Ei : Ti ! Fi ) n TE [Ii : (egenPureSP Ti TE Ei )]i=1 E0 : T0 ! F0 n TE (let ((Ii Ei )i=1 ) E0 ) : T0 ! (maxeff F ni=0 )
n ∀ni=1 . TE [Ij : Tj ]j=1 Ei : Ti ! Fi n TE [Ii : (egenPureSP Ti TE Ei )]i=1 E0 : T0 ! F0 TE (letrec ((Ii Ei )ni=1 ) E0 ) : T0 ! (maxeff F ni=0 ) TE E : T ! F , where F e F TE E : T ! F
[letSP ]
[letrecSP ]
[does]
n
prog Figure 16.3
{Ii : Ti }i=1 Ebody : Tbody ! Fbody n (flarek (I i=1 ) Ebody ) : (-> (T ni=1 ) Fbody Tbody ) ! pure
[prog]
Type/effect rules for FLARE/E, Part 2.
mine purity? The reason is that an effect-based test for purity complicates the reconstruction of types and effects and the relationship between FLARE/E and FLARE. This is explored in more detail in Section 16.2.5. Because our use of effect equivalence and type equivalence in FLARE/E type derivations is implicit, the type and effect system does not include an explicit type
954
Chapter 16
Effects Describe Program Behavior
rule for type equivalence (e.g., the [type-≈] rule in Figure 11.20 on page 680). For example, consider the following FLARE/E type/effect derivation: .. . TE E1 : (-> (int) (maxeff (read r1) (write r2)) bool) ! pure .. . TE E2 : int ! (maxeff (read r1) (read r2)) TE (E1 E2 ) : bool ! (maxeff (read r1) (read r2) (write r2)) [→-elim]
This is valid because the effect

  (maxeff (maxeff (read r1) (write r2)) pure (maxeff (read r1) (read r2)))
specified by the [→-elim] rule can be simplified to the following effect using implicit effect equivalence:

  (maxeff (read r1) (read r2) (write r2))
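Under the ACUI axioms, maxeff behaves exactly like set union on base effects, which is why such simplifications are always valid. The following Python sketch (our own illustration, not part of the book's metalanguage) normalizes effect expressions to sets of base effects and confirms the equivalence above:

  def normalize(effect):
      # effect: the string "pure", a base effect like ("read", "r1"),
      # or ("maxeff", e1, e2, ...); returns a frozenset of base effects
      if effect == "pure":
          return frozenset()                   # pure is the identity
      if effect[0] == "maxeff":
          out = frozenset()
          for sub in effect[1:]:               # flattening (associativity)
              out |= normalize(sub)            # union (commutativity, idempotence)
          return out
      return frozenset([effect])               # a single base effect

  e1 = ("maxeff", ("maxeff", ("read", "r1"), ("write", "r2")), "pure",
        ("maxeff", ("read", "r1"), ("read", "r2")))
  e2 = ("maxeff", ("read", "r1"), ("read", "r2"), ("write", "r2"))
  assert normalize(e1) == normalize(e2)        # the simplification is sound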
The effect of an expression determined by our type and effect system is a conservative approximation of the actions performed by the expression at run time. Combining the effects of subexpressions with maxeff can lead to effects that overestimate the actual actions performed. For example, suppose that Epure has effect pure, Eread has effect (read r6), and Ewrite has effect (write r6). Then (if #t Epure Eread) has effect (read r6) even though it does not touch the store at run time, and the conditional (if b Eread Ewrite) has effect (maxeff (read r6) (write r6)) even though only one of its branches is taken at run time. It is possible to inflate the effect of an expression via the [does] rule, which allows an expression with effect F to be given a larger effect F′ as long as F is a subeffect of F′. In order to derive a type and effect for an expression, it is sometimes necessary to use the [does] rule to get the latent effects embedded in two procedure types to be the same. Consider the expression

  Eifproc = (if b (abs () Eread) (abs () Ewrite))
relative to a type environment TE in which b has type bool, Eread has type T and effect (read r6), and Ewrite has type T and effect (write r6). Without using the [does] rule, we can show:

  TE ⊢ (abs () Eread) : (-> () (read r6) T) ! pure
  TE ⊢ (abs () Ewrite) : (-> () (write r6) T) ! pure
The [if] rule requires that the types of the two branch expressions be the same, but in this case the procedure types are not the same because their effects differ. To show that Eifproc is well typed, it is necessary to use the [does] rule to give the effect Frw = (maxeff (read r6) (write r6)) to the bodies of both procedures, resulting in the following derivation:

  TE ⊢ b : bool ! pure [var]
  ⋮
  TE ⊢ Eread : T ! (read r6)
  ------------------------------------------------ [does]
  TE ⊢ Eread : T ! Frw
  ------------------------------------------------ [→-intro]
  TE ⊢ (abs () Eread) : (-> () Frw T) ! pure
  ⋮
  TE ⊢ Ewrite : T ! (write r6)
  ------------------------------------------------ [does]
  TE ⊢ Ewrite : T ! Frw
  ------------------------------------------------ [→-intro]
  TE ⊢ (abs () Ewrite) : (-> () Frw T) ! pure
  ------------------------------------------------ [if]
  TE ⊢ (if b (abs () Eread) (abs () Ewrite)) : (-> () Frw T) ! pure
In the above example, the [does] rule is used to artificially inflate the effects of procedure bodies before forming procedure types so that the procedure types (and, specifically, their latent effect components) will be identical elsewhere in the type derivation. This is the key way in which the [does] rule is used in practice. The FLARE/E type system does not support any form of subtyping, so that there is no direct way to show that a procedure with type (-> (int) (maxeff (read r) (write r)) int) can be used in place of one with type (-> (int) (maxeff (init r) (read r) (write r)) int). However, as illustrated above, the [does] rule can be used in conjunction with the [→-intro] rule to inflate the base effects in the latent effect of a procedure when it is created. The [does] rule permits an expression to take on many possible effects. We shall see below (on page 962 in Section 16.2.3) that there is a well-defined, indeed practically computable, notion of least effect. Henceforth, when we refer to the effect of an expression, we mean the smallest effect that can be proven by our rules. When we discuss effect reconstruction (Section 16.2.3), we will show how to automatically calculate the smallest effect allowed by the rules. How is the FLARE/E type and effect system related to the FLARE type system studied earlier? It has exactly the same typing power as FLARE — a
program is typable in FLARE if and only if it is typable in FLARE/E. This relationship is a consequence of the following theorem, which uses the notations ⌊T⌋eT and ⌊TE⌋eTE (see Exercise 16.4) to stand for the result of erasing effect and region information from the FLARE/E type T and FLARE/E type environment TE:

Theorem 16.1  TE ⊢ E : T in the FLARE type system if and only if there exist a FLARE/E type environment TE′, a FLARE/E type T′, and an effect F such that ⌊TE′⌋eTE = TE, ⌊T′⌋eT = T, and TE′ ⊢ E : T′ ! F in the FLARE/E type/effect system.

Proving that TE′ ⊢ E : T′ ! F implies ⌊TE′⌋eTE ⊢ E : ⌊T′⌋eT is easily done by showing that erasing all effect information in the FLARE/E type/effect derivation yields a FLARE type derivation (see Exercise 16.5). The other direction (TE ⊢ E : T implies TE′ ⊢ E : T′ ! F) is proven by showing that the judgments and procedure types in a FLARE type derivation can always be extended with effect information to yield a FLARE/E type/effect derivation (Exercise 16.6).

Exercise 16.1 Consider the following program:

(flarek (b)
  (let ((c (prim cell 2)))
    (let ((one (abs (x) 1))
          (get (abs (y) (prim ^ y)))
          (setc! (abs (z) (let ((_ (prim := c z))) z))))
      ((abs (appc)
         ((if b setc! one)
          (prim + (appc get) (appc one))))
       (abs (f) (f c))))))
a. Give a type derivation showing that the above program is well typed in the FLARE/E type system. You will need to use the [does] rule to inflate some latent effects in procedure types, but use the minimal latent effect possible. b. How would your answer to part a change if the subexpression ((abs (appc) . . . ) (abs (f) (f c)))
were changed to (let ((appc (abs (f) (f c)))) . . . )?
Exercise 16.2 Define the following functions for determining the free description identifiers of various domains:

FrDescIds_reg : Region → P(DescId)
FrDescIds_eff : Effect → P(DescId)
FrDescIds_ty : Type → P(DescId)
FrDescIds_tysch : TypeSchema → P(DescId)
FrDescIds_tyenv : TypeEnvironment → P(DescId)
Exercise 16.3 Intuitively, each description identifier that is a formal parameter in a FLARE/E generic expression denotes one of a type, an effect, or a region. For example, in the type schema

(generic (?a ?b ?c)
  (-> ((-> (?a) ?b ?a) (cellof ?a ?c))
      (maxeff ?b (read ?c) (write ?c))
      ?a))
?a denotes a type, ?b denotes an effect, and ?c denotes a region. This intuition can be formalized using a simple kind system (cf. Section 12.3.2) based on the following domains:

K ∈ Kind ::= type | effect | region
DK ∈ DescIdKind ::= (δ K)
TS ∈ TypeSchema ::= T | (generic (DK*) T)
The TypeSchema domain has been changed so that every formal parameter declared by generic has an explicit kind. In the modified system, the example type schema above would be rewritten to have the form (generic ((?a type) (?b effect) (?c region)) . . . )
We say that a type schema with explicitly kinded parameters is well kinded if each reference to the parameter in the body of the type schema is consistent with its kind. For example, the type schema above (with explicit kinds) is well kinded. However, the schema (generic ((?d type) (?e region)) (-> (?d) pure (cellof ?e ?d)))
is not well kinded because region ?e is used as a type and the second occurrence of type ?d is used as a region.

a. Develop a formal deduction system for determining the well-kindedness of a type schema with explicitly kinded parameters.

b. Define variants of each of the functions in Exercise 16.2 that return an element of P(DescIdKind) (in which each description identifier is paired with its kind) rather than an element of P(DescId).

c. Modify the definition of the egen function in Figure 16.2 to use the functions from part b to return a type schema with explicitly kinded parameters. Under what simple conditions is the type schema guaranteed to be well kinded? Explain.

d. Modify the [genvar] rule to guarantee that only descriptions of the appropriate kind are substituted for generic-bound description parameters in the body of the type schema. Argue that the type resulting from these substitutions is always well formed.
Exercise 16.4 Define the following effect-erasure functions for FLARE/E types, type schemas, and type environments:

effectErase_ty : Type_FLARE/E → Type_FLARE
effectErase_tysch : TypeSchema_FLARE/E → TypeSchema_FLARE
effectErase_tyenv : TypeEnvironment_FLARE/E → TypeEnvironment_FLARE
The notations ⌊T⌋eT, ⌊TS⌋eTS, and ⌊TE⌋eTE abbreviate (respectively) (effectErase_ty T), (effectErase_tysch TS), and (effectErase_tyenv TE). Each function should erase all effect and region information from the FLARE/E entity to yield the FLARE entity. In the definition of effectErase_tysch, it is helpful (but not absolutely necessary) to assume that it is possible to determine the kind of each generic parameter (see Exercise 16.3).

Exercise 16.5 The notion of effect erasure from Exercise 16.4 can be extended to type/effect judgments and type/effect derivations in FLARE/E as follows:

effectErase_judge : TypeJudgment_FLARE/E → TypeJudgment_FLARE
  The notation ⌊TJ⌋eTJ abbreviates (effectErase_judge TJ).
  ⌊TE ⊢FLARE/E E : T ! F⌋eTJ = ⌊TE⌋eTE ⊢FLARE E : ⌊T⌋eT

effectErase_deriv : TypeDerivation_FLARE/E → TypeDerivation_FLARE
  The notation ⌊TD⌋eTD abbreviates (effectErase_deriv TD).
  Writing TD1 ... TDn / TJ for a derivation whose root judgment is TJ and whose subderivations are TD1 ... TDn:

  ⌊TD / (TE ⊢FLARE/E E : T ! F)⌋eTD = ⌊TD⌋eTD, when the root node is an instance of [does]
  ⌊TD1 ... TDn / TJ⌋eTD = ⌊TD1⌋eTD ... ⌊TDn⌋eTD / ⌊TJ⌋eTJ, for all other type derivations
Prove that effectErase_deriv is a well-defined function. That is, if TD is a FLARE/E type derivation, then ⌊TD⌋eTD is a legal FLARE type derivation according to the type rules of FLARE. Your proof should be by induction on the structure of a type derivation TD and by case analysis on the type rule used in the root node of the type derivation tree. The well-definedness of effectErase_deriv proves that TE ⊢FLARE/E E : T ! F implies ⌊TE⌋eTE ⊢FLARE E : ⌊T⌋eT in Theorem 16.1.

Exercise 16.6 This exercise sketches a proof of the forward direction of Theorem 16.1 (i.e., that any FLARE type derivation can be annotated with appropriate effect information to yield a FLARE/E type/effect derivation) and asks you to work out the details. A simple approach is to assume that all cells are allocated in a single region (call it δreg), in which case the maximal effect is

Fmax = (maxeff (init δreg) (read δreg) (write δreg))
Then any FLARE type derivation can be transformed to a FLARE/E type/effect derivation by:
• changing every FLARE cell type (cellof T) to the FLARE/E cell type (cellof T δreg);

• changing every nonprimitive FLARE arrow type (-> (T1 ... Tn) T0) to the FLARE/E arrow type (-> (T1 ... Tn) Fmax T0);
• using the [does] rule to inflate the effect of every procedure body to Fmax before the [→-intro] rule is applied; and

• introducing and propagating effects as required by the FLARE/E analogues of the FLARE type rules.

a. Based on the above sketch, formally define a transformation

  𝒯𝒟 : TypeDerivation_FLARE → TypeDerivation_FLARE/E
that transforms a valid FLARE type derivation for a well-typed expression into a valid FLARE/E type/effect derivation for the same expression.

b. Suppose that TD is a FLARE type derivation for the type judgment TE ⊢ E : T. Then (𝒯𝒟 TD) is a FLARE/E type/effect derivation for a type/effect judgment TE′ ⊢ E : T′ ! F. Show that ⌊TE′⌋eTE = TE and ⌊T′⌋eT = T. This completes the proof of Theorem 16.1.
16.2.3 Reconstructing Types and Effects: Algorithm Z
Effect-Constraint Sets

We can adapt the FLARE type reconstruction algorithm (Algorithm R) from Section 13.3 to reconstruct effects as well as types. Recall that R has the signature

  R : Exp → TypeEnvironment → (Type × TypeConstraintSet)

and is expressed via deductive-style rules involving judgments of the form

  R[[E]] TE = ⟨T, TCS⟩

Elements TCS ∈ TypeConstraintSet are abstract sets of type-equality constraints that are collected and solved by the algorithm. The extended algorithm, which we call Algorithm Z (Figures 16.4, 16.6, 16.8, and 16.9), has the signature

  Z : Exp → TypeEnvironment → (Type × TypeConstraintSet × Effect × EffectConstraintSet)

and is expressed via deductive-style rules involving judgments of the form

  Z[[E]] TE = ⟨T, TCS, F, FCS⟩
Domains
  FC ∈ EffectConstraint = DescId × Effect
    ; (>= δ F) stands for an element ⟨δ, F⟩ of EffectConstraint
  FCS ∈ EffectConstraintSet = FC*
    ; define dom(FCS) = {δ | (>= δ F) ∈ FCS}
  σ ∈ DescSubst = DescId ⇀ Desc
  us ∈ UnifySoln = DescSubst + Failure
  Other type and effect domains are defined in Figure 16.1.

Functions
  solveFCS : EffectConstraintSet → DescSubst
    = λFCS . fixDescSubst (λσ . λδ∈dom(FCS) . (σ (maxeff δ F1 ... Fn)))
        where {F1, ..., Fn} = {F | (>= δ F) ∈ FCS},
        ⊥DescSubst = λδ∈dom(FCS) . pure,
        and λδ∈dom(FCS) . dbody stands for
          λδ . if δ ∈ dom(FCS) then dbody else undefined end

  solveTCS : TypeConstraintSet → UnifySoln is defined as in Figure 13.12 on page 790, where it is assumed that unify is modified in a straightforward way to handle the unification of listof types, pairof types, cellof types, and -> types with latent effects (which are guaranteed to be effect variables). A successful unification now results in an element of DescSubst rather than TypeSubst because nontype description variables are encountered in the unification of cellof types (in which region variables are unified) and -> types (in which effect variables are unified).

Figure 16.4  Domains and functions for the FLARE/E type and effect reconstruction algorithm.
In addition to returning the type T and type-constraint set TCS of an expression E relative to a type environment TE, Algorithm Z returns:

1. The effect F of the expression.

2. A collection FCS ∈ EffectConstraintSet of effect inequality constraints having the form (>= δ F). Such a constraint means that the effect F′ denoted by the effect variable δ must be at least as large as F — i.e., F ⊑e F′.

What are the effect inequality constraints for? As mentioned in the discussion of the [does] rule beginning on page 954, the most challenging problem encountered when constructing a type/effect derivation in FLARE/E is guaranteeing that the latent effects of procedure types are identical in all situations where the rules require that two procedure types be the same. The purpose of the effect-constraint sets generated by Algorithm Z is to solve this problem.
An effect constraint (>= δlat Fbody) is generated by Algorithm Z when a derivation in the implicit type/effect system would use the [does] rule to inflate the effect of a procedure body from Fbody to a larger effect F′body (represented by δlat) in conjunction with an application of the [→-intro] rule:

  TE[I1 : T1, ..., In : Tn] ⊢ Ebody : Tbody ! Fbody
  --------------------------------------------------------------------- [does]
  TE[I1 : T1, ..., In : Tn] ⊢ Ebody : Tbody ! F′body
  --------------------------------------------------------------------- [→-intro]
  TE ⊢ (abs (I1 ... In) Ebody) : (-> (T1 ... Tn) F′body Tbody) ! pure
The extent to which Fbody needs to be inflated by the [does] rule depends on how the procedure type introduced by the [→-intro] rule flows through the rest of the type/effect derivation and is compared to other procedure types. Algorithm Z addresses this problem by introducing the description variable δlat to stand for F′body and by generating an effect inequality constraint (>= δlat Fbody) that must later be solved. The type and effect reconstruction system handles the above derivation pattern by a single application of the [→-introZ] rule:

  Z[[Ebody]] TE[I1 : δ1, ..., In : δn] = ⟨Tbody, TCSbody, Fbody, FCSbody⟩
  --------------------------------------------------------------------- [→-introZ]
  Z[[(abs (I1 ... In) Ebody)]] TE = ⟨(-> (δ1 ... δn) δlat Tbody), TCSbody,
                                     pure, (>= δlat Fbody) . FCSbody⟩
This is like the [→-introR] type reconstruction rule for FLARE except that: (1) it introduces the description variables δ1 ... δn for the parameters instead of type variables; (2) it specifies that abstractions have a pure effect; and (3) it adds the effect constraint (>= δlat Fbody) to whatever effect constraints were generated in reconstructing the type and effect of Ebody. For a reason explained later, an effect-constraint set is concretely represented as a sequence of effect constraints. So (FC . FCS) is the result of inserting the effect constraint FC into the effect-constraint set FCS, FCS1 @ FCS2 is the union of effect-constraint sets FCS1 and FCS2, and FCS1 @ ... @ FCSn is the union of the n effect-constraint sets FCS1, ..., FCSn. We still use the set notation FC ∈ FCS to indicate that effect constraint FC is an element of the effect-constraint set FCS. We define dom(FCS) as the set of effect variables δ appearing in constraints of the form (>= δ F) within FCS. A solution to an effect-constraint set FCS is a substitution σ ∈ DescSubst = DescId ⇀ Desc such that dom(σ) = dom(FCS) and (σ δ) ⊒e (σ F) for every effect constraint (>= δ F) in FCS.
Fi = (init r7)    Fr = (read r7)    Fw = (write r7)
Fmax = (maxeff Fi Fr Fw)

FCSex = [(>= δ1 pure),
         (>= δ2 (maxeff δ1 Fi)),
         (>= δ3 (maxeff δ2 Fr)),
         (>= δ4 (maxeff δ2 Fw)),
         (>= δ4 δ5),
         (>= δ5 (maxeff δ3 δ4))]

Figure 16.5  FCSex is an example of an effect-constraint set.
Although the formal signature of a solution substitution is DescId ⇀ Desc, the signature is really DescId ⇀ Effect since all the description variables being solved denote effects.⁵ There are infinitely many solutions for any effect-constraint set. For example, consider the effect-constraint set FCSex in Figure 16.5. Below are four solutions to FCSex:

  σ      (σ δ1)   (σ δ2)           (σ δ3)           (σ δ4)   (σ δ5)
  σex1   pure     Fi               (maxeff Fi Fr)   Fmax     Fmax
  σex2   Fi       Fi               (maxeff Fi Fr)   Fmax     Fmax
  σex3   Fr       (maxeff Fi Fr)   (maxeff Fi Fr)   Fmax     Fmax
  σex4   Fmax     Fmax             Fmax             Fmax     Fmax
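Whether a candidate substitution is a solution can be checked mechanically: interpret each effect as a set of base effects and check that (σ F) is contained in (σ δ) for each constraint (>= δ F). A Python sketch of this check (the names and the set encoding are ours, not the book's):

  def apply_effect(sigma, effect, domain):
      # interpret an effect (a set of base effects and effect variables)
      # under sigma, expanding the variables in the constraint domain
      out = set()
      for e in effect:
          out |= sigma[e] if e in domain else {e}
      return out

  def is_solution(sigma, constraints):
      domain = {d for (d, _) in constraints}
      return all(apply_effect(sigma, f, domain) <= sigma[d]
                 for (d, f) in constraints)

  # FCSex from Figure 16.5; pure is the empty set of base effects:
  FCS_EX = [("d1", set()),
            ("d2", {"d1", "Fi"}), ("d3", {"d2", "Fr"}),
            ("d4", {"d2", "Fw"}), ("d4", {"d5"}), ("d5", {"d3", "d4"})]
  FMAX = {"Fi", "Fr", "Fw"}
  sigma_ex1 = {"d1": set(), "d2": {"Fi"}, "d3": {"Fi", "Fr"},
               "d4": FMAX, "d5": FMAX}
  assert is_solution(sigma_ex1, FCS_EX)   # the sigma_ex1 column above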
There are also infinitely many solutions of the form σexF, parameterized by F ∈ Effect, that map every effect variable in FCSex to (maxeff F Fmax). Since the Effect domain is a pointed CPO (see Sections 5.2.2 and 5.2.3) under the ⊑e ordering, the domain DescId ⇀ Effect is also a pointed CPO, and so there is a well-defined notion of a least solution to an effect-constraint set. The structure of the CPO and the existence of a least solution depend critically on the ACUI nature of effect combination via maxeff. The iterative approach to finding least fixed points from Section 5.2.5 can be used to calculate the least solution to an effect-constraint set FCS. We start with an approximation σ0 that maps each effect variable δ in dom(FCS) to pure. For each step j, we define a better approximation σj that maps each δ in dom(FCS) to an effect that combines (σj−1 δ) with (σj−1 F) for each F such that (>= δ F) is in FCS. (σj δ) is guaranteed to be at least as big as (σj−1 δ), and so in this sense is a “better” approximation.

⁵ An effect constraint also typically contains description variables denoting regions, but these will not be in dom(σ) for a solution substitution σ.
Since there are finitely many effect variables δ ∈ dom(FCS) and since (σj δ) always denotes some combination of the finite number of base effects mentioned in FCS, the iteration is guaranteed to converge to a fixed point in a finite number of steps. For example, this process finds the least solution to FCSex in three steps (the fourth step verifies that σ3 is a solution):

  j   (σj δ1)   (σj δ2)   (σj δ3)          (σj δ4)          (σj δ5)
  0   pure      pure      pure             pure             pure
  1   pure      Fi        Fr               Fw               pure
  2   pure      Fi        (maxeff Fi Fr)   (maxeff Fi Fw)   (maxeff Fr Fw)
  3   pure      Fi        (maxeff Fi Fr)   Fmax             Fmax
  4   pure      Fi        (maxeff Fi Fr)   Fmax             Fmax
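This iteration is straightforward to implement. The sketch below is illustrative (solve_fcs is our name, not the book's); it computes the least solution by iterating from the everywhere-pure bottom substitution to a fixed point, just as the table above does:

  def solve_fcs(constraints):
      # constraints: list of (delta, effect) pairs; an effect is a set of
      # base effects and/or effect variables that are themselves constrained
      domain = {d for (d, _) in constraints}
      sigma = {d: frozenset() for d in domain}  # bottom: every delta is pure
      changed = True
      while changed:                            # iterate to the least fixed point
          changed = False
          for (d, effect) in constraints:
              expanded = set()
              for e in effect:                  # apply the current substitution
                  expanded |= sigma[e] if e in domain else {e}
              new = sigma[d] | expanded         # maxeff is set union (ACUI)
              if new != sigma[d]:
                  sigma[d], changed = frozenset(new), True
      return sigma

  # Running this on FCS_EX from the earlier sketch yields the j = 3 column:
  # d1 -> {}, d2 -> {Fi}, d3 -> {Fi, Fr}, d4 -> {Fi, Fr, Fw}, d5 -> {Fi, Fr, Fw}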
The definition of the solveFCS function in Figure 16.4 formalizes this strategy for finding the least solution to an effect-constraint set. Let

  λδ∈dom(FCS) . effect-expression

stand for a partial function that denotes the value of effect-expression when δ ∈ dom(FCS) and is otherwise undefined. The least solution of an effect-constraint set FCS is the least fixed point of a series of solutions starting with the bottom solution ⊥DescSubst = λδ∈dom(FCS) . pure. An approximate solution σ is transformed to a better solution σ′ by mapping each effect variable δ ∈ dom(FCS) to (σ (maxeff δ F1 ... Fn)), where {F1, ..., Fn} is the set of all effects F appearing in constraints of the form (>= δ F) in FCS. Since

  (σ′ δ) = (σ (maxeff δ F1 ... Fn)) = (maxeff (σ δ) (σ F1) ... (σ Fn))

clearly (σ δ) ⊑e (σ′ δ) for each δ mentioned in FCS, so the transformation is monotonic. By the argument given above, each chain of solutions is finite, so monotonicity is sufficient to guarantee the existence of a least solution for fixDescSubst.

Simple Type/Effect Reconstruction Rules

The Algorithm Z type/effect reconstruction rules for most expressions are presented in Figure 16.6. The rules for literals, errors, and nongeneric variable references are similar to the FLARE type reconstruction rules, but additionally specify a pure effect and an empty effect-constraint set. The [→-introZ] rule has already been discussed above.
Function Signature for Type/Effect Reconstruction of Expressions
  Z : Exp → TypeEnvironment → (Type × TypeConstraintSet × Effect × EffectConstraintSet)

Type/Effect Reconstruction Rules for Expressions

  Z[[#u]] TE = ⟨unit, {}TCS, pure, []FC⟩ [unitZ]
    ([boolZ], [intZ], and [symbZ] are similar)

  Z[[(error Y)]] TE = ⟨δ, {}TCS, pure, []FC⟩ [errorZ]
    where δ is fresh

  Z[[I]] TE = ⟨T, {}TCS, pure, []FC⟩ [varZ]
    where TE(I) = T

  Z[[I]] TE = ⟨unit, failTCS, pure, []FC⟩ [var-failZ]
    where I ∉ dom(TE)

  ∀i ∈ {1, 2, 3} . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
  --------------------------------------------------------------------- [ifZ]
  Z[[(if E1 E2 E3)]] TE = ⟨T2, (TCS1 ∪ TCS2 ∪ TCS3) ∪ {T1 ≐ bool, T2 ≐ T3}TCS,
                           (maxeff F1 F2 F3), FCS1 @ FCS2 @ FCS3⟩

  Z[[Ebody]] TE[I1 : δ1, ..., In : δn] = ⟨Tbody, TCSbody, Fbody, FCSbody⟩
  --------------------------------------------------------------------- [→-introZ]
  Z[[(abs (I1 ... In) Ebody)]] TE = ⟨(-> (δ1 ... δn) δlat Tbody), TCSbody,
                                     pure, (>= δlat Fbody) . FCSbody⟩
    where δ1 ... δn and δlat are fresh

  ∀i ∈ {0..n} . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
  --------------------------------------------------------------------- [→-elimZ]
  Z[[(E0 E1 ... En)]] TE = ⟨δres, (TCS0 ∪ ... ∪ TCSn) ∪ {T0 ≐ (-> (T1 ... Tn) δlat δres)}TCS,
                            (maxeff F0 ... Fn δlat), FCS0 @ ... @ FCSn⟩
    where δlat and δres are fresh

  Z[[Oop]] TEprim = ⟨Top, TCS0, pure, []FC⟩
  ∀i ∈ {1..n} . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
  --------------------------------------------------------------------- [primZ]
  Z[[(prim Oop E1 ... En)]] TE = ⟨δres, (TCS0 ∪ ... ∪ TCSn) ∪ {Top ≐ (-> (T1 ... Tn) δlat δres)}TCS,
                                  (maxeff F1 ... Fn δlat), FCS1 @ ... @ FCSn⟩
    where δlat and δres are fresh

Figure 16.6  The FLARE/E type/effect reconstruction algorithm for simple expressions, expressed via deduction rules. For let polymorphism see Figure 16.8.
The [ifZ], [→-elimZ], and [primZ] rules are similar to their FLARE type reconstruction counterparts except that they (1) combine the effects of all subexpressions (and the latent effect of the applied procedure in the case of [→-elimZ] and [primZ]) and (2) combine the effect-constraint sets of all subexpressions.
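As a concrete illustration of this bookkeeping, here is how the combination step of the [ifZ] rule might look in code. This is a hypothetical helper (our encoding: type constraints as pairs in a set, effect-constraint sets as lists, matching the sequence representation described earlier):

  def recon_if(r_test, r_then, r_else):
      # each r is a (type, tcs, effect, fcs) quadruple for a subexpression;
      # types are hashable terms, tcs is a set of equality pairs, effect is
      # a frozenset of base effects/effect variables, fcs is a list
      (t1, tcs1, f1, fcs1) = r_test
      (t2, tcs2, f2, fcs2) = r_then
      (t3, tcs3, f3, fcs3) = r_else
      tcs = tcs1 | tcs2 | tcs3 | {(t1, "bool"), (t2, t3)}  # T1 = bool, T2 = T3
      return t2, tcs, f1 | f2 | f3, fcs1 + fcs2 + fcs3     # maxeff; @ on FCSs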
  Z[[b]] TE = ⟨bool, {}TCS, pure, []FC⟩ [varZ]
  ⋮
  Z[[Eread]] TE = ⟨Tread, TCSread, (read δrreg), FCSread⟩
  --------------------------------------------------------------------- [→-introZ]
  Z[[(abs () Eread)]] TE = ⟨(-> () δreff Tread), TCSread, pure,
                            (>= δreff (read δrreg)) . FCSread⟩
  ⋮
  Z[[Ewrite]] TE = ⟨Twrite, TCSwrite, (write δwreg), FCSwrite⟩
  --------------------------------------------------------------------- [→-introZ]
  Z[[(abs () Ewrite)]] TE = ⟨(-> () δweff Twrite), TCSwrite, pure,
                             (>= δweff (write δwreg)) . FCSwrite⟩
  --------------------------------------------------------------------- [ifZ]
  Z[[(if b (abs () Eread) (abs () Ewrite))]] TE
    = ⟨(-> () δreff Tread),
       TCSread ∪ TCSwrite ∪ {bool ≐ bool, (-> () δreff Tread) ≐ (-> () δweff Twrite)}TCS,
       pure,
       ((>= δreff (read δrreg)) . FCSread) @ ((>= δweff (write δwreg)) . FCSwrite)⟩

Figure 16.7  Type/effect reconstruction corresponding to the type/effect derivation of Eifproc on page 955.
As an example, Figure 16.7 shows the fragment of the type/effect derivation for Eifproc from page 955 expressed in the reconstruction system. Distinct effect variables δreff and δweff are introduced as the latent effects for (abs () Eread) and (abs () Ewrite), respectively, but these are forced to be the same by the type constraint

  (-> () δreff Tread) ≐ (-> () δweff Twrite)
We assume that the unification algorithm used by the type-constraint set solver solveTCS is extended to unify the latent effects of two procedure types and the regions of two cellof types. The extension is straightforward, because both of these are guaranteed to be description variables: all procedure types generated by the [→-introZ ] rule have latent effects that are description variables and all regions are description variables. Modifying the algorithm to unify arbitrary effects would be significantly more complicated, because the algorithm would need to generate a set of effect constraints in addition to a solution substitution (see [JG91] for details).
Algebraic Type Schemas for Let Polymorphism

A key difference between Algorithm Z and Algorithm R is how let polymorphism is handled. Recall that Algorithm R uses type schemas of the form (generic (τ1 ... τn) T) to permit a type identifier to be instantiated to different types in different contexts. For example, the identity function (abs (x) x) can be used on any type of input. When it is let-bound to an identifier, it has the type schema (generic (?t) (-> (?t) ?t)). The job of a type schema is to describe all of the possible types of an identifier by determining type variables that can be generalized. In the implicit type/effect system of FLARE/E, type schemas were elaborated with effect and region variables. Reconstructing effects and regions requires us to extend type schemas further to carry along a set of constraints on the effects and regions they describe. In Algorithm Z, generic type schemas (Figure 16.8) are modified to have the form (generic (δ*) T (FCS)), where FCS contains effect constraints that may involve the effect and region variables in δ*. We call a type schema that includes an effect-constraint set an algebraic type schema [JG91]. The fact that effect-constraint sets appear within algebraic type schemas that have an s-expression representation is the reason that we have chosen to represent effect constraints using the s-expression notation (>= δ F) and to represent effect-constraint sets as sequences of such constraints. As a simple example, consider the algebraic type schema that the primitive type environment TEprim assigns to the cell assignment operation (:=):

(generic (?t ?e ?r)
  (-> ((cellof ?t ?r)) ?e unit)   {type}
  ((>= ?e (write ?r))))           {effect constraints}
This type schema has three parts: (1) the description variables (?t ?e ?r) describe the type, effect, and region variables that can be generalized in the type schema; (2) the procedure type (-> ((cellof ?t ?r)) ?e unit) describes the cell assignment operation and notes that its application has effect ?e; and (3) the effect-constraint set ((>= ?e (write ?r))) describes the constraints on the effect variable ?e. In this case, the assignment operation can have any effect as long as it is larger than (write ?r), where ?r specifies the region in which the cell is allocated. A swap procedure that swaps the contents of two cells would have the following algebraic type schema:

TSswap = (generic (?t ?e ?r1 ?r2)
           (-> ((cellof ?t ?r1) (cellof ?t ?r2)) ?e unit)
           ((>= ?e (maxeff (read ?r1) (write ?r1) (read ?r2) (write ?r2)))))
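When a variable bound to such a schema is referenced, the [genvarZ] rule in Figure 16.8 below instantiates the schema by renaming its parameters to fresh description variables in both the type and the effect constraints. A small illustrative sketch of that renaming (the tuple encoding is ours):

  import itertools
  _FRESH = itertools.count()

  def instantiate(schema):
      # schema: ("generic", params, type, constraints), where constraints
      # is a list of (delta, effect-term) pairs
      _, params, ty, fcs = schema
      renaming = {p: "d%d" % next(_FRESH) for p in params}
      def rn(term):
          if isinstance(term, str):
              return renaming.get(term, term)
          return tuple(rn(t) for t in term)
      return rn(ty), [(renaming.get(d, d), rn(f)) for (d, f) in fcs]

  # Each reference gets its own fresh variables, which is what lets a
  # let-bound procedure be used polymorphically at several types/effects.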
Domains
  ATS ∈ AlgebraicTypeSchema ::= T | (generic (δ*) T (FCS))
  TE ∈ TypeEnvironment = Ident ⇀ AlgebraicTypeSchema

Type Functions
  zgen : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → AlgebraicTypeSchema
  (zgen T TE TCS FCS)
    = (generic (δ1 ... δn) (σ T) ((σ FCS))),
        if solveTCS TCS = (TypeSubst ↦ UnifySoln σ)
        and {δ1, ..., δn} = (FrDescIds_ty[[(σ T)]] ∪ FrDescIds_FCS[[(σ FCS)]])
                            − (FrDescIds_tyenv (σ TE))
    = undefined, otherwise

  zgenPureSP : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → Exp → AlgebraicTypeSchema
  (zgenPureSP Tdefn TE TCS FCS E)
    = (zgen Tdefn TE TCS FCS), if pure E (defined in Figure 13.25 on page 816)
    = Tdefn, otherwise

Type/Effect Reconstruction Rules

  Z[[I]] TE = ⟨([δ1′/δ1] ... [δn′/δn]) Tbody, {}TCS, pure, ([δ1′/δ1] ... [δn′/δn]) FCSbody⟩ [genvarZ]
    where TE(I) = (generic (δ1 ... δn) Tbody (FCSbody)) and δ1′ ... δn′ are fresh

  ∀i ∈ {1..n} . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
  Z[[E0]] TE[I1 : (zgenPureSP T1 TE TCS1 FCS1 E1), ..., In : (zgenPureSP Tn TE TCSn FCSn En)]
    = ⟨T0, TCS0, F0, FCS0⟩
  --------------------------------------------------------------------- [letSPZ]
  Z[[(let ((I1 E1) ... (In En)) E0)]] TE
    = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ ... @ FCSn⟩
    where TCSdefns = TCS1 ∪ ... ∪ TCSn

  ∀i ∈ {1..n} . (Z[[Ei]] TE[I1 : δ1, ..., In : δn] = ⟨Ti, TCSi, Fi, FCSi⟩)
  Z[[E0]] TE[I1 : (zgenPureSP T1 TE TCSdefns FCSdefns E1), ..., In : (zgenPureSP Tn TE TCSdefns FCSdefns En)]
    = ⟨T0, TCS0, F0, FCS0⟩
  --------------------------------------------------------------------- [letrecSPZ]
  Z[[(letrec ((I1 E1) ... (In En)) E0)]] TE
    = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ ... @ FCSn⟩
    where δ1 ... δn are fresh
          TCSdefns = (TCS1 ∪ ... ∪ TCSn) ∪ ({δ1 ≐ T1}TCS ∪ ... ∪ {δn ≐ Tn}TCS)
          FCSdefns = FCS1 @ ... @ FCSn

Figure 16.8  FLARE/E type/effect reconstruction for let polymorphism.
The effect-constraint set in this algebraic type schema constrains the latent effect of the swap procedure type to include read and write effects for the regions of both cells. As in FLARE type reconstruction, the type/effect reconstruction system introduces type schemas in the rules for let and letrec expressions. Algebraic type schemas are created by the zgenPureSP and zgen functions defined in Figure 16.8. These are similar to the rgenPure and rgen functions used in FLARE type reconstruction (as defined in Figure 13.26 on page 818), except that:

• zgenPureSP takes an additional argument, an effect-constraint set, and returns an algebraic type schema rather than a regular type schema. The purity of the expression argument determines whether generalization takes place, and the effect-constraint set is passed along to zgen. zgenPureSP employs the same syntactic purity test used in Algorithm R and the FLARE/E type/effect system; see Section 16.2.5 for a discussion of an alternative purity test.

• zgen takes one additional argument, an effect-constraint set, which it incorporates into the algebraic type schema. Note that the schema generalizes over free description variables in the effect constraints (as well as in the type) that are not mentioned in the type environment.

As in rgen, the type constraints in zgen must be solved before generalization can take place. Why not solve the effect constraints as well? Because effect constraints involve inequalities rather than equalities, they can only be solved globally, not locally. E.g., knowing that ?e is larger than (read r1) and (write r2) does not allow us to conclude that ?e = (maxeff (read r1) (write r2)), since there may be other constraints on ?e elsewhere in the program that force it to encompass more effects. So the solution of effect inequality constraints must be delayed until all effect constraints in the whole program have been collected (see the [progZ] rule in Figure 16.9, which is discussed later). In contrast, type equality constraints can be solved eagerly as well as lazily. This is why algebraic type schemas must carry effect constraints but not type constraints. When the type environment assigns an algebraic type schema to a variable, the [genvarZ] rule instantiates the parameters of the type schema with fresh description variables. Because different description variables are chosen for different occurrences of the variable, the definitions associated with the variables may be used polymorphically. Although the variable reference itself is pure, its effect-constraint set includes instantiated versions of the algebraic type schema's effect constraints. For example, suppose that cell x is an integer cell allocated in region rx and cell y is an integer cell allocated in region ry.
Domains for Type/Effect Reconstruction of Programs
  RA ∈ ReconAns = Type + Failure
  Failure = {fail}

Function Signature for Type/Effect Reconstruction of Programs
  Zpgm : Prog → ReconAns

Type/Effect Reconstruction Rules for Programs

  Z[[E]] {I1 : δ1, ..., In : δn} = ⟨T, TCS, F, FCS⟩
  --------------------------------------------------------------------- [progZ]
  Zpgm[[(flarek (I1 ... In) E)]] = RApgm
    where δ1 ... δn are fresh
    RApgm = (ProgType ↦ ReconAns (σFCS (σTCS (=> (δ1 ... δn) F T)))),
              if solveTCS TCS = (TypeSubst ↦ UnifySoln σTCS)
              and solveFCS (σTCS FCS) = σFCS
          = (Failure ↦ ReconAns fail), otherwise

Figure 16.9  The FLARE/E type/effect reconstruction algorithm for programs, expressed via a deduction rule.
Then the reconstruction of (swap x y) would yield an effect-constraint set equivalent to
(where e1 is the fresh description variable substituted for ?e). The reconstruction of (swap x x) would yield an effect-constraint set equivalent to ((>= e2 (maxeff (read rx) (write rx))))
Instantiating the effect-constraint set of the algebraic type schema in the [genvarZ] rule is consistent with the view that referencing an expression variable bound by a let or letrec to a pure expression E is equivalent to replacing the variable reference by E.

Reconstructing Programs

At the level of a whole program, the [progZ] rule in Figure 16.9 models the result of successful type/effect reconstruction as a program type whose

• parameter types are the types of the program parameters;
• result type is the type of the program body; and
• latent effect is the effect of the program body.
This program type is constructed via

  (σFCS (σTCS (=> (δ1 ... δn) Fbody Tbody)))

where

• δ1 ... δn are the description variables generated for the types of the program parameters.

• Fbody is the effect reconstructed for the program body, Ebody.

• Tbody is the type reconstructed for Ebody.

• σTCS is the description substitution that is the solution of the type constraints TCSbody collected in the reconstruction of Ebody.

• σFCS is the description substitution solveFCS (σTCS FCSbody) that is the global solution of all the effect constraints FCSbody collected in the reconstruction of Ebody. Before solveFCS is called, the substitution σTCS must be applied to each constraint in FCSbody to incorporate information gleaned from unifying latent effect variables in procedure types and region variables in cell types.

For a similar reason, σFCS is applied to the program type after σTCS has been applied. Note that the application of σFCS resolves effect variables not only in Fbody but also in the latent effects of any procedure types that occur in Tbody.
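In code, this post-processing order might look like the following sketch (illustrative names; solve_fcs is the fixed-point solver sketched earlier, and the sketch ignores the case where unification renames the constrained effect variables themselves):

  def subst(sigma, term):
      # apply a substitution to a term: variables are strings, compound
      # terms are tuples of subterms
      if isinstance(term, str):
          return sigma.get(term, term)
      return tuple(subst(sigma, t) for t in term)

  def finish_program(prog_type, sigma_tcs, fcs, solve_fcs):
      # 1. push the unifier through the effect constraints, so effect and
      #    region variables identified during unification coincide
      fcs1 = [(d, {subst(sigma_tcs, e) for e in f}) for (d, f) in fcs]
      # 2. solve all effect constraints globally (least solution)
      sigma_fcs = solve_fcs(fcs1)
      # 3. apply both substitutions to the program type, sigma_tcs first
      return subst(sigma_fcs, subst(sigma_tcs, prog_type))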
Algorithm Z has the Power of FLARE/E

Algorithm Z succeeds if the type constraints and effect constraints collected for the program body are solvable. We have seen from the discussion of solveFCS on page 963 that the effect constraints are always solvable, so reconstruction succeeds if and only if the type constraints are solvable. The following theorems say that Algorithm Z is sound and complete for the FLARE/E implicit type/effect system in Figure 16.3 on page 953:

Theorem 16.2 (Soundness of Algorithm Z)  Suppose Z[[E]] TE = ⟨T, TCS, F, FCS⟩. If σTCS is any solution of TCS, σFCS is any solution of (σTCS FCS), and σ = σFCS ∘ σTCS, then (σ TE) ⊢ E : (σ T) ! (σ F).

Theorem 16.3 (Completeness of Algorithm Z)  If TE ⊢ E : T ! F, then Z[[E]] TE = ⟨T′, TCS, F′, FCS⟩, where there are solutions σTCS of TCS and σFCS of (σTCS FCS) such that ((σFCS ∘ σTCS) T′) = T and ((σFCS ∘ σTCS) F′) = F.
In both of these theorems, σTCS may be less general than the most general unifier calculated by (solveTCS TCS) and σFCS may be greater than the least solution calculated by (solveFCS (σTCS FCS)). In both theorems, it is necessary to apply the composition σFCS ∘ σTCS to both types and effects rather than just applying σTCS to types and σFCS to effects. For types, σFCS may be needed to resolve latent effect variables in procedure types that were determined when solving the effect constraints in FCS. For effects, σTCS may be needed to resolve effect and region variables that were unified as part of solving the type constraints in TCS. Together, the soundness and completeness theorems for Algorithm Z imply a principality result similar to the one shown for Algorithm R: Any type that can be assigned to an expression in FLARE/E is a substitution instance of the type found by Algorithm Z, and any effect that can be assigned to an expression in FLARE/E is a substitution instance of the effect found by Algorithm Z. Since FLARE expressions are typable in the FLARE/E implicit type/effect system if and only if they are typable in the FLARE implicit type system (by Theorem 16.1 on page 956), and expressions are typable in the FLARE implicit type system if and only if their type can be reconstructed by Algorithm R (by Theorems 13.7 and 13.8 on page 799), a consequence of Theorems 16.2 and 16.3 is that Algorithm Z and Algorithm R succeed on exactly the same set of FLARE expressions and programs.

Exercise 16.7
a. Write a FLARE/E abstraction Eswapabs that swaps the contents of its two cell arguments.
b. Show the derivation for (Z[[Eswapabs]] {}), the type/effect reconstruction of Eswapabs in the empty type environment.
c. Use zgenPureSP to create an algebraic type schema for Eswapabs, supplying the type and effect information from part b as arguments.
d. Is your algebraic type schema from part c equivalent to TSswap defined on page 966? Explain any discrepancies.

Exercise 16.8 Construct an Algorithm Z type/effect derivation for the following program:

(flarek (a b)
  (let ((mapcell (abs (c f)
                   (let ((v (prim ^ c)))
                     (let ((_ (prim := c (f v))))
                       v)))))
    (mapcell a (abs (x) (mapcell b (abs (y) x))))))
Exercise 16.9 Give type/effect reconstruction derivations for the programs in part a and part b of Exercise 16.1 on page 956. Exercise 16.10 Write a FLARE/E program whose type/effect reconstruction uses an algebraic type schema whose effect-constraint set has more than one constraint. Show this by giving the type/effect reconstruction derivation for your program. Exercise 16.11 Modify the FLARE/E implicit type/effect system and Algorithm Z to handle mutable variables (via the set! construct). Begin by studying Exercise 13.11 on page 820 to get a sense for the issues involved in this extension. In particular, references to variables modified by set! are no longer pure — they have a read effect! Your system should distinguish variables modified by set! from those that are not; references to the latter can still be considered pure.
16.2.4 Effect Masking Hides Unobservable Effects
We now explore some variations on the FLARE/E type/effect system. The first involves effect masking, which allows effects to be deleted from an expression when they cannot be observed from outside of the expression. For example, consider the following procedure, which sums the elements of a list of integers:

Esumabs = (abs (ints)
            (let ((sum (cell 0)))
              (letrec ((loop (abs (ns)
                               (if (null? ns)
                                   (^ sum)
                                   (begin (:= sum (+ (^ sum) (car ns)))
                                          (loop (cdr ns)))))))
                (loop ints))))
Suppose that the cell named sum is in region rs. According to the effect rules we have studied thus far, the latent effect of the type for this procedure is (maxeff (init rs) (read rs) (write rs))
Intuitively, however, the sum cell is completely internal to the summation procedure and cannot be observed outside the procedure. There is no experiment that a client can perform to determine whether or not the summation procedure uses cells in its implementation. We can use the type/effect system to prove that the effects within the summation procedure are unobservable outside the procedure. We do this by showing that no cell in region rs can be referenced outside the let expression that is the body of the procedure.
  TE ⊢ E : T ! F
  --------------------------------------------------------------------- [effect-masking]
  TE ⊢ E : T ! F′
    where F′ ⊑e F and
    ∀BF ∈ (F[[F]] − F[[F′]]) . (∀δ ∈ FrDescIds_eff[[BF]] .
      ((δ ∉ FrDescIds_ty[[T]])                                  [export restriction]
       ∧ (∀I ∈ FrIds[[E]] . (δ ∉ FrDescIds_ty[[TE(I)]]))))      [import restriction]

Figure 16.10  An effect-masking rule for FLARE/E.
Region rs does not appear in the type (int) of the procedure body, nor does it appear in the type environment in the types of the free variables used in the procedure body (ints, cell, ^, :=, +, null?, car, cdr). This shows that region rs is inaccessible outside the procedure body, and so cannot be observed by any client of the procedure. We can add effect masking to FLARE/E by extending the type/effect rules with the [effect-masking] rule in Figure 16.10. This rule says that any base effect BF can be deleted from the effect of an expression E as long as it is purely local to E — i.e., it cannot be observed elsewhere in the program. BF is local to E if no effect or region variable δ appearing in it is mentioned in the type of any free variable used by E (the import restriction) or can escape to the rest of the program in the type of E (the export restriction). In some sense, the [effect-masking] rule is the opposite of the [does] rule, since it allows deflating the effect of an expression as opposed to inflating it. In the case of the list summation procedure, the [effect-masking] rule formalizes our above reasoning about effect observability. It allows the let expression to be assigned the pure effect, making the latent effect of the procedure type for Esumabs pure as well. Note that the [effect-masking] rule does not allow any effects to be deleted from the letrec expression in the list summation procedure. Although rs does not appear in the type (int) of this expression, it does appear in the type (cellof int rs) of the free variable sum used in the expression. Effect masking is an important tool for encapsulation. The [effect-masking] rule can detect that certain expressions, while internally impure, are in fact externally pure. It thus permits impure expressions to be included in otherwise stateless functional programs; expressions can take advantage of local side effects for efficiency without losing their referential transparency. As we will see in Section 16.3.1, it also allows effects that denote control transfers to be masked, indicating that an expression may perform internal control transfers that are not observable outside of the expression.
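Operationally, the masking test is a simple disjointness check on description variables. The sketch below is our own illustration (the caller supplies the free-variable sets that FrDescIds would compute):

  def maskable(base_effects, effect_vars_of, type_vars, free_id_type_vars):
      # base_effects: base effects inferred for the expression E
      # effect_vars_of(bf): description variables occurring in base effect bf
      # type_vars: description variables free in E's type (export restriction)
      # free_id_type_vars: variables free in the types of E's free
      #   identifiers (import restriction)
      observable = type_vars | free_id_type_vars
      return {bf for bf in base_effects
              if effect_vars_of(bf).isdisjoint(observable)}

  # For the let expression in Esumabs: the type int has no description
  # variables, and rs does not occur in the types of the let's free
  # identifiers, so all three base effects over rs are maskable:
  effs = {("init", "rs"), ("read", "rs"), ("write", "rs")}
  print(maskable(effs, lambda bf: {bf[1]}, set(), set()))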
If the [effect-masking] rule is so important, why didn't we include it as a rule in the FLARE/E type/effect system presented in Figure 16.3 on page 953? The reason is that it complicates the story of type reconstruction. The effects computed by the solveFCS function in Algorithm Z are the least effects for the type/effect system presented in Figure 16.3. But they are no longer the least effects when the [effect-masking] rule is added, since this rule allows even smaller effects. For example, Algorithm Z would determine that the effect of the let expression that is the body of Esumabs includes init, read, and write effects for the region rs in which the sum cell is allocated, but we have seen that these can be eliminated by the [effect-masking] rule.

Exercise 16.12 Consider the following FLARE expression:

(abs (a)
  (let ((b (cell 1)))
    (snd (let ((_ (:= a (^ b)))
               (c (cell 2))
               (d (cell 3)))
           (let ((_ (:= c (^ e)))) {e is a free variable}
             (pair c d))))))
a. Construct a FLARE/E type/effect derivation for this expression that does not use the [effect-masking] rule. Assume that each cell is allocated in a separate region. b. Construct a FLARE/E type/effect derivation for this expression that uses the [effectmasking] rule to find the smallest allowable effect for each subexpression.
16.2.5 Effect-based Purity for Generalization
It may be surprising that the egenPureSP function for type generalization in the FLARE/E type rules (Figure 16.2 on page 952) determines expression purity using a syntactic test rather than using the effect system itself. Here we explore an alternative type/effect system FLARE/EEP that determines purity via an effect-based test rather than a syntactic test. (The EP subscript stands for “effect purity.”) The key difference between FLARE/EEP and FLARE/E is the new function egenPureEP in Figure 16.11. Like egenPureSP , egenPureEP uses its third argument to determine the purity (and thus the generalizability) of an expression. However, egenPureEP ’s third argument is an effect determined from the effect system, whereas egenPureSP ’s is an expression whose effect is determined by a separate syntactic deduction system. The [letEP ] and [letrecEP ] rules employ effect-based purity by passing appropriate effects to egenPureEP .
New Type Function
  egenPureEP : Type → TypeEnvironment → Effect → TypeSchema
  (egenPureEP Tdefn TE F)
    = (egen Tdefn TE), if F ≈e pure
    = Tdefn, otherwise

Modified Type/Effect Rules

  ∀i ∈ {1..n} . (TE ⊢ Ei : Ti ! Fi)
  TE[I1 : (egenPureEP T1 TE F1), ..., In : (egenPureEP Tn TE Fn)] ⊢ E0 : T0 ! F0
  --------------------------------------------------------------------- [letEP]
  TE ⊢ (let ((I1 E1) ... (In En)) E0) : T0 ! (maxeff F0 F1 ... Fn)

  ∀i ∈ {1..n} . (TE[I1 : T1, ..., In : Tn] ⊢ Ei : Ti ! Fi)
  TE[I1 : (egenPureEP T1 TE F1), ..., In : (egenPureEP Tn TE Fn)] ⊢ E0 : T0 ! F0
  --------------------------------------------------------------------- [letrecEP]
  TE ⊢ (letrec ((I1 E1) ... (In En)) E0) : T0 ! (maxeff F0 F1 ... Fn)

Figure 16.11  Modified type/effect rules for FLARE/EEP, a system that uses the effect system itself rather than syntactic tests to determine purity.
The FLARE/EEP type system is more powerful than the FLARE and FLARE/E systems: every expression typable in FLARE and FLARE/E is typable in FLARE/EEP, but there are expressions typable in FLARE/EEP that are not typable in FLARE or FLARE/E. Consider the expression:

EcurriedPair = (let ((cp (abs (x) (abs (y) (prim pair x y)))))
                 (let ((cp1 (cp 1)))
                   (prim pair (cp1 #u) (cp1 #t))))
This expression is not well typed in FLARE. According to the syntactic definition of purity in Figure 13.25 on page 816, the application (cp 1) is considered impure, so the type of cp1 cannot be generalized to a polymorphic type and must be a monomorphic type of the form (-> (Ty ) (pairof int Ty )). Since (cp1 #u) requires Ty to be unit and (cp1 #t) requires Ty to be bool, no FLARE typing is possible. Similar reasoning shows that EcurriedPair is not well typed in FLARE/E. In contrast, EcurriedPair is well typed in FLARE/EEP , as shown by the type/effect derivation in Figure 16.12. The key difference is that the effect system can deduce that the application (cp 1) is pure, and this allows the type of cp1 to be generalized in FLARE/EEP . The extra typing power of FLARE/EEP derives from using the more precise purity test of the effect system itself in place of the crude syntactic purity test used in FLARE and FLARE/E.
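The contrast between the two purity tests can be made concrete with a small sketch (ours; the syntactic test shown is cruder than the full definition in Figure 13.25, which also handles pure lets and similar forms):

  def syntactically_pure(expr):
      # crude syntactic test: literals, variables, and abstractions only
      return expr[0] in ("lit", "var", "abs")

  def effect_pure(effect):
      # effect-based test: F ~ pure, with effects as sets of base effects
      return effect == frozenset()

  app = ("app", ("var", "cp"), ("lit", 1))   # the application (cp 1)
  print(syntactically_pure(app))   # False: cp1's type is never generalized
  print(effect_pure(frozenset()))  # True: (cp 1) is generalizable in FLARE/E_EP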
Abbreviations
  EcurriedPair = (let ((cp Eabs1)) (let ((cp1 (cp 1))) (prim pair (cp1 #u) (cp1 #t))))
  Eabs1 = (abs (x) Eabs2)
  Eabs2 = (abs (y) (prim pair x y))
  TE1 = {cp : (generic (?x ?y) (-> (?x) pure (-> (?y) pure (pairof ?x ?y))))}
  TE2 = TE1[cp1 : (generic (?y) (-> (?y) pure (pairof int ?y)))]
  Tiu = (pairof int unit)
  Tib = (pairof int bool)

Type/Effect Derivation
  TEprim ⊢ pair : (-> (?x ?y) pure (pairof ?x ?y)) ! pure [genvar]
  {x : ?x, y : ?y} ⊢ x : ?x ! pure [var]
  {x : ?x, y : ?y} ⊢ y : ?y ! pure [var]
  {x : ?x, y : ?y} ⊢ (prim pair x y) : (pairof ?x ?y) ! pure [prim]
  {x : ?x} ⊢ Eabs2 : (-> (?y) pure (pairof ?x ?y)) ! pure [→-intro]
  {} ⊢ Eabs1 : (-> (?x) pure (-> (?y) pure (pairof ?x ?y))) ! pure [→-intro]
  TE1 ⊢ cp : (-> (int) pure (-> (?y) pure (pairof int ?y))) ! pure [genvar]
  TE1 ⊢ 1 : int ! pure [int]
  TE1 ⊢ (cp 1) : (-> (?y) pure (pairof int ?y)) ! pure [→-elim]
  TEprim ⊢ pair : (-> (Tiu Tib) pure (pairof Tiu Tib)) ! pure [genvar]
  TE2 ⊢ cp1 : (-> (unit) pure (pairof int unit)) ! pure [genvar]
  TE2 ⊢ #u : unit ! pure [unit]
  TE2 ⊢ (cp1 #u) : (pairof int unit) ! pure [→-elim]
  TE2 ⊢ cp1 : (-> (bool) pure (pairof int bool)) ! pure [genvar]
  TE2 ⊢ #t : bool ! pure [bool]
  TE2 ⊢ (cp1 #t) : (pairof int bool) ! pure [→-elim]
  TE2 ⊢ (prim pair (cp1 #u) (cp1 #t)) : (pairof Tiu Tib) ! pure [prim]
  TE1 ⊢ (let ((cp1 (cp 1))) (prim pair (cp1 #u) (cp1 #t))) : (pairof Tiu Tib) ! pure [letEP]
  {} ⊢ EcurriedPair : (pairof Tiu Tib) ! pure [letEP]

Figure 16.12  Type/effect derivation for EcurriedPair in FLARE/EEP.
Given the apparent advantages of effect-based purity over syntactic purity, why did we adopt syntactic purity as the default in the FLARE/E type/effect system? The reason is that effect-based purity greatly complicates type reconstruction. With syntactic purity, the decision to generalize types in the [letSPZ ] and [letrecSPZ ] type reconstruction rules is independent of solving the effect constraints collected during reconstruction. With effect-based purity, type generalization may depend on the result of solving effect constraints. This introduces a fundamental dependency problem: the decision to generalize must be made when processing let and letrec expressions, but the effect constraints cannot be solved until the whole program body has been processed. One way to address this dependency problem is via backtracking (see Exercise 16.16). Exercise 16.13 Show that the FLARE/EEP type/effect system can be made even more powerful by extending it with the [effect-masking] rule in Figure 16.10 on page 973. That is, give an expression that is typable in FLARE/EEP + [effect-masking] that is not typable in FLARE/EEP . Exercise 16.14 Thai Ping suggests the following subtyping rule for FLARE/E procedure types: ∀n Tbody Tbody F e F i=1 . (Ti Ti ) n n (-> (T 1=1 ) F Tbody ) (-> (T i=1 ) F Tbody )
[→-]
a. Suppose that the FLARE/EEP type system were extended with Thai’s rule as well as with a version of the [inclusion] type rule in Figure 12.1 on page 703. Give an example of an expression that is well typed in the extended system that is not well typed in the original one. b. Suppose that the FLARE/E type system were extended with Thai’s rule as well as the [inclusion] type rule. Are there any expressions that are well typed in the extended system but not well typed in the original one? Either give such an expression or show that the two systems are equivalent in terms of typing power. Exercise 16.15 Bud Lojack thinks that a small modification to Algorithm Z can make it sound and complete for FLARE/EEP , the version of the FLARE/E type system using effect-based purity. He modifies the reconstruction rules for let and letrec to use a new zgenPureEP function that performs an effect-based purity test (Figure 16.13). Excitedly, Bud shows his modifications to Thai Ping. But Thai bursts Bud’s bubble when he observes, “Your modified rules are just another way of reconstructing types and effects for FLARE/E, not for FLARE/EEP . The problem is that the purity test in zgenPureEP involves effect expressions containing effect variables that may eventually be shown to be pure but are conservatively assumed to be impure when the purity test is performed.”
Show that Thai is right by fleshing out the following two steps, which show that replacing the [letSPZ]/[letrecSPZ] rules by the [letEPZ]/[letrecEPZ] rules does not change which expressions can be reconstructed by Algorithm Z.

a. Prove the following lemma:

Lemma 16.4 In Bud's modified Algorithm Z, suppose that Z[[E]] TE = ⟨T, TCS, F, FCS⟩ and (solveTCS TCS) = (TypeSubst ↦ UnifySoln σTCS). Then (σTCS F) ≈e pure if and only if (pure E) according to the deduction system for pure defined in Figure 13.25 on page 816. Hint: What is the form of every latent effect in a procedure type generated by Algorithm Z? What does this imply about the purity of procedure applications?

b. Using Lemma 16.4, show that in any type/effect derivation from Bud's modified Algorithm Z, any instances of the [letEPZ] and [letrecEPZ] rules can be replaced by the [letSPZ] and [letrecSPZ] rules without changing the validity of the derivation.

Exercise 16.16 Bud Lojack's version of Algorithm Z (see Exercise 16.15) fails to reconstruct the types and effects of some expressions that are well typed in the FLARE/EEP type/effect system because it doesn't “know” the purity of certain effect variables that eventually turn out to be pure. This drawback can be addressed by aggressively assuming that all effect variables are pure unless there is evidence otherwise, and backtracking in any case where the assumption is later proven to be false.

a. Design and implement a backtracking version of Bud's modified Algorithm Z based on this idea.

b. Show that your modified version of Algorithm Z can successfully reconstruct the type and effect of the expression EcurriedPair defined on page 975.
16.3 Using Effects to Analyze Program Behavior
Thus far we have considered a system for calculating only store effects. Store effects are especially useful for guiding compiler optimizations like parallelization, common subexpression elimination, dead code elimination, and code hoisting (see Section 17.6). We now explore other kinds of effects and show how effect information can be used to reason about program behavior and guide the implementation of programs.
16.3.1 Control Transfers
Effects can be used to analyze control transfers, such as those expressed via the label and jump constructs studied in Section 9.4. Recall that (label Icp Ebody) evaluates Ebody in an environment where Icp names the control point corresponding to the continuation of the label expression, and (jump Ecp Eval) jumps to the control point denoted by Ecp with the value of Eval.
New Type Function
  zgenPureEP : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → Effect → AlgebraicTypeSchema
  (zgenPureEP Tdefn TE TCS FCS F)
    = (zgen Tdefn TE TCS FCS),
        if solveTCS TCS = (TypeSubst ↦ UnifySoln σ) and (σ F) ≈e pure
    = Tdefn, otherwise

Modified Type/Effect Reconstruction Rules

  ∀i ∈ {1..n} . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
  Z[[E0]] TE[I1 : (zgenPureEP T1 TE TCS1 FCS1 F1), ..., In : (zgenPureEP Tn TE TCSn FCSn Fn)]
    = ⟨T0, TCS0, F0, FCS0⟩
  --------------------------------------------------------------------- [letEPZ]
  Z[[(let ((I1 E1) ... (In En)) E0)]] TE
    = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ ... @ FCSn⟩
    where TCSdefns = TCS1 ∪ ... ∪ TCSn

  ∀i ∈ {1..n} . (Z[[Ei]] TE[I1 : δ1, ..., In : δn] = ⟨Ti, TCSi, Fi, FCSi⟩)
  Z[[E0]] TE[I1 : (zgenPureEP T1 TE TCSdefns FCSdefns F1), ..., In : (zgenPureEP Tn TE TCSdefns FCSdefns Fn)]
    = ⟨T0, TCS0, F0, FCS0⟩
  --------------------------------------------------------------------- [letrecEPZ]
  Z[[(letrec ((I1 E1) ... (In En)) E0)]] TE
    = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ ... @ FCSn⟩
    where δ1 ... δn are fresh
          TCSdefns = (TCS1 ∪ ... ∪ TCSn) ∪ ({δ1 ≐ T1}TCS ∪ ... ∪ {δn ≐ Tn}TCS)
          FCSdefns = FCS1 @ ... @ FCSn

Figure 16.13  Bud Lojack's modified type/effect reconstruction rules for let and letrec in Algorithm Z (Exercise 16.15).
Here is a simple example in a version of FLARE/E extended with these two constructs:

Eproc1 = (abs (x y)
           (+ 1 (label exit
                  (* 2 (if (< y 0) (jump exit y) x)))))
In Eproc1 , label gives the name exit to the control point that returns the value of the label expression. If y is negative, the jump to exit returns y as the value of the label expression, and Eproc1 returns one more than the value of y. Otherwise, no jump is performed, the value of the label expression is double the value of x, and Eproc1 returns one more than double the value of x. (See Section 9.4 for more examples of nonlocal exits.) The control behavior of label and jump can be modeled by introducing a new type and two new effect constructors:
T ∈ Type ::= ... | (controlpointof T R)
FCR ∈ EffectConstructor = ... ∪ {goto, comefrom}

  TE[Icp : (controlpointof Tbody R)] ⊢ Ebody : Tbody ! Fbody
  --------------------------------------------------------------------- [cp-intro]
  TE ⊢ (label Icp Ebody) : Tbody ! (maxeff (comefrom R) Fbody)

  TE ⊢ Ecp : (controlpointof Tval R) ! Fcp    TE ⊢ Eval : Tval ! Fval
  --------------------------------------------------------------------- [cp-elim]
  TE ⊢ (jump Ecp Eval) : Tany ! (maxeff (goto R) Fcp Fval)

Figure 16.14  Type/effect rules for label and jump.
The type (controlpointof T R) describes a control point in region R that expects to receive a value of type T. An expression has effect (goto R) if it might jump to a control point in R, and it has effect (comefrom R)⁶ if it creates a control point in R that could be the target of a jump. Although regions represent areas of memory in store effects, they represent sets of control points in control effects, and can have other meanings for other kinds of effects. The FLARE/E type/effect system can be extended to handle control effects with the two rules in Figure 16.14. In the [cp-intro] rule, (label Icp Ebody) introduces a control point with type (controlpointof Tbody R) into the type environment in which Ebody is type-checked. The type of the label expression must be the same whether Ebody returns normally (without encountering a jump) or a jump is performed to the named control point. This constrains the received value type in the controlpointof type to be the same as the type Tbody of Ebody. The effect of the label expression includes (comefrom R) to indicate that it introduces a control point in region R. The [cp-elim] rule requires that in (jump Ecp Eval) the type of Ecp must be (controlpointof Tval R), where the received value type Tval must match the type of the supplied value Eval. The effect of a jump expression includes (goto R) to model its control-point-jumping behavior. The jump expression has an unconstrained type, Tany, that is determined by the context in which it is used.
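To make the two rules concrete, here is a toy checker sketch in Python. It is entirely illustrative: the book's system is implicitly typed and relies on unification, which this sketch sidesteps by annotating label with its body type and by giving jump a wildcard type "ANY" that is compatible with any context:

  def compat(t1, t2):
      return t1 == "ANY" or t2 == "ANY" or t1 == t2

  def check(expr, env):
      # returns (type, effects); effects is a frozenset of (name, region)
      tag = expr[0]
      if tag == "lit":
          return ("bool" if isinstance(expr[1], bool) else "int"), frozenset()
      if tag == "var":
          return env[expr[1]], frozenset()
      if tag == "if":
          t1, f1 = check(expr[1], env)
          t2, f2 = check(expr[2], env)
          t3, f3 = check(expr[3], env)
          assert compat(t1, "bool") and compat(t2, t3)
          return (t3 if t2 == "ANY" else t2), f1 | f2 | f3      # maxeff
      if tag == "label":                                        # [cp-intro]
          _, icp, region, t_body, body = expr
          t, f = check(body, {**env, icp: ("controlpointof", t_body, region)})
          assert compat(t, t_body)     # normal return matches jump values
          return t_body, f | {("comefrom", region)}
      if tag == "jump":                                         # [cp-elim]
          t_cp, f_cp = check(expr[1], env)
          t_val, f_val = check(expr[2], env)
          assert t_cp[0] == "controlpointof" and compat(t_cp[1], t_val)
          return "ANY", f_cp | f_val | {("goto", t_cp[2])}
      raise ValueError(tag)

  # The label expression inside Eproc1, with the test simplified to a
  # literal and x, y assumed to be ints:
  e = ("label", "exit", "cp1", "int",
       ("if", ("lit", True),
              ("jump", ("var", "exit"), ("var", "y")),
              ("var", "x")))
  print(check(e, {"x": "int", "y": "int"}))
  # -> ('int', frozenset({('goto', 'cp1'), ('comefrom', 'cp1')}))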
the jump expression has type int to match the type of x. But in (* 2 (if (scor (< y 0) (jump exit x)) y x)) 6
The effect name comefrom is a play on the name goto, and was inspired by a spoof article [Cla73] on a COME FROM statement dual to a GOTO statement.
the jump expression must have type bool because it appears in a context that requires a boolean value. Returning to our example, Eproc1 = (abs (x y) (+ 1 (label exit (* 2 (if (< y 0) (jump exit y) x)))))
exit has type (controlpointof int cp1), where cp1 is a control region. The expression (jump exit y) has type int and effect (goto cp1). The label expression has type int and an effect, (maxeff (comefrom cp1) (goto cp1)), describing that it establishes a control point in region cp1 that is the target of a jump that may be performed in its body.

In this simple example, the control effects (comefrom cp1) and (goto cp1) are completely local to the label expression. So a system that supports effect masking (Section 16.2.4) can delete them from the effect of the label expression and the latent effect of the abstraction, making these effects pure. This highlights that effect masking works for all effects, including control effects and store effects. When a control effect in region R can be masked from expression E, it means that no part of the program outside E will be subject to unexpected control transfers with respect to the continuation associated with R. Effect masking of control effects is powerful because it allows module implementers to use control transfers internally, while allowing clients of the modules to insist that these internal control transfers not alter the clients' control flow. In a system using explicit types and effects at module boundaries, a client can guarantee this invariant by ensuring that it does not call module procedures with control effects.

As an example where control effects cannot be deleted, consider:

Eproc2 = (label exit
           (abs (y)
             (if (= y 0)
                 (jump exit (abs (z) z))
                 (+ 1 y))))
In this example, evaluating the label expression returns the procedure created by (abs (y) . . . ) without performing any jumps. This procedure behaves like an incrementing procedure when called on a nonzero argument. But applying it to 0 has the bizarre effect of returning from the label expression a second time with the identity procedure instead!

What are the types in this example? Let's assume that the control point for exit is in region cp2. Then (abs (y) . . .) must have type

Tproc2 = (-> (int) (goto cp2) int)
because it takes an integer y, returns an integer (+ 1 y), and may jump to exit. The type of exit must be (controlpointof Tproc2 cp2), because the [cp-intro] rule requires the received value type of the control point to be the same as the body type. The type of (abs (z) z) must also be Tproc2, because the [cp-elim] rule requires the received value type of the control point to match the type of the value supplied to jump. Finally, the label expression has type Tproc2 and effect (comefrom cp2), which does not include a goto effect because no jump can be performed by evaluating the label expression.

Because cp2 appears in the type Tproc2 of Eproc2, the (comefrom cp2) effect cannot be deleted from Eproc2 via effect masking. This effect tracks the fact that the procedure resulting from Eproc2 can jump back into Eproc2 if it is called with the argument 0. Since the impurity of Eproc2 is externally observable (see Exercise 16.17), the control effect cannot be deleted.

Exercise 16.17
a. Assuming Eproc2 is the expression studied above, what is the value of the following expression?

(let ((g Eproc2) (h Eproc2))
  (list (g 1) (h 1) (h 0)))
b. Based on your answer to part a, argue that Eproc2 cannot be a pure expression. Exercise 16.18 Extend the Algorithm Z type/effect reconstruction rules to handle label and jump. Exercise 16.19 Control effects can be used to describe the behavior of the procedure cwcc (see Section 9.4.4). a. Give a type schema for cwcc that is as general as possible. b. Show how your type schema for cwcc can be instantiated in the following FLARE/E expressions: i.
E′proc1 =
(abs (x y) (+ 1 (cwcc (abs (exit) (* 2 (if (< y 0) (exit y) x))))))
ii.
E′proc2 =
(cwcc (abs (exit) (abs (y) (if (= y 0) (exit (abs (z) z)) (+ 1 y)))))
c. Consider the following FLARE/E abstraction: Eproc3 = (abs (x y) (+ 1 (cwcc (abs (exit) (* 2 (if (scor (< y 0) (exit x)) (exit y) x))))))
i.
Explain why Eproc3 is ill typed in FLARE/E.
ii.
In an explicitly typed dialect of FLARE/E with universal (i.e., forall) types, (1) give a type for cwcc and (2) write a well-typed version of Eproc3 with appropriate explicit type annotations.
iii.
Convert Eproc3 to a well-typed FLARE/E abstraction that uses label and jump instead of cwcc.
iv.
What feature of the label and jump type/effect rules makes the well-typedness of your converted abstraction possible?
v.
Show that your converted abstraction can be given a pure latent effect in a version of FLARE/E with the [effect-masking] rule.
16.3.2 Dynamic Variables
In a dynamically scoped language (see Section 7.2.1), dynamically bound variables (i.e., the free variables of a procedure) take their meaning from where the procedure is called rather than where it is defined. References to dynamically bound variables can be tracked by an effect system in which (1) the effect of an expression is the set of dynamically bound variables it might reference and (2) procedure types are extended to have the form

(-> (Targ∗) ((Idyn Tdyn)∗) Tresult)
Each binding (Idyn Tdyn) serves both as a kind of latent effect (each name Idyn is a dynamically bound variable that may be referenced wherever the procedure is called) and as a way to check (using Tdyn) that the dynamically bound variable is used with the right type at every invocation of the procedure. This sketch for how effects can be used to give types to dynamic variables is fleshed out in Exercise 16.20.

Exercise 16.20 Dinah McScoop likes both dynamic scoping and explicit types, so she creates a new language, DIFLEX, that includes both! The syntax of DIFLEX is like FLEX, except for the definition and type of procedures, which have been modified as follows:

E ∈ Exp ::= . . . | (abs ((Ifml Tfml)∗) ((Idyn Tdyn)∗) Ebody)
T ∈ Type ::= . . . | (-> (Targ∗) ((Idyn Tdyn)∗) Tresult)
In abs, the first list of identifiers and types, ((Ifml Tfml )∗ ), specifies the formal parameters of the procedure and their types. The second list, ((Idyn Tdyn )∗ ), specifies the names and types of the dynamically bound identifiers (all non-parameter identifiers) that appear in Ebody . Procedure types include the names and types of dynamically bound identifiers in addition to the usual parameter type list and result type. As usual, in a procedure application, the procedure’s parameter types must match the types of the actual arguments. Because DIFLEX is dynamically scoped, the types of the dynamically bound identifiers in the procedure type must match the types of these identifiers wherever the procedure is called, not where it is defined. For example, the following expression is well typed in Dinah’s language because the dynamically bound variable x is a boolean where procedure p is called (the fact that x is an integer where p is created is irrelevant): (let ((x 1)) (let ((p (abs ((y int)) ((x bool)) (if x y 0)))) (let ((x #t)) (p 1)))) {This expression evaluates to 1}
In contrast, the following expression is ill typed: (let ((x #t)) (let ((p (abs ((y int)) ((x bool)) (if x y 0)))) (let ((x 1)) (p 1)))) {x is not a boolean in this call to p}
Dinah realizes that uses of dynamic variables can be tracked by an effect system. Dinah extends the FLEX typing framework to employ type/use judgments of the form

TE ⊢ E : T & IS

which means "in type environment TE, E has type T and may use identifiers from the set IS." Assume IS ∈ IdSet = P(Ident). For example, Dinah's type/use rule for variable references is:

TE ⊢ I : TE(I) & {I}   [var]

Dinah provides the following examples of type/use judgments for her system:

{x : int} ⊢ (prim + 1 x) : int & {x}
{} ⊢ (let ((x 1)) (prim + 1 x)) : int & {}
{x : bool, y : int} ⊢ (if x y 0) : int & {x, y}
{x : int} ⊢ (abs ((y int)) ((x bool)) (if x y 0)) : (-> (int) ((x bool)) int) & {}
{x : bool, p : (-> (int) ((x bool)) int)} ⊢ (p 1) : int & {p, x}
In the final type judgment, note that the identifier set for (p 1) includes x because the procedure p has a dynamic reference to x. a. Write type/use rules for the following constructs: let, abs, and procedure application.
b. Briefly argue that your type/use rules guarantee that in a well-typed program, an identifier can never be unbound or used with an incorrect type. c. Dinah’s friend Thai Ping observes that the following DIFLEX expression is ill typed: (abs ((b bool)) () (let ((f (abs ((x int)) ((c int)) (prim + x c))) (g (abs ((y int)) ((d int)) (prim * y d)))) (let ((c 1) (d 2)) ((if b f g) 3))))
i.
Explain why this expression is ill typed.
ii.
Thai suggests that expressions like this can be made well typed by extending the type/use rules for DIFLEX with a type-inclusion rule (see Figure 12.1 on page 703). Define an appropriate notion of subtyping for DIFLEX's procedure types, and show how Thai's example is well typed in the presence of type inclusion.
d. Based on ideas from the DIFLEX language and type system, develop an explicitly typed version of the DYNALEX language (Exercise 7.25 on page 349) named DYNAFLEX. Describe the syntax and type system of DYNAFLEX. Hint: Since DYNAFLEX has two namespaces — a static one and a dynamic one — the type system needs two type environments.
16.3.3 Exceptions
Recall from Section 9.6 that exception-handling mechanisms specify how to deal with abnormal conditions in a program. A type/effect system can be used to track the exceptions that might be raised when evaluating an expression. One way to do this in FLARE/E is to use new base effects with the form (raises Itag Tinfo ) to indicate that an expression raises an exception with tag Itag and information of type Tinfo . If an expression E handles an exception with tag Itag , the (raises Itag Tinfo ) effect can be removed from the effect set of E . Exercise 16.21 explores a specialized effect system that tracks only exceptions and not other effects. Java is an example of an explicitly typed language with an effect system for exceptions. It tracks a subset of exceptions known as checked exceptions. If a checked exception is thrown (Java’s terminology for raising an exception) in the body of a method, then it must either be explicitly handled by a try/catch statement or explicitly listed in a throws clause of the method specification. For example, a Java method that displays the first n characters of a text file might have the following specification: public static void readFirst (n:int, filename:string) throws FileNotFoundException, EOFException;
The throws clause indicates that the readFirst method may not handle the case where there is no file named filename (in which case a FileNotFoundException is thrown) and might attempt to read past the end of a file (in which case an EOFException7 is thrown). The throws clause serves as an explicit latent exception effect for the method. Any method invoking readFirst in its body must either handle the exceptions it throws or explicitly declare them in its own throws clause.

Exercise 16.21 Bud Lojack wants to add exceptions with termination semantics to FLARE. He extends the FLARE expression and type syntax as follows:

E ∈ Exp ::= . . . | (raise Itag Einfo) | (handle Itag Ehandler Ebody)
T ∈ Type ::= . . . | (handlerof Tinfo)
The dynamic semantics of the raise and handle constructs is described in Section 9.6. Bud's new type (handlerof Tinfo) stands for an exception handler that processes exception information with type Tinfo. In Bud's new type rules, the handlerof type is used to communicate type information from the point of the raise to the point of the handle:

TE ⊢ Ehandler : (-> (Tinfo) Tbody)    TE[Itag : (handlerof Tinfo)] ⊢ Ebody : Tbody
---------------------------------------------------------------------------------- [handle]
TE ⊢ (handle Itag Ehandler Ebody) : Tbody

TE ⊢ Itag : (handlerof Tinfo)    TE ⊢ Einfo : Tinfo
---------------------------------------------------- [raise]
TE ⊢ (raise Itag Einfo) : Traise
Note that because raise never returns in termination semantics, the type Traise of a raise expression (like the type of an error expression) can be any type required by the surrounding context. Bud proudly shows his new rules to type guru Thai Ping, who is unimpressed. “Your rules make the type system unsound!” exclaims Thai. “You’ve assumed that exception handlers are statically bound when they’re actually dynamically bound.” a. Explain what Thai means. In particular, provide expressions Eouter and Einner such that the following expression is well typed according to Bud’s rules, but generates a dynamic type error: (handle an-exn Eouter (let ((f (abs () (raise an-exn 17)))) (handle an-exn Einner (f))))
Thai observes that raising an exception Itag with a value of type Tinfo is similar to referencing a dynamic variable named Itag bound to a handler procedure with type (-> (Tinfo) Tresult), where Tresult can be different for different handlers associated with
7 EOF stands for "End Of File."
Itag. Since dynamic variables can be typed using an effect system (see Exercise 16.20), Thai aims to develop a similar effect system for typing exceptions. Thai's system is based on "effects" from the following domain:

ES ∈ ExceptionSpec = Ident ⇀ Type
An exception specification ES is a partial function mapping the name of an exception that can be raised to the type of the information value with which it is raised. For example, representing a partial function as a set of bindings, the exception specification {bounds → int, wrong → bool} indicates that the bounds exception is raised with an integer and the wrong exception is raised with a boolean. Two exception specifications can be combined via ⊕ or ⊓, which require that they agree on names for which they are both defined:

ES1 ⊕ ES2 = λI . if I ∈ dom(ES1) then (ES1 I)
                  else if I ∈ dom(ES2) then (ES2 I) else undefined end end,
             if I ∈ (dom(ES1) ∩ dom(ES2)) implies (ES1 I) ≈ (ES2 I)
           = undefined, otherwise

ES1 ⊓ ES2 = λI . if (I ∈ dom(ES1)) ∧ (I ∈ dom(ES2)) then (ES1 I) else undefined end,
             if I ∈ (dom(ES1) ∩ dom(ES2)) implies (ES1 I) ≈ (ES2 I)
           = undefined, otherwise
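As a concrete (and hypothetical) rendering of these definitions, exception specifications can be modeled as finite maps; the following Haskell sketch implements both combinators, using Nothing for the undefined case:

```haskell
import qualified Data.Map as M

type Ident = String
data Ty = IntTy | BoolTy | SymbTy deriving (Eq, Show)  -- info types, abridged
type ExceptionSpec = M.Map Ident Ty

-- both combinators require agreement on shared names
agree :: ExceptionSpec -> ExceptionSpec -> Bool
agree es1 es2 =
  and [ M.lookup i es2 `elem` [Nothing, Just t] | (i, t) <- M.toList es1 ]

oplus :: ExceptionSpec -> ExceptionSpec -> Maybe ExceptionSpec  -- union-like
oplus es1 es2 | agree es1 es2 = Just (M.union es1 es2)
              | otherwise     = Nothing

combMeet :: ExceptionSpec -> ExceptionSpec -> Maybe ExceptionSpec  -- common names only
combMeet es1 es2 | agree es1 es2 = Just (M.intersection es1 es2)
                 | otherwise     = Nothing
```

For example, oplus applied to {bounds → int} and {wrong → bool} yields {bounds → int, wrong → bool}, while combining {x → int} with {x → bool} is undefined under either combinator.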
In Thai’s type/exception system, judgments have the form TE E : T # ES
which means “in type environment TE , expression E has type T and may raise exceptions as specified by ES .” For example, the judgment TE Etest : bool # {x → int, y → symb}
indicates that if Etest returns normally, its value will be a boolean, but that evaluation of Etest could raise the exception x with an integer value or the exception y with a symbol value. Thai’s system guarantees that no other exceptions can be raised by Etest . Thai’s system also uses exception masking to remove exceptions from judgments when it is clear that they will be handled. E.g., the exception specification of TE (handle x (abs (z) (> z 0)) Etest ) : bool # {y → symb}
does not include x → int from Etest because the exception named x has been handled by the handle expression. Thai eliminates Bud’s handlerof from the FLARE type system and instead changes procedure types to carry a latent exception specification describing exceptions that might be raised when the procedure is applied :
T ∈ Type ::= . . . FLARE types except for -> . . . | (-> (Targ∗) ESlat Tres)
Here are two of the type/exception rules from Thai's system:

TE ⊢ N : int # {}   [int]

TE ⊢ E1 : bool # ES1    TE ⊢ E2 : T # ES2    TE ⊢ E3 : T # ES3
---------------------------------------------------------------- [if]
TE ⊢ (if E1 E2 E3) : T # ES1 ⊕ ES2 ⊕ ES3
Thai seeks your help in fleshing out other parts of his type/exception system:

b. Give the type/exception rules for abs, procedure application, raise, and handle.

c. Give a type/exception derivation for the following expression, which should be well typed according to your rules:

(abs (n m)
  (handle e (abs (a) (not a))
    (let ((f (abs (x) (if (prim < x 0) (raise e x) (prim + x n)))))
      (prim < 0
        (if (handle e (abs (y) (prim > y n)) (prim = m (f n)))
            (handle e (abs (z) (if (prim = z n) (raise e #f) z))
                     (prim * 2 (f m)))
            (handle e (raise e #t)
                     (raise e (raise e (sym beardly)))))))))
d. Describe how to extend Thai’s type/exception system so that it tracks errors generated by the error construct as well as exceptions. e. Discuss the technical challenges that need to be addressed in order to modify Algorithm Z to automatically reconstruct types and exception specifications for Thai’s system (FLARE+{raise, handle}). For simplicity, ignore all store effects and focus solely on tracking exception specifications. What do constraints on exception specifications look like? Can they always be solved?
16.3.4 Execution Cost Analysis
It is sometimes helpful to have an estimate for the cost of evaluating an expression. A cost might measure abstract units of time or other resources (e.g., memory space, database accesses, network bandwidth) required to evaluate the expression. An effect system can estimate the cost of evaluating an expression by (1) associating a cost effect with each expression and (2) extending procedure types to have a latent cost effect that is accounted for every time a procedure is called. Exercise 16.22 explores a simple cost system based on this idea. For practical cost systems, it must be possible to express costs that depend on the size of data structures (e.g., the length of a list or dimensions of a matrix) [RG94]. Cost systems can be helpful for parallel scheduling; two noninterfering expressions should be scheduled for parallel execution only if their execution times
are large enough to outweigh the overheads of the mechanism for parallelism. Cost systems also provide a simple way to conservatively determine which expressions must terminate and which might not. Cost systems can even be used to approximate the complexity of an algorithm [DJG92].

Exercise 16.22 In order to estimate the running time of FLARE programs, Sam Antics wants to develop a set of static rules that assign every expression a cost as well as a type. The cost of an expression is a conservative estimate of how long the expression will take to evaluate. Sam develops a type/cost system for Discount, a variant of FLARE in which procedure types carry latent cost information:

T ∈ Type ::= . . . FLARE types except for -> . . . | (-> (Targ∗) Clat Tres)
C ∈ Cost ::= NT | loop | (sumc C∗) | (maxc C∗)
NT ∈ NatLit ::= 0 | 1 | 2 | . . .
For example, the Discount type (-> (int int) 5 bool) is the type of a procedure that takes two integers, returns a boolean result, and costs at most 5 abstract time units every time it is called. Sam formulates a cost analysis in Discount via type/cost judgments of the form

TE ⊢ E : T $ C

which means "in type environment TE, expression E has type T and cost C." For example, here are Sam's type/cost rules for integers and (nongeneric) variable references:

TE ⊢ N : int $ 1   [int]
TE ⊢ I : TE(I) $ 1   [var]
That is, Sam assigns both integers and variable references a cost of 1 abstract time unit. In addition, Sam specifies the following costs for some other Discount expressions:

• The cost of an abs expression is 2.

• The cost of an if expression is 1 more than the cost of the predicate expression plus the maximum of the costs of the two branch expressions.

• The cost of an n-argument procedure application is the sum of the cost of the operator expression, the cost of each operand expression, the latent cost of the operator, and n.

• The cost of an n-argument primitive application is the sum of the cost of each operand expression, the latent cost of the primitive operator (as specified in the primitive type environment TEprim), and n.

Here are some example types of primitive operators:

TEprim(+) = (-> (int int) 1 int)
TEprim(>) = (-> (int int) 1 bool)
Here are some example judgments that hold in Sam's system:

{a : int} ⊢ (prim + a 7) : int $ 5
{a : int, b : int} ⊢ (prim > (prim + a 7) b) : bool $ 9
{a : int} ⊢ (abs (x) (prim > x a)) : (-> (int) 5 bool) $ 2
{a : int, gt : (-> (int) 5 bool)} ⊢ (gt 17) : bool $ 8
{a : int, b : int, gt : (-> (int) 5 bool)} ⊢ (if (gt b) (prim + b 1) 0) : int $ 14
The abstract cost loop is assigned to expressions that may diverge. For example, the expression Ehang = (letrec ((hang (abs () (hang)))) (hang))
is assigned cost loop in Discount. Because it is undecidable whether an arbitrary expression will diverge, it is impossible to have a type/cost system in which exactly the diverging expressions have cost loop. So Sam settles for a system that makes a conservative approximation: every program that diverges will be assigned cost loop, but some programs that do not diverge will also be assigned loop.

The cost constructs (sumc C1 . . . Cn) and (maxc C1 . . . Cn) are used for denoting, respectively, the sum and maximum of the costs C1 . . . Cn, which may include nonnumeric costs like loop and cost identifiers (see part d). Sam's system ensures that sumc and maxc satisfy sensible cost-equivalence axioms, such as:

(sumc NT1 NT2) ≈c NT3, where N[[NT3]] = N[[NT1]] +Nat N[[NT2]]
(sumc loop NT) ≈c (sumc NT loop) ≈c (sumc loop loop) ≈c loop
(maxc NT1 NT2) ≈c NT3, where N[[NT3]] = (max N[[NT1]] N[[NT2]])
(maxc loop NT) ≈c (maxc NT loop) ≈c (maxc loop loop) ≈c loop
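A small Haskell sketch of this cost algebra (covering only the constructs above; names are illustrative, not Sam's) shows how loop absorbs both combinators:

```haskell
-- Discount's abstract costs: a natural number or the infinite cost loop.
data Cost = Fin Integer | Loop deriving (Eq, Show)

sumc :: [Cost] -> Cost
sumc = foldr plus (Fin 0)
  where plus (Fin m) (Fin n) = Fin (m + n)
        plus _       _       = Loop   -- loop is absorbing

maxc :: [Cost] -> Cost
maxc = foldr mx (Fin 0)
  where mx (Fin m) (Fin n) = Fin (max m n)
        mx _       _       = Loop     -- loop is absorbing

-- e.g. sumc [Fin 1, Fin 3, Fin 1] == Fin 5;  maxc [Fin 5, Loop] == Loop
```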
In Sam’s system, such cost equivalences can implicitly be used wherever costs are mentioned. a. Give type/cost rules for abs, procedure application, primitive application, and if. b. Sam wants the following Discount expression to be well typed: Eif = (if b (abs (x) (prim + x x)) (abs (y) (prim + (prim + y y) y)))
But the types of the two branches, (-> (int) 5 int) and (-> (int) 9 int), are procedure types differing in their latent costs, which causes this expression to be ill typed. To fix this problem, define (1) a sensible cost-comparison relation ≤c, (2) a notion of subtyping in Discount, and (3) a type/cost inclusion rule for Discount (a variant of the [inclusion] rule in Figure 12.1 on page 703). Show that Eif is well typed with your extensions.
c. Define type/cost rules for monomorphic versions of let and letrec. Show why Ehang must be assigned cost loop using your rules.

d. Define type/cost rules for polymorphic versions of let and letrec and a rule for referencing a variable whose type is a generic type schema. You may assume that the Cost domain is extended to include description variables δ ∈ DescId that can stand for costs. Using your rules, give a type/cost derivation showing that the following expression is well typed:

(let ((app5 (abs (f) (f 5))))
  (if (app5 (abs (x) (prim > x 0)))
      (app5 (abs (y) y))
      (app5 (abs (z) (prim + z 1)))))
e. Discuss the technical challenges that need to be addressed in order to modify Algorithm Z to automatically reconstruct types and costs for Discount. For simplicity, ignore all store effects and focus solely on calculating costs. What do cost constraints look like? Can they always be solved? f. In Sam’s Discount type/cost system, every recursive procedure has latent cost loop. Since Discount uses recursion to express iteration, all iterations are conservatively assigned the infinite cost loop. While this is sound, it is not very useful. For example, it would be nice for an iteration summing the integers from 1 to n to have a finite cost that depends on n. Design a simple iteration construct that would allow assigning finite costs to some iterations, and discuss the technical issues that arise in the context of your construct.
16.3.5 Storage Deallocation and Lifetime Analysis
In implementations of languages like FLARE/E, it is often difficult to determine statically when a cell can no longer be referenced. For this reason, cells are typically allocated in a dynamically managed storage area called the heap, where they are reclaimed dynamically by a garbage collector (see Chapter 18). However, an effect system with regions enables a framework for the static allocation and deallocation of memory. The following expression illustrates the key idea:

(let ((my-cell (cell 1))     {assume this cell is in region rm}
      (your-cell (cell 2)))  {assume this cell is in region ry}
  (pair (^ my-cell) (abs () (^ your-cell))))
The region rm for my-cell is completely local to the let expression and can be deleted from the effect of the let expression. This means that the allocation and all uses of my-cell occur within the let expression, so my-cell may be deallocated when the let expression is exited. In contrast, the region ry for your-cell appears in the type (pairof int (-> () (read ry) int)) of the
TE ⊢ E : T ! F
------------------------------ [letregion]
TE ⊢ (letregion R E) : T ! F′
where F[[F′]] = {BF | (BF ∈ F[[F]]) ∧ (R ∉ FrDescIds eff [[BF]])}
      R ∉ FrDescIds ty [[T]]                                [export restriction]
      ∀I ∈ FrIds[[E]] . (R ∉ FrDescIds ty [[TE(I)]])         [import restriction]
      ∀BF ∈ F[[F]] . (BF ≠ (comefrom R′))                   [control restriction]

Figure 16.15 The type/effect rule for region-based storage management.
let expression, indicating that your-cell must outlive the let expression. But if ry does not “escape” some enclosing expression E , it may be deallocated when E is exited. Region-based static storage management can be formalized by extending the expressions of FLARE/E with a binding construct (letregion R E ) that declares the region named R in the scope of the body expression E . In the dynamic semantics, this construct creates a new segment of memory named R in which cells may be allocated, evaluates E to a value V , and then deallocates the entire segment R before returning V . So memory is organized as a stack of segments such that entering letregion pushes a new segment onto the stack and exiting letregion pops its segment off the stack. We also replace the cell primitive by the kernel construct (cell E R), in which the region name R explicitly indicates in which segment the cell should be allocated. We assume that letregion is used only to declare cell regions and that other regions, such as regions representing control points in control effects, are handled as before. Using the region name R in the cell construct is sound only if (1) it is in the lexical scope of a letregion expression declaring R and (2) the cell cannot outlive (i.e., escape from the scope of) the letregion expression declaring R. Condition 1 can be expressed by requiring that a program body contain no free cell region names — i.e., all cell regions mentioned in the program body must be bound by an enclosing letregion. Condition 2 is expressed by the [letregion] type/effect rule in Figure 16.15. This is a specialized version of the [effect-masking] rule in Figure 16.10 on page 973 guaranteeing that it is safe to deallocate the memory segment created by (letregion R E ) once the evaluation of E is complete. The effect F of (letregion R E ) contains all base effects in the effect F of E except for those that mention R. As in the [effect-masking] rule, the export and import restrictions of the [letregion] rule guarantee that R is only used locally and may safely be excluded from F . In a system without control transfers, the export and import restrictions are enough to justify that it is safe to deallocate the memory segment named by R, since no cell allocated in R can be referenced again upon termination of the
letregion expression. However, in the presence of control effects, an additional control restriction is necessary to guarantee that the rest of the program can never jump back into the letregion expression. If such a jump were to occur, the memory segment associated with R might be accessed after the termination of the letregion expression, and so deallocation of this segment would be unsafe. This possibility can be precluded by requiring that the letregion expression not have a comefrom effect, and thus cannot be the target of any control transfers.

The above cell example can be transformed to use letregion as follows:

(letregion rm
  (let ((my-cell (cell 1 rm))
        (your-cell (cell 2 ry)))  {ry is free here but is presumably}
                                  {bound by an enclosing letregion.}
    (pair (^ my-cell) (abs () (^ your-cell)))))
This expression is well typed, so it is safe to deallocate the region rm containing my-cell upon exiting (letregion rm . . . ). Although only one cell is allocated in a region in this example, in general arbitrarily many cells may be allocated in a single region. But an attempt to allocate your-cell in rm in this example would make the letregion expression ill typed because the export restriction would be violated. It is necessary to allocate your-cell in a separate region ry that is declared by some other letregion expression syntactically enclosing this one. In the worst case, ry might be declared by a top-level letregion that wraps the entire program body.

We have focused on the region-based storage management of cells, but any type of value — e.g., pairs, lists, procedures, and even integers and booleans — can be associated with regions of memory. In FLARE/E, all these values are immutable and so they have no effects observable by the programmer. However, even immutable values must be stored somewhere, and regions are useful for managing the storage of such values. In this context, effects and regions can be used to perform a static lifetime analysis that determines where in the program a value created at one point can still be "alive."8 This is necessary for determining when the storage associated with the value can be deallocated. The lifetime analysis of immutable values is explored in Exercise 16.24.

A practical region-based storage management system requires a way to automatically determine the placement of letregion declarations and annotate cell expressions with region information while maintaining the well-typedness of a program. A crude approach is to wrap all letregions around the program body, but a more useful (and challenging!) goal is to make the scope of every letregion
8 A closely related analysis is an escape analysis that determines which values can escape the scope in which they were declared.
as small as possible. Procedures that can be polymorphic in regions are helpful for shrinking the scope of letregions; see Exercise 16.23. One such region-based storage management system has been designed and implemented by Tofte and Talpin [TT97]. They developed and proved correct an algorithm for translating an implicitly typed functional language into a language with explicit letregion expressions, region annotations, and region-polymorphic procedures. Their system handles integers, procedures, and immutable pairs, all of which are allocated in regions, but it can easily be extended to mutable data as well. Exercise 16.23 a. The following is a FLARE/E program in which the two cell expressions have been annotated with explicit regions. Add explicit letregion expressions declaring r1 and r2 so that (1) the resulting expression is well typed and (2) the scope of each letregion expression is as small as possible: (flarek (a b) (let ((f (abs (x) (let ((p (cell (prim - x 1) r1)) (q (cell (prim + x 1) r2))) (prim pair (prim ^ p) q))))) (let ((s (prim fst (f a))) (t (prim snd (f b)))) (prim + s (prim ^ t)))))
b. Sketch an algorithm for adding letregion declarations and explicit cell regions to a well-typed FLARE/E expression E so that (1) the resulting expression E is well typed and (2) the scope of each letregion expression in E is as small as possible. You may assume that you are given the complete type derivation for E . c. Polly Morwicz observes that tighter letregion scopes can often be obtained if some procedures are region-polymorphic. For example, using the pabs and pcall constructs from Figure 12.9 on page 731, she modifies the procedure f in the program from part a to abstract over region r2: (flarek (a b) (let ((f (pabs (r2) (abs (x) (let ((p (cell (prim - x 1) r1)) (q (cell (prim + x 1) r2))) (prim pair (prim ^ p) q)))))) (let ((s (prim fst ((pcall f r3) a))) (t (prim snd ((pcall f r4) b)))) (prim + s (prim ^ t)))))
Add explicit letregion expressions to Polly’s modified expression, striving to make all letregion scopes as small as possible.
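The stack-of-segments semantics of letregion sketched earlier can also be modeled directly. The following Haskell toy model is an assumption-laden sketch, not the book's formal semantics: it treats the store as a map from live region names to the number of cells allocated in them, and a body as a store transformer.

```haskell
import qualified Data.Map as M

type Region = String
type Store  = M.Map Region Int   -- live regions and their cell counts

-- (cell v R): allocate one more cell in region R's segment
allocCell :: Region -> Store -> Store
allocCell r = M.insertWith (+) r 1

-- (letregion R E): push a fresh segment for R, run the body, then pop
-- the segment, deallocating every cell that was allocated in R.
letregion :: Region -> (Store -> (a, Store)) -> Store -> (a, Store)
letregion r body s0 =
  let (v, s1) = body (M.insert r 0 s0)
  in  (v, M.delete r s1)
```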
Exercise 16.24 Thai Ping wants to use regions and effects to perform lifetime analysis and storage management for pairs and other immutable values in FLARE/E. He begins by modifying the type grammar of FLARE/E to extend a pairof type to include the region where it is stored: T ∈ Type ::= . . . all types except (pairof T T ) . . . | (pairof T T R)
He also extends the effect grammar to include a new access effect constructor: FCR ∈ EffectConstructor = . . . ∪ {access}
Thai explains that the effect (access R) is a lifetime effect used both for allocating an immutable pair value in region R and for extracting its components. a. Write the type schemas for pair and fst in the primitive type environment used by the FLARE/E implicit type/effect system. b. Explain how access effects and the [letregion] rule can be used to aggressively deallocate pair p in the following expression: (let ((g (abs (a b) (let ((p (prim pair a b))) (prim pair (prim snd p) (prim fst p)))))) (g 1 2))
The FLARE/E pair primitive does not take explicit regions, but you may assume that the scope of the region R declared by (letregion R E ) includes any pairof types and access effects that appear in the type derivation of E . c. access effects are used for the lifetime analysis of immutable values and should not affect the purity of expressions. For example, the expressions (prim pair 1 2) and (prim fst p) both have effects of the form (access R), but should still be considered pure since they do not have store effects or control effects that could cause them to interfere with other expressions. Describe how to modify the FLARE/E notion of purity to handle lifetime effects. d. FLARE/E lists and procedures can also be modified to support a region-based lifetime analysis similar to the one Thai developed for pairs. Describe all the changes that need to be made to the FLARE/E syntax and type rules to accomplish this.
16.3.6 Control Flow Analysis
In function-oriented languages, a control flow analysis tracks the flow of higher-order procedures in a program.9 Each abstraction in a program can be annotated with a distinct label, just as each cell expression can be associated with a region name. Then every procedure type can be annotated with the set of labels
9 Although traditionally used to track the flow of procedure values, the same analysis can easily be extended to track the flow of any kind of value.
describing the abstractions that could be the source of that type. Although these labels are not effects, such an analysis can be accomplished using the machinery of an effect system. Consider the following FLARE expression, in which each abstraction has been annotated with an explicit integer label:

(let ((inc (abs 1 (x) (+ x 1)))
      (dbl (abs 2 (y) (* y 2)))
      (app3 (abs 3 (f) (f 3)))
      (app4 (abs 4 (g) (g 4))))
  (list (app3 inc) (app4 inc) (app4 dbl)))
The annotated type of inc would be (-> (int) {1} int) and that of dbl would be (-> (int) {2} int). A type/label system can determine that the argument g to app4 has type (-> (int) {1, 2} int) (because it might be either the inc or dbl procedure) while the argument f to app3 has type (-> (int) {1} int) (because it can only be the inc procedure). Knowing which procedures reach which call sites can guide program optimizations. For example, if only one procedure reaches a call site, the call can be replaced by an inlined version of the procedure’s body. Information from a control flow analysis is particularly important for choosing procedure representations in a compiler (see Section 17.10.2). A control flow analysis is simpler than the lifetime analysis discussed in Section 16.3.5. In lifetime analysis, the latent effect in a procedure type describes all values that might be referenced when the procedure is called (see Exercise 16.24). In a control flow analysis, the annotation on a procedure type just describes which source abstractions might flow to the expression with that type. Consult [NNH98] for an extensive discussion of control flow analysis and how it can be expressed in an effect system.
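A hypothetical Haskell sketch of these label-set annotations: procedure types carry sets of abstraction labels, and merging the types that reach a call site unions the sets, which is exactly how g above acquires {1, 2}. The names here are illustrative, not from the book.

```haskell
import qualified Data.Set as S

type Label = Int
data CType = CInt
           | CArrow [CType] (S.Set Label) CType   -- (-> (T*) {labels} T)
           deriving (Eq, Show)

-- join the types flowing to the same position, unioning label sets
joinTy :: CType -> CType -> Maybe CType
joinTy CInt CInt = Just CInt
joinTy (CArrow as1 ls1 r1) (CArrow as2 ls2 r2)
  | length as1 == length as2 = do
      args <- sequence (zipWith joinTy as1 as2)
      res  <- joinTy r1 r2
      Just (CArrow args (S.union ls1 ls2) res)
joinTy _ _ = Nothing

-- joinTy (CArrow [CInt] (S.singleton 1) CInt)
--        (CArrow [CInt] (S.singleton 2) CInt)
--   == Just (CArrow [CInt] (S.fromList [1,2]) CInt)
```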
16.3.7 Concurrent Behavior
Thus far we have studied only sequential programs, in which execution can be visualized as the progress of a single control token that moves through the program, performing each operation it encounters along the way. The path taken by the control token is known as a control thread. This single thread of control can be viewed as a time line along which all operations performed by the computation are arranged in a total order. For example, a computation that sequentially performs the operations A, B, C, and D can be depicted as the following total order, where X → Y means that X is performed before Y : →A→B→C→D→
In a concurrent program, multiple control threads may be active at the same time, allowing the time relationship between operations to be a partial order rather than a total order. Here is a sample partial order that declares that A precedes B along one control thread and C precedes D along another control thread, but does not otherwise constrain the operation order:

            A → B
           /     \
  → fork           join →
           \     /
            C → D
The diagram introduces two new nodes labeled fork and join. The purpose of these nodes is to split and merge control threads so that a computation has a distinguished starting edge and a distinguished ending edge. A control token reaching a fork node splits into two subtokens on the output edges of the node. When tokens are on both input edges of a join node, they merge into a single token on the output node. If only one input edge to a join has a token, it cannot move forward until the other edge contains a token. Any node like join that forces one control token to wait for another is said to synchronize them. There are many linguistic mechanisms for specifying concurrency and synchronization, some of which are described in the Web Supplement to this book.

Suppose that on any step of a multithreaded computation, only one control token is allowed to move.10 Then a particular execution of a concurrent program is associated with the sequence of its observable actions, which we shall call an interleaving. The behavior of the concurrent program is the set of all possible interleavings that can be exhibited by the program. For example, assuming that all operations (except for fork and join) are observable, then the behavior of the branching diagram above is:

{ABCD, ACBD, ACDB, CABD, CADB, CDAB}
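The behavior set can be computed mechanically; here is a small Haskell sketch that enumerates all interleavings of two threads' operation sequences:

```haskell
interleavings :: [a] -> [a] -> [[a]]
interleavings xs []         = [xs]
interleavings [] ys         = [ys]
interleavings (x:xs) (y:ys) =
  map (x:) (interleavings xs (y:ys)) ++
  map (y:) (interleavings (x:xs) ys)

-- interleavings "AB" "CD"
--   == ["ABCD","ACBD","ACDB","CABD","CADB","CDAB"]
```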
The behavior of a concurrent program may be the empty set (no interleavings are possible), a singleton set (exactly one interleaving is possible), or a set with more than one element (many interleavings are possible). A concurrent program with more than one interleaving exhibits nondeterministic behavior. Although sequential programs can exhibit nondeterminism,11 nondeterminism is most commonly associated with concurrent programs.

10 There are concurrent models in which multiple control tokens can move in a single step, but we shall not consider these.
11 For example, a purely sequential language can exhibit nondeterminism if operand expressions in procedure applications may be evaluated in any order or if it supports an (either E1 E2) construct that returns the value of one of E1 or E2.
In some models of concurrency, concurrently executing threads can communicate by having one thread send a value to another thread over a channel to which they share access. Communication establishes a timing constraint between the threads: a value sent over a channel cannot be received by the receiving thread until it has been sent by the sending thread. We can extend FLARE/E to be a channel-based concurrent language by adding the following four constructs:

(channel): Create and return a new channel.

(send! Echan Eval): First evaluate Echan to the channel value Vchan, then evaluate Eval to the value Vval, and then send Vval over the channel Vchan. It is an error if Vchan is not a channel.

(receive! Echan): Evaluate Echan to the channel value Vchan and then return the next value received from the channel Vchan. It is an error if Vchan is not a channel.

(cobegin E1 . . . En): Evaluate each of E1 . . . En in a separate thread and return the value of En.

For example, here is a procedure that uses three channels to communicate between three threads:

Econcabs = (abs (x)
             (let ((a (channel)) (b (channel)) (c (channel)))
               (cobegin
                 (send! c (+ 1 (receive! a)))
                 (send! c (* 2 (receive! b)))
                 (begin (send! a (- x 3))
                        (send! b (/ x 4))
                        (+ (receive! c) (receive! c))))))
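For readers who want to run something concrete, here is a rough Haskell analogue of Econcabs, assuming GHC's unbounded Chan stands in for FLARE/E channels, forkIO for cobegin's extra threads, and integer division for /:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)

concabs :: Int -> IO Int
concabs x = do
  a <- newChan; b <- newChan; c <- newChan
  _ <- forkIO (readChan a >>= \v -> writeChan c (1 + v))
  _ <- forkIO (readChan b >>= \v -> writeChan c (2 * v))
  -- the main thread plays the role of cobegin's final expression
  writeChan a (x - 3)
  writeChan b (x `div` 4)
  v1 <- readChan c
  v2 <- readChan c
  return (v1 + v2)
```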
Since + is commutative, the order in which the values are received from channel c by the third thread does not affect the value returned by the procedure. But the returned value would depend on the order if the + were replaced by a noncommutative operator like -. An effect system can be used to analyze the communication behavior of a channel-based concurrent program. If we interpret a region R as denoting an abstract channel, then we can model sending a value over channel R with an effect (out R) and model the receipt of a value from this channel with an effect (in R). In a simple communication-effect system (such as the one described in [JG89b]), in and out effects can be tracked just like the store and control effects
studied earlier. Such a system can determine that an expression communicates on certain channels, but the ACUI nature of the maxeff effect combiner makes it impossible to determine any ordering on these communications. E.g., if channels a, b, and c in the above example are in regions ra, rb, and rc, respectively, then the body of Econcabs has the effect (maxeff (in ra) (in rb) (in rc) (out ra) (out rb) (out rc))
which does not indicate the relative order of the communication actions or the number of times they are performed. However, the information is sufficient to show that the communication effects are completely local to the procedure body and so can be deleted by effect masking. In more sophisticated communication-effect systems (such as the one described in [ANN97]), the ordering of communication effects is modeled by specifying the sequential and parallel composition of effects. For example, in such a system, the effect of the cobegin expression in Econcabs might be: (par (seq (in ra) (out rc)) (seq (in rb) (out rc)) (seq (out ra) (out rb) (in rc) (in rc)))
where seq is used to combine effects for sequential execution and par is used to combine effects for parallel execution. This shows the ordering of channel operations in each thread and the fact that the third thread receives two values from the channel in region rc. Such a specification resembles the kinds of specifications used in process algebra frameworks like Communicating Sequential Processes (CSP) [Hoa85] and the Calculus of Communicating Systems (CCS) [Mil89].
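Such ordered communication effects admit a small algebraic representation. The following Haskell sketch (constructor names are illustrative, not taken from [ANN97]) captures the cobegin effect above as a value:

```haskell
type ChanRegion = String

-- communication effects as a little process algebra
data CommEff = In ChanRegion | Out ChanRegion
             | Seq [CommEff] | Par [CommEff]
             deriving (Eq, Show)

-- the effect of the cobegin expression in Econcabs
cobeginEff :: CommEff
cobeginEff = Par [ Seq [In "ra", Out "rc"]
                 , Seq [In "rb", Out "rc"]
                 , Seq [Out "ra", Out "rb", In "rc", In "rc"] ]
```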
16.3.8 Mobile Code Security
In modern computer systems, it is often desirable for applications on a local computer to automatically download and execute mobile code from remote Internet sites. But this is a dangerous prospect, since executing arbitrary mobile code might destroy or steal local information or use the local computer’s resources for nefarious purposes like sending spam email, attacking Web servers, or spreading viruses. One application of effects is to provide mobile code security by labeling primitive operations with latent effects that describe their actions. For example, all procedures that write on the local computer’s disk could carry a write-disk latent effect. Other latent effects could be assigned to display and networking procedures. These effects create a verifiable, succinct summary of the actions of
imported mobile code. These effects can be presented to a security checker — which might involve a user dialogue box — that accepts or rejects mobile code on the basis of its effects. Since mobile code is downloaded and executed on the fly, any security analysis performed by the local computer must be relatively quick in order to be practical. Although some properties can efficiently be deduced by analyzing the downloaded code from scratch, other important properties are too expensive for the local computer to reconstruct. For example, for arbitrary low-level code, it is difficult to prove memory safety properties like the following: (1) no variable is accessed until it is initialized; (2) no out-of-bounds access is made to an array; and (3) there is no dereference of a pointer to a deallocated memory block.12 This problem can be addressed by requiring the code producer to include explicit type and effect annotations in the mobile code that are sufficient to allow the code consumer to rapidly verify security properties. For example, the types might encode a proof that no array access is out of bounds, and a simple type-checking procedure by the code consumer could verify this proof. Generating the appropriate annotations might be expensive for the producer, but the consumer can use type and effect rules to quickly verify that the annotations are valid. This is an example of a technique called proof-carrying code [NL98, AF00], in which mobile code carries a representation of proofs of various properties in addition to the executable code. It is used for properties that are difficult for the consumer to determine from raw low-level code, but are easy for the consumer to verify if the producer of the low-level code (which presumably has access to more information, in the form of the high-level source program) provides a proof.
Notes

Effect systems were introduced by Lucassen and Gifford in [Luc87, LG88], which outlined the need for a new kind of static analysis for describing program behavior. Early experiments with the design and use of effect systems were performed in the context of the FX-87 programming language [GJLS87], an explicitly typed language including effects and regions. Later versions of FX incorporated region and effect inference [JG91]. Effects were used to guide standard compiler optimizations (e.g., common subexpression elimination, dead code elimination, and code hoisting) as well as to find opportunities for parallel evaluation [Luc87, HG88]. We explore effect-based code optimization in Section 17.6.2.
12 See Chapter 18 for a discussion of memory allocation and deallocation.
The first polymorphic type/effect reconstruction system was presented in [JG91]. The improved reconstruction systems in [TJ92, TJ94a] guaranteed principal types and minimal effects. Our Algorithm Z incorporates two key features of the improved systems in a derivation-style reconstruction algorithm: It (1) allows subeffecting via the [does] rule to compute minimal effects and (2) requires the latent type of a procedure type to be a description variable, which simplifies the unification of procedure types and the solution of effect constraints. Without the second feature, it would be necessary to modify the unification algorithm to produce effect-equality constraints between the latent effects of two unified procedure types and to extend the effect-constraint solver to handle such equality constraints. A wide variety of effect systems have been developed, including systems for cost accounting [DJG92, RG94, CW00], control effects [JG89a], and communication effects [JG89b]. The FX-91 programming language [GJSO92] included all of these features. Other examples of effect systems include control flow analysis [TJ94b], region-based memory management [TT97], behavior analysis for concurrency [ANN97], atomicity effects for concurrency, [FQ03], register usage analysis [Aga97, AM03], and trace effects for verifying program safety properties [SSH08]. As noted in Section 16.3.3, Java has a simple effect system for tracking exceptions that can be thrown by a method [GJS96]. Monadic systems for expressing state can be extended with an effect system [Wad98]. For a detailed introduction to effect systems and a summary of work done in this area, see [TJ94a], [NNH98, Chapter 5], and [ANN99].
Part IV
Pragmatics
17 Compilation

Bless thee, Bottom! bless thee! thou art translated.
— William Shakespeare, A Midsummer Night's Dream, act 3, scene 1
17.1 Why Do We Study Compilation?
Compilation is the process of translating a high-level program into instructions that can be directly executed by a low-level machine, such as a microprocessor or a simple virtual machine. Our goal in this chapter is to use compilation to further our understanding of advanced programming language features, including the practical implications of language design choices. To be a good designer or user of programming languages, one must know not only how a computer carries out the instructions of a program (including how data are represented) but also the techniques by which a high-level program is converted into something that runs on an actual computer. In this chapter, we will show the relationship between the semantic tools developed earlier in the book and the practice of translating high-level language features to executable code.

Our approach to compilation is different from the approach taken in most compiler texts. We assume that the input program has already been parsed and is syntactically correct, thus ignoring issues of lexical analysis and parsing that are important in real compilers. We also assume that type and effect checking are performed by the reconstruction techniques we have already studied. Our focus will be a series of source-to-source program transformations that implement complex high-level naming, state, and control features by making them explicit in an FL-like intermediate compilation language. A key benefit of our approach is that it dispenses with traditional special-purpose compilation machinery like symbol tables, invocation frames, stacks, and basic blocks. These notions are uniformly represented as patterns in the structure of the intermediate code. The result of compilation will be a program in a restricted subset of the intermediate
language that can be viewed as instructions for a simple virtual register machine. In this way we avoid details of code generation that are important when targeting a real microprocessor. Throughout the compilation process, efficiency will take a back seat to clarity, modularity, expressiveness, and demonstrable correctness.

The notion of compilation by source-to-source transformation has a rich history. Beginning with Guy Steele's Rabbit compiler ([Ste78]), there is a long line of research compilers based on this approach. (See the notes at the end of this chapter for more details.) In homage to Rabbit, we will call our compiler Tortoise.

We study compilation for the following reasons:

• We can review many of the language features presented earlier in this book in a new light. By showing how programs can be transformed into low-level machine code, we arrive at a more concrete understanding of these features.

• We present some simple ways to implement language features by translation. These techniques can be useful in everyday programming, especially if your programming language doesn't support the features that you need.

• We will see how complex translations can be composed out of many simple passes. Although in practice these passes might be merged, we will discuss them separately for conceptual clarity.

• We will see that the inefficiencies that crop up in the compiler are a good motivation for studying static semantics. These inefficiencies can be addressed by a combination of two methods:

  • Developing smarter translation techniques that exploit information known at compile time.

  • Restricting source languages to make them more amenable to static analysis techniques. For example, we'll see (in Section 18.2.2) that dynamically typed languages imply a run-time overhead that can be reduced by clever techniques or eliminated by requiring the language to be statically typable.

We begin with an overview of the transformation-based architecture of Tortoise (Section 17.2). We then discuss the details of each transformation in turn (Sections 17.3–17.12).
17.2 Tortoise Architecture

17.2.1 Overview of Tortoise
The Tortoise compiler is organized into ten transformations that incrementally massage a source language program into code resembling register machine code (Figure 17.1). The input and output of each transformation are programs written either in dialects of FLARE or in dialects of an FL-like intermediate language named FIL that is defined later. The output of the compiler is a program in FILreg, a dialect of FIL whose constructs can be viewed as instructions for a low-level register machine. We review FLARE in this section and present the dialects of FIL later as they are needed.

We will see that dialects of FL (including FLARE) can be powerful intermediate languages for compilation. Many low-level machine details find a surprisingly convenient expression in FL-like languages. Some advantages of structuring our compiler as a series of source-to-source transformations on dialects of FL are:

• All the intermediate languages are closely related to FL, a language whose semantics we already understand well.

• When intermediate languages are closely related, compiler writers are more likely to develop modular stages and experiment with their ordering.

• The result of every transformation stage is executable source code in a dialect of FL. This facilitates reading and testing the transformation results using an interpreter (or compiler) for the dialect. Because the dialects are so similar, their interpreters are closely related. Indeed, modulo the verification of certain syntactic constraints, a single interpreter can be used for most of the dialects.

Each compiler transformation expects its input program to satisfy certain preconditions and produces output code that satisfies certain postconditions. These conditions will be stated explicitly in the formal specification of each transformation. They will help us understand the purpose of each transformation, and why the compiler is sound. A compiler is sound when it produces low-level code that faithfully implements the formal semantics of the compiler's source language. We will not formally prove the soundness of any of the transformations because such proofs can be very complex. Indeed, soundness proofs for some of these transformations have been the basis for Ph.D. dissertations! However, we will informally argue that the transformations are sound.
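Concretely, this architecture can be pictured as a pipeline of program-to-program functions. The following Haskell sketch is illustrative only (Tortoise is not presented this way in the book), and the pass bodies are elided:

```haskell
newtype Prog = Prog String deriving Show   -- stand-in for a real AST

type Pass = Prog -> Prog

desugar, globalize, assignConvert :: Pass
desugar       = id   -- elided: each pass transforms the AST
globalize     = id
assignConvert = id

pipeline :: [Pass]
pipeline = [desugar, globalize, assignConvert]  -- first three passes

-- run the passes left to right, threading the program through
compile :: Prog -> Prog
compile p0 = foldl (\p pass -> pass p) p0 pipeline
```

Because each pass is an ordinary function between program dialects, passes can be tested in isolation and reordered, which is exactly the flexibility the text describes.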
FLARE/V
   | Desugaring
   | Globalization
   | Assignment Conversion
   v
FLARE
   | Type/Effect Reconstruction
   | Translation
   v
FIL
   | Renaming
   | CPS Conversion
   v
FILcps
   | Closure Conversion
   | Lifting
   v
FILlift
   | Register Allocation
   v
FILreg

Figure 17.1 Organization of the Tortoise compiler. The initial transformations translate the FLARE/V source program to a FLARE program. This is translated into the FIL intermediate language and is then gradually transformed into a form that resembles register machine code.
Tortoise implements each transformation as a separate pass for clarity of presentation and to allow for experimentation. Although we will apply the transformations in a particular order in this chapter, other orders are possible. Our descriptions of the transformations will explore some alternative implementations and point out how different design choices affect the efficiency and semantics of the resulting code. We generally opt for simplicity over efficiency in our presentation.
17.2.2 The Compiler Source Language: FLARE/V
The source language of the Tortoise compiler is FLARE/V, a version of the FLARE language presented in Chapter 13 extended with mutable variables (using the set! construct from the FLAVAR language presented in Section 8.4). We include mutable variables in the source language because they are a standard feature in many languages and we wish to show how they can be automatically transformed into mutable cells (via the assignment conversion transformation in Section 17.5). FLARE/V is a stateful, call-by-value, statically scoped, function-oriented, and statically typed language with type reconstruction that supports mutable cells, mutable variables, pairs, and homogeneous immutable lists. For convenience, the complete syntax of FLARE/V is presented in Figures 17.2 and 17.3. This is the same as the presentation of FLARE in Figure 13.23 on page 814 except that (1) FLARE/V includes mutable variables via the set! construct and (2) the desugaring of a full-language program into a kernel program does not introduce bindings for standard identifiers like the names of primitive operations.1 All primitive names (such as *, >, and cons) may still be used as free identifiers in a FLARE/V program, where they denote global procedures performing the associated primitive operations, but this is implemented by the globalization transformation presented in Section 17.4 rather than via desugaring. As before, (prim * E1 E2 ) may be written as (* E1 E2 ) in almost any context. We say “almost any” because these names can be assigned and locally rebound like any other names. For example, the program (flare (x y) (let ((- +)) (begin (set! / *) (- (/ x x) (/ y y)))))
calculates the sum of the squares of x and y.

¹ For simplicity, we reuse the program keywords flare and flarek for FLARE/V rather than introducing new ones.
Kernel Grammar
P ∈ Prog ::= (flarek (I*formal) Ebody)

E ∈ Exp ::= L | I | (error Ymessage) | (if Etest Ethen Eelse)
         | (set! Ivar Eval) | (prim Oprimop E*arg)
         | (abs (I*formal) Ebody) | (Erator E*rand)
         | (let ((Iname Edefn)*) Ebody) | (letrec ((Iname Edefn)*) Ebody)

L ∈ Lit ::= #u | B | N | (sym Y)
B ∈ BoolLit = {#t, #f} as in FL.
N ∈ IntLit = as in FL and FLARE.
Y ∈ SymLit = as in FL and FLARE.

O ∈ Primop ::= + | - | * | / | %                              ; arithmetic ops
            | < | <= | = | != | > | >= | bool=? | sym=?       ; relational ops
            | not | and | or                                  ; logical ops
            | pair | fst | snd                                ; pair ops
            | cons | car | cdr | null | null?                 ; list ops
            | cell | ^ | := | cell=?                          ; mutable cell ops

Keyword = {abs, error, flarek, if, let, letrec, prim, set!, sym}
SugarKeyword = {begin, cond, def, flare, list, recur, scand, scor}
I ∈ Ident = SymLit − ({Y | Y begins with @} ∪ Keyword ∪ SugarKeyword)

Figure 17.2 Kernel grammar for the FLARE/V language.
Figure 17.4 presents a contrived but compact FLARE/V program that illustrates many features of the language, such as numbers, booleans, lists, locally defined recursive procedures, higher-order procedures, tail and nontail procedure calls (see Section 17.9.1 for a discussion of tail versus nontail calls), and mutable variables. We will use it as a running example throughout the rest of this chapter.

The revmap procedure takes a procedure f and a list elts of elements and returns a new list that is the reversal of the list obtained by applying f to each element of elts. The accumulation of the new list ans is performed by a local iterative loop procedure that is defined using the recur sugar, which abbreviates the declaration and invocation of a recursive procedure. The loop procedure performs an iteration in a single state variable xs denoting the unprocessed elements of elts. Although ans could easily be made a second argument to loop, here it is defined externally to loop and updated via set! to illustrate the use of a mutable variable. The example program takes two integer arguments, a and b, and returns a list of the two booleans ((7 · a) > b) and (a > b). For example, on the inputs 6 and 17, the program returns the list ⟨true, false⟩.
Syntactic Sugar
(@Oprimop E^n_{i=1}) ds (prim Oprimop E^n_{i=1})

(cond (else Edefault)) ds Edefault
(cond (Etest1 Ethen1) (Etesti Etheni)^n_{i=2} (else Edefault))
  ds (if Etest1 Ethen1 (cond (Etesti Etheni)^n_{i=2} (else Edefault)))

(scand) ds #t
(scand Econjunct E*rest) ds (if Econjunct (scand E*rest) #f)

(scor) ds #f
(scor Edisjunct E*rest) ds (if Edisjunct #t (scor E*rest))

(recur Iproc ((Ii Ei)^n_{i=1}) Ebody)
  ds (letrec ((Iproc (abs (I^n_{i=1}) Ebody))) (Iproc E^n_{i=1}))

(begin) ds #u
(begin E) ds E
(begin E1 E*rest) ds (let ((_ E1)) (begin E*rest)),
  where _ is a special identifier that can never be referenced

(list) ds (prim null)
(list E1 E*rest) ds (prim cons E1 (list E*rest))

(def (IprocName I*procFormal) EprocBody)
  ds (def IprocName (abs (I*procFormal) EprocBody))

(flare (I*pgmFormal) EpgmBody (def Inamei Edefni)^n_{i=1})
  {Assume procedure defs already desugared to (def I E) by the previous rule.}
  ds (flarek (I*pgmFormal)
       {Compiler handles standard identifiers via globalization, not desugaring.}
       (letrec ((Inamei Edefni)^n_{i=1}) EpgmBody))

Figure 17.3 Syntactic sugar for the FLARE/V language.
(flare (a b)
  (let ((revmap (abs (f elts)
                  (let ((ans (null)))
                    (recur loop ((xs elts))
                      (if (null? xs)
                          ans
                          (begin (set! ans (cons (f (car xs)) ans))
                                 (loop (cdr xs)))))))))
  (revmap (abs (x) (> x b)) (list a (* a 7))))

Figure 17.4 revmap program.
tf ∈ Transform_FLARE/V = Exp_FLARE/V → Exp_FLARE/V

mapsub_FLARE/V : Exp_FLARE/V → Transform_FLARE/V → Exp_FLARE/V
mapsub_FLARE/V [[L]] tf = L
mapsub_FLARE/V [[I]] tf = I
mapsub_FLARE/V [[(error Ymsg)]] tf = (error Ymsg)
mapsub_FLARE/V [[(if Etest Ethen Eelse)]] tf = (if (tf Etest) (tf Ethen) (tf Eelse))
mapsub_FLARE/V [[(set! Ivar Eval)]] tf = (set! Ivar (tf Eval))
mapsub_FLARE/V [[(abs (I^n_{i=1}) Ebody)]] tf = (abs (I^n_{i=1}) (tf Ebody))
mapsub_FLARE/V [[(Erator E^n_{i=1})]] tf = ((tf Erator) (tf E_i)^n_{i=1})
mapsub_FLARE/V [[(prim O E^n_{i=1})]] tf = (prim O (tf E_i)^n_{i=1})
mapsub_FLARE/V [[(let ((I_i E_i)^n_{i=1}) Ebody)]] tf = (let ((I_i (tf E_i))^n_{i=1}) (tf Ebody))
mapsub_FLARE/V [[(letrec ((I_i E_i)^n_{i=1}) Ebody)]] tf = (letrec ((I_i (tf E_i))^n_{i=1}) (tf Ebody))

Figure 17.5 The mapsub_FLARE/V function simplifies the specification of purely structural transformations.
17.2.3 Purely Structural Transformations
Most of the FLARE/V and FIL program transformations that we shall study can be described by functions that traverse the abstract syntax tree of the program and transform some of the tree nodes but leave most of the nodes unchanged. We will say that a transformation is purely structural for a given kind of tree node if the result of applying it to that node results in the same kind of node, in which each child node is a transformed version of the corresponding child of the original node. We formalize this notion for FLARE/V via the mapsub_FLARE/V function defined in Figure 17.5. This function returns a copy of the given FLARE expression whose immediate subexpressions have been transformed by a given transformation tf. A FLARE transformation is purely structural for a given kind of node if its action on that node can be written as an application of mapsub_FLARE/V.

As an example of mapsub_FLARE/V, consider a transformation T that rewrites every occurrence of (if (prim not E1) E2 E3) to (if E1 E3 E2). The fact that T is purely structural on all but if nodes is expressed via a single invocation of mapsub_FLARE/V in the following definition:

T : Exp_FLARE/V → Exp_FLARE/V
T[[(if (prim not E1) E2 E3)]] = (if (T[[E1]]) (T[[E3]]) (T[[E2]]))
T[[E]] = mapsub_FLARE/V [[E]] T, for all other expressions E
subexps_FLARE/V : Exp_FLARE/V → Exp*_FLARE/V
subexps_FLARE/V [[L]] = [ ]
subexps_FLARE/V [[I]] = [ ]
subexps_FLARE/V [[(error Ymsg)]] = [ ]
subexps_FLARE/V [[(if Etest Ethen Eelse)]] = [Etest, Ethen, Eelse]
subexps_FLARE/V [[(set! Ivar Eval)]] = [Eval]
subexps_FLARE/V [[(abs (I^n_{i=1}) Ebody)]] = [Ebody]
subexps_FLARE/V [[(Erator E^n_{i=1})]] = [Erator, E_1, ..., E_n]
subexps_FLARE/V [[(prim O E^n_{i=1})]] = [E_1, ..., E_n]
subexps_FLARE/V [[(let ((I_i E_i)^n_{i=1}) Ebody)]] = [E_1, ..., E_n, Ebody]
subexps_FLARE/V [[(letrec ((I_i E_i)^n_{i=1}) Ebody)]] = [E_1, ..., E_n, Ebody]

Figure 17.6 The subexps_FLARE/V function returns a sequence of all immediate subexpressions of a given FLARE/V expression.
When manipulating expressions, it is sometimes helpful to extract from an expression a collection of its immediate subexpressions. Figure 17.6 defines a subexpsFLARE /V function that returns a sequence of all child expressions of a given FLARE/V expression.
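To make the idea of a purely structural transformation concrete, here is a minimal OCaml sketch over a cut-down expression type. The type and constructor names are our own illustrative choices, not part of the book's formal development; mapsub rebuilds a node after applying tf to each immediate subexpression, and the not-if rewrite T then only needs to mention its one interesting case.

  type exp =
    | Lit of int
    | Var of string
    | If of exp * exp * exp
    | Prim of string * exp list
    | Abs of string list * exp

  (* mapsub: copy a node, transforming each immediate subexpression. *)
  let mapsub (e : exp) (tf : exp -> exp) : exp =
    match e with
    | Lit _ | Var _ -> e
    | If (t, c, a) -> If (tf t, tf c, tf a)
    | Prim (op, args) -> Prim (op, List.map tf args)
    | Abs (formals, body) -> Abs (formals, tf body)

  (* T rewrites (if (prim not E1) E2 E3) to (if E1 E3 E2) and is purely
     structural everywhere else. *)
  let rec transform (e : exp) : exp =
    match e with
    | If (Prim ("not", [ e1 ]), e2, e3) ->
        If (transform e1, transform e3, transform e2)
    | _ -> mapsub e transform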
17.3 Transformation 1: Desugaring
The first pass of the Tortoise compiler performs desugaring, converting the convenient syntax of FLARE/V into a simpler kernel subset of the language. The advantage of having the first transformation desugar the program is that subsequent analyses and transforms are simpler to write and prove correct because there are fewer syntactic forms to consider. Subsequent transformations also do not require modification if the language is extended with new syntactic shorthands. We will provide preconditions and postconditions for each of the Tortoise transformations. In the case of desugaring, these are:
Preconditions: The input to the desugaring transformation is a well-formed full FLARE/V program.

Postconditions: The output of the desugaring transformation is a well-formed kernel FLARE/V program.

We will say that a program is well formed in a language when it satisfies the grammar of the language — i.e., it does not contain any syntactic errors.

There is an additional postcondition that we expect for desugaring (and all other transformations we study): The output program should have the same behavior as the input program. This is a fundamental property of each compilation stage that we will not explicitly state in every postcondition. One consequence of this property is that if the input program never encounters a dynamic type error, then neither does the output program. For dialects of FLARE, we can use a notion of well-typedness to conservatively approximate which programs never encounter a dynamic type error. (Although we have not formally described a type system for full FLARE/V, it is possible to define one by extending the type system of kernel FLARE with type rules for set! and all the syntactic sugar constructs.) We expect that Tortoise stages transforming programs in these dialects should preserve well-typedness.

The desugaring process for FLARE/V is similar to the rewriting approach to desugaring summarized in Figures 6.6 and 6.7 on pages 232 and 233, so we will not repeat the details of the transformation process here. Figure 17.7 shows the result of desugaring the revmap example introduced in Figure 17.4. The (recur loop ...) desugars into a letrec, the begin desugars into a let that binds the special variable _ (which we assume is never referenced), and the list desugars into a null-terminated nested sequence of conses.
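As a sketch of how such rewriting can be implemented, the following OCaml fragment applies two of the desugaring rules from Figure 17.3 (begin and list) to a generic s-expression representation. The representation and the restriction to just these two rules are simplifying assumptions of ours; a full desugarer would handle every rule and report malformed input.

  type sexp = Atom of string | List of sexp list

  let rec desugar (s : sexp) : sexp =
    match s with
    | List [ Atom "begin" ] -> Atom "#u"
    | List [ Atom "begin"; e ] -> desugar e
    | List (Atom "begin" :: e1 :: rest) ->
        (* (begin E1 E*) ds (let ((_ E1)) (begin E*)) *)
        desugar
          (List [ Atom "let";
                  List [ List [ Atom "_"; e1 ] ];
                  List (Atom "begin" :: rest) ])
    | List [ Atom "list" ] -> List [ Atom "prim"; Atom "null" ]
    | List (Atom "list" :: e1 :: rest) ->
        (* (list E1 E*) ds (prim cons E1 (list E*)) *)
        List [ Atom "prim"; Atom "cons"; desugar e1;
               desugar (List (Atom "list" :: rest)) ]
    | List es -> List (List.map desugar es)
    | Atom _ -> s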
17.4 Transformation 2: Globalization
In general, a program unit being compiled may contain free identifiers that reference externally defined values in standard libraries or other program units. Such free identifiers must somehow be resolved via a name resolution process before they are referenced during program execution. Depending on the nature of the free identifiers, name resolution can take place during compilation, during a linking phase that takes place after compilation but before execution (see Section 15.1), or during the execution of the program unit. In cases where name resolution takes place after compilation, the compiler may still require some information about the free identifiers, such as their types, even though their values may be unknown.
(flarek (a b)
  (let ((revmap (abs (f elts)
                  (let ((ans (null)))
                    (letrec ((loop (abs (xs)
                                     (if (null? xs)
                                         ans
                                         (let ((_ (set! ans (cons (f (car xs)) ans))))
                                           (loop (cdr xs)))))))
                      (loop elts))))))
    (revmap (abs (x) (> x b))
            (prim cons a (prim cons (* a 7) (prim null))))))

Figure 17.7 revmap program after desugaring.
In the Tortoise compiler, we consider a very simple form of compile-time linking that resolves free references to standard identifiers like + and cons.

ABS[[I]] = (abs (I'^n_{i=1}) (prim I I'^n_{i=1})),
  where I ∈ Primop_FLARE/V, I'^n_{i=1} are fresh, and
  TE_prim(I) = (-> (T^n_{i=1}) Tres) or TE_prim(I) = (generic (τ^m_{j=1}) (-> (T^n_{i=1}) Tres))
ABS[[I]] = undefined, where I ∉ Primop_FLARE/V

Figure 17.8 The wrapping approach to globalization.
The Wrapping Strategy

The wrapping strategy used here includes bindings for only the standard identifiers actually used in the program rather than all those that are supported by the language. For example, the wrapping strategy transforms the program

  (flarek (x y) (+ (* x x) (* y y)))

into

  (flarek (x y)
    (let ((+ (abs (v.0 v.1) (prim + v.0 v.1)))
          (* (abs (v.2 v.3) (prim * v.2 v.3))))
      (+ (* x x) (* y y))))
We assume that identifiers ending in a period followed by a number (such as v.0 and v.1) are names that are freshly generated during the compilation process.

Constructing an abstraction for a primitive operator (via ABS) requires knowing the number of arguments that it takes. In FLARE/V, this can be determined from the type of the primitive operator name in the primitive type environment, TE_prim. ABS is a partial function because it is undefined for identifiers that are not the names of primitive operators. wrap is also a partial function because it is undefined if any invocation of ABS in its definition is undefined. Similarly, GW is undefined if the invocation of wrap in its definition is undefined; this is how the failure of the globalization transformation is modeled in the case where a free identifier in the program is not the name of a primitive operator. The wrapping strategy can be extended to handle standard identifiers that are not the names of primitive operators (see Exercise 17.2).
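A minimal OCaml sketch of the wrapping step appears below. It assumes the free primitive names used by the program body, together with their arities, have already been computed (the arity list stands in for TE_prim), and it follows the v.0, v.1 fresh-name convention; all names in the sketch are illustrative.

  type sexp = Atom of string | List of sexp list

  let counter = ref (-1)
  let fresh prefix = incr counter; Printf.sprintf "%s.%d" prefix !counter

  (* ABS: build (abs (v.i ...) (prim I v.i ...)) for primitive I of arity n. *)
  let abs_for (prim_name : string) (arity : int) : sexp =
    let formals = List.init arity (fun _ -> fresh "v") in
    List [ Atom "abs";
           List (List.map (fun v -> Atom v) formals);
           List (Atom "prim" :: Atom prim_name
                 :: List.map (fun v -> Atom v) formals) ]

  (* wrap: bind each free primitive name used in the body to its abstraction. *)
  let wrap (body : sexp) (free_prims : (string * int) list) : sexp =
    List [ Atom "let";
           List (List.map (fun (p, n) -> List [ Atom p; abs_for p n ])
                   free_prims);
           body ]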
The Inlining Strategy

A drawback of the wrapping strategy is that global procedures are invoked via the generic procedure-calling mechanism rather than the mechanism for invoking primitive operators (prim). We will see in later stages of the compiler that the latter is handled far more efficiently than the former. This suggests an alternative approach in which calls to global procedures are transformed into primitive applications. Replacing a procedure call by an instantiated version of its body is known as inlining, so we shall call this the inlining strategy for globalization. Using the inlining strategy, the sum-of-squares program is transformed into:

  (flarek (x y) (prim + (prim * x x) (prim * y y)))
There are three situations that need to be handled carefully in the inlining strategy for globalization:

1. A reference to a global procedure can be converted to an instance of prim only if it occurs in the rator position of a procedure application. References in other positions must be handled either by wrapping or by converting them to abstractions. Consider the expression

     (cons + (cons * (null)))

   which makes a list of two procedures. The occurrences of cons and null can be transformed into prims, but the + and * cannot be. They can, however, be turned into abstractions containing prims:

     (prim cons (abs (v.0 v.1) (prim + v.0 v.1))
           (prim cons (abs (v.2 v.3) (prim * v.2 v.3))
                 (prim null)))
   Alternatively, we can "lift" the abstractions for + and * to the top of the enclosing program and name them, as in the wrapping approach.

2. In languages like FLARE/V, where local identifiers may have the same name as global standard identifiers for primitive operators, care must be taken to distinguish references to global and local identifiers.² For example, in the program (flare (x) (let ((+ *)) (- (+ 2 x) 3))), the invocation of + in (+ 2 x) cannot be inlined, but the invocation of - can be:

     (flare (x)
       (let ((+ (abs (v.0 v.1) (prim * v.0 v.1))))
         (prim - (+ 2 x) 3)))

² Many programming languages avoid this and related problems by treating primitive operator names as reserved keywords that may not be used as identifiers in declarations or assignments. This allows compiler writers to inline all primitives.
3. In FLARE/V, the values associated with global primitive identifier names can be modified by set!. For example, consider

     (flarek (x y) (* (+ x (let ((_ (set! + -))) y)) (+ x y)))

   in which the first occurrence of + denotes addition and the second occurrence denotes subtraction. It would clearly be incorrect to replace the second occurrence by an inlined addition primitive. Correctly inlining addition for the first occurrence and subtraction for the second occurrence is possible in this case, but can be justified only by a sophisticated effect analysis. A simple conservative way to address this problem in the inlining strategy is to use wrapping rather than inlining for any global name that is mutated somewhere in the program. For the above example, this yields:

     (flarek (x y)
       (let ((+ (abs (v.2 v.3) (prim + v.2 v.3))))
         (prim * (+ x (let ((_ (set! + (abs (v.0 v.1) (prim - v.0 v.1)))))
                        y))
                 (+ x y))))
All of the above issues are handled by the definition of the inlining approach to globalization in Figure 17.9. The GI_prog function uses MutIds_prog (defined in Figure 17.10) to determine the mutated free identifiers of a program — i.e., the free identifiers that are targets of assignments — and wraps the program body in abstractions for these. All other free identifiers should name primitives that may be inlined in call positions or expanded to abstractions (via ABS from Figure 17.8) in other positions. The identifier-set argument to GI_exp keeps track of the unmutated free identifiers in the program that have not been locally redeclared. Again, the undefined cases of partial functions are used to model the situations in which globalization fails.

Figure 17.11 shows our revmap example after the globalization stage using the inlining strategy. In this case, all references to free identifiers have been converted to primitive applications. In this and subsequent examples, we "resugar" primitive applications (prim O ...) to (@O ...) to make the code more concise.

Exercise 17.1 What is the result of globalizing the following program using (1) the wrapping strategy and (2) the inlining strategy?

  (flare (* /)
    (+ (let ((+ *)) (- + 1))
       (let ((* -)) (* / 2))))
IS ∈ IdSet = P(Ident_FLARE/V)

GI_prog : Prog_FLARE/V ⇀ Prog_FLARE/V
GI_prog[[P]] = (flarek (I^n_{i=1}) (wrap[[GI_exp[[Ebody]] IS_unmuts]] IS_muts)),
  where P = (flarek (I^n_{i=1}) Ebody),
        IS_muts = MutIds_prog[[P]],
        IS_unmuts = (FrIds[[P]]) − IS_muts,
        wrap is defined in Figure 17.8, and MutIds_prog is defined in Figure 17.10

GI_exp : Exp_FLARE/V → IdSet ⇀ Exp_FLARE/V
GI_exp[[(Irator E^n_{i=1})]] IS =
  if Irator ∈ IS
  then if Irator ∈ Primop_FLARE/V
       then (prim Irator (GI_exp[[E_i]] IS)^n_{i=1})
       else undefined end
  else (Irator (GI_exp[[E_i]] IS)^n_{i=1}) end
GI_exp[[I]] IS = if I ∈ IS then ABS[[I]] else I end,
  where ABS is defined in Figure 17.8
GI_exp[[(abs (I^n_{i=1}) Ebody)]] IS = (abs (I^n_{i=1}) (GI_exp[[Ebody]] (IS − ∪^n_{i=1}{I_i})))
GI_exp[[(let ((I_i E_i)^n_{i=1}) Ebody)]] IS
  = (let ((I_i (GI_exp[[E_i]] IS))^n_{i=1}) (GI_exp[[Ebody]] (IS − ∪^n_{i=1}{I_i})))
GI_exp[[(letrec ((I_i E_i)^n_{i=1}) Ebody)]] IS
  = (letrec ((I_i (GI_exp[[E_i]] IS'))^n_{i=1}) (GI_exp[[Ebody]] IS')),
  where IS' = IS − ∪^n_{i=1}{I_i}
GI_exp[[E]] IS = mapsub_FLARE/V[[E]] (λEsub . GI_exp[[Esub]] IS), otherwise.

Figure 17.9 The inlining approach to globalization.
Exercise 17.2 The globalization strategies described in this section assume that all standard identifiers name primitive procedures, but a standard library typically contains other kinds of entities. Describe how to extend globalization (both the wrapping and inlining strategies) to handle standard identifiers that name (1) literal values (e.g., true standing for #t) and (2) nonprimitive procedures (e.g., length and map from the FL standard library). Keep in mind that the nonprimitive procedures might be recursive or even mutually recursive.
17.5 Transformation 3: Assignment Conversion
Assignment conversion removes all mutable variables from a program by converting them to mutable cells. We will say that the resulting program is assignment-free because it contains no occurrences of the set! construct.
MutIds_prog : Prog_FLARE/V → P(Ident_FLARE/V)
MutIds_prog[[(flarek (I^n_{i=1}) Ebody)]] = MutIds[[Ebody]] − ∪^n_{i=1}{I_i}

MutIds : Exp_FLARE/V → P(Ident_FLARE/V)
MutIds[[(set! I E)]] = {I} ∪ MutIds[[E]]
MutIds[[(abs (I^n_{i=1}) Ebody)]] = MutIds[[Ebody]] − ∪^n_{i=1}{I_i}
MutIds[[(let ((I_i E_i)^n_{i=1}) Ebody)]] = (∪^n_{i=1} MutIds[[E_i]]) ∪ (MutIds[[Ebody]] − ∪^n_{i=1}{I_i})
MutIds[[(letrec ((I_i E_i)^n_{i=1}) Ebody)]] = ((∪^n_{i=1} MutIds[[E_i]]) ∪ MutIds[[Ebody]]) − ∪^n_{i=1}{I_i}
MutIds[[E]] = ∪_{E′ ∈ subexps[[E]]} MutIds[[E′]], otherwise
(Since literals, variable references, and error expressions have no subexpressions, they have no mutated free identifiers.)

Figure 17.10 Mutated free identifiers of FLARE/V expressions and programs.
Assignment conversion makes all mutable storage explicit and simplifies later passes by making all variable bindings immutable. After assignment conversion, all variables denote values rather than implicit cells containing values. A variable may be bound to an explicit cell value whose content varies with time, but the explicit cell value bound to the variable cannot change. As we will see later in the closure conversion stage (Section 17.10), assignment conversion is important because it allows environments to be treated as immutable data structures that can be freely shared and copied without concerns about side effects.

In our compiler, assignment conversion precedes type and effect reconstruction because reconstruction is simpler in a language without mutable variables (FLARE) than one with them (FLARE/V). Additionally, in a language without mutable variables, all variable references are guaranteed to be pure, which enhances let-style polymorphism.

A straightforward approach to assignment conversion is to make an explicit cell for every variable in a given program. For example, the factorial program

  (flarek (x)
    (let ((ans 1))
      (letrec ((loop (abs (n)
                       (if (@= n 0)
                           ans
                           (let ((_ (set! ans (@* n ans))))
                             (loop (@- n 1)))))))
        (loop x))))
(flare (a b)
  (let ((revmap (abs (f elts)
                  (let ((ans (@null)))
                    (letrec ((loop (abs (xs)
                                     (if (@null? xs)
                                         ans
                                         (let ((_ (set! ans (@cons (f (@car xs)) ans))))
                                           (loop (@cdr xs)))))))
                      (loop elts))))))
    (revmap (abs (x) (@> x b))
            (@cons a (@cons (@* a 7) (@null))))))

Figure 17.11 revmap example after globalization using inlining.
can be assignment-converted to

  (flarek (x)
    (let ((x (@cell x)))
      (let ((ans (@cell 1)))
        (letrec ((loop (@cell (abs (n)
                                (let ((n (@cell n)))
                                  (if (@= (@^ n) 0)
                                      (@^ ans)
                                      (let ((_ (@:= ans (@* (@^ n) (@^ ans)))))
                                        ((@^ loop) (@- (@^ n) 1)))))))))
          ((@^ loop) (@^ x))))))
In the converted program, each of the variables in the original program (x, ans, loop, n) is bound to an explicit cell. Each variable reference I in the original program is converted to a cell reference (@^ I), and each variable assignment (set! I E) in the original program is converted to a cell assignment of the form (@:= I E′), where E′ is the converted E.

The code generated by this naive approach to assignment conversion can contain many unnecessary cell allocations, references, and assignments. A cleverer strategy is to make explicit cells only for those variables that are mutated in the program. Determining exactly which variables are mutated when a program executes is undecidable. We employ a simple conservative syntactic approximation that defines a variable to be mutated if it is assigned within its scope.
In the factorial example, the alternative strategy yields the following program, in which only the ans variable is converted to a cell:

  (flarek (x)
    (let ((ans (@cell 1)))
      (letrec ((loop (abs (n)
                       (if (@= n 0)
                           (@^ ans)
                           (let ((_ (@:= ans (@* n (@^ ans)))))
                             (loop (@- n 1)))))))
        (loop x))))
The improved approach to assignment conversion is formalized in Figure 17.12. The AC_prog function wraps the transformed body of a FLARE/V program in a let that binds each mutated program parameter (that is, each mutated free identifier in the body) to a cell. The free identifiers syntactically assigned within an expression are determined by the MutIds function defined in Figure 17.10.

Expressions are transformed by the AC_exp function, whose second argument is the set of in-scope identifiers naming variables that have been transformed to cells. Processing of variable references transforms such identifiers to cell references; variable assignments are transformed to cell assignments. The only other nontrivial cases for AC_exp are the binding constructs abs, let, and letrec. All of these cases use the partition function to partition the identifiers declared by these constructs into two sets: the mutated identifiers IS_M that are assigned somewhere in the given expressions, and the unmutated identifiers IS_U that are not assigned. In each of these cases, any subexpression in the scope of the declared identifiers is processed by AC_exp with an identifier set that includes IS_M but excludes IS_U. The exclusion is necessary to prevent the conversion of local unmutated variables that have the same name as external mutated variables. For example,

  (flarek (x)
    (let ((_ (set! x (@* x 2))))
      ((abs (x) x) x)))

is converted to

  (flarek (x)
    (let ((x (@cell x)))
      (let ((_ (@:= x (@* (@^ x) 2))))
        ((abs (x) x) (@^ x)))))

Even though the program parameter x is converted to a cell, the x in the abstraction body is not.
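The MutIds analysis itself is easy to implement. Here is a minimal OCaml sketch over a cut-down FLARE/V expression type (the representation is an illustrative assumption of ours); it mirrors the equations of Figure 17.10.

  module S = Set.Make (String)

  type exp =
    | Lit of int
    | Var of string
    | Set of string * exp                  (* (set! I E) *)
    | Abs of string list * exp             (* (abs (I ...) E) *)
    | Let of (string * exp) list * exp     (* (let ((I E) ...) E) *)

  (* mut_ids: the free identifiers assigned somewhere in the expression,
     minus any that are redeclared by an inner binding construct. *)
  let rec mut_ids (e : exp) : S.t =
    match e with
    | Lit _ | Var _ -> S.empty
    | Set (i, e1) -> S.add i (mut_ids e1)
    | Abs (formals, body) -> S.diff (mut_ids body) (S.of_list formals)
    | Let (bindings, body) ->
        let defns =
          List.fold_left (fun acc (_, d) -> S.union acc (mut_ids d))
            S.empty bindings in
        let bound = S.of_list (List.map fst bindings) in
        S.union defns (S.diff (mut_ids body) bound)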
IS ∈ IdSet = P(Ident_FLARE/V)

AC_prog : Prog_FLARE/V → Prog_FLARE
Preconditions: The input to AC_prog is a well-formed, closed, kernel FLARE/V program.
Postconditions: The output of AC_prog is a well-formed, closed, assignment-free, kernel FLARE program.

AC_prog[[(flarek (I^n_{i=1}) Ebody)]]
  = (flarek (I^n_{i=1}) (wrap-cells IS_muts (AC_exp[[Ebody]] IS_muts))),
  where IS_muts = MutIds[[Ebody]] and MutIds is defined in Figure 17.10.

AC_exp : Exp_FLARE/V → IdSet → Exp_FLARE
AC_exp[[I]] IS = if I ∈ IS then (@^ I) else I end
AC_exp[[(set! I E)]] IS = (@:= I (AC_exp[[E]] IS))
AC_exp[[(abs (I^n_{i=1}) Ebody)]] IS
  = let ⟨IS_M, IS_U⟩ be (partition {I_1, ..., I_n} [Ebody])
    in (abs (I^n_{i=1}) (wrap-cells IS_M (AC_exp[[Ebody]] ((IS ∪ IS_M) − IS_U))))
AC_exp[[(let ((I_i E_i)^n_{i=1}) Ebody)]] IS
  = let ⟨IS_M, IS_U⟩ be (partition {I_1, ..., I_n} [Ebody])
    in (let ((I_i (maybe-cell I_i IS_M (AC_exp[[E_i]] IS)))^n_{i=1})
         (AC_exp[[Ebody]] ((IS ∪ IS_M) − IS_U)))
AC_exp[[(letrec ((I_i E_i)^n_{i=1}) Ebody)]] IS
  = let ⟨IS_M, IS_U⟩ be (partition {I_1, ..., I_n} [E_1, ..., E_n, Ebody])
    in (letrec ((I_i (maybe-cell I_i IS_M (AC_exp[[E_i]] IS')))^n_{i=1})
         (AC_exp[[Ebody]] IS')),
    where IS' = ((IS ∪ IS_M) − IS_U)
AC_exp[[E]] IS = mapsub_FLARE/V[[E]] (λEsub . AC_exp[[Esub]] IS), otherwise.

wrap-cells : IdSet → Exp_FLARE → Exp_FLARE
wrap-cells {} E = E
wrap-cells {I_1 ... I_n} E = (let ((I_i (@cell I_i))^n_{i=1}) E), where n ≥ 1

partition : IdSet → Exp*_FLARE/V → (IdSet × IdSet)
partition IS [E_1 ... E_n] = let IS_M be ∪^n_{i=1}(MutIds[[E_i]]) in ⟨IS ∩ IS_M, IS − IS_M⟩

maybe-cell : Ident → IdSet → Exp_FLARE/V → Exp_FLARE/V
maybe-cell I IS E = if I ∈ IS then (@cell E) else E end

Figure 17.12 An assignment-conversion transformation that converts only those variables that are syntactically assigned in the program.
Abstractions are processed like programs in that the transformed abstraction body is wrapped in a let that binds each mutated identifier to a cell. This preserves the call-by-value semantics of FLARE, since an assignment to the formal parameter of an abstraction is transformed to a cell assignment that modifies the content of a cell that is allocated locally within the abstraction. The transformation can be modified to instead implement a call-by-reference semantics (see page 436), in which a formal parameter assignment is transformed to an assignment of a cell passed into the abstraction from the point of application (Exercise 17.5).

In processing let and letrec, maybe-cell is used to wrap the binding expressions for mutated identifiers in applications of the cell primitive. These two forms are processed similarly except for scoping differences in their declared names.

Figure 17.13 shows our revmap example after the assignment-conversion stage. The only variable assigned in the input program is ans, and this is converted to a cell.

Intuitively, consistently converting a mutated variable along with its references and assignments into explicit cell operations should not change the observable behavior of a program. So we expect that assignment conversion should preserve both the type safety and the meaning of a program. However, formally proving such intuitions can be challenging. See [WS97] for a proof that a version of assignment conversion for Scheme is a meaning-preserving transformation.

Exercise 17.3 Show the result of assignment-converting the following programs using AC_prog:

  (flarek (a b c)
    (let ((_ (set! a (@+ a c))))
      (abs (a d)
        (let ((_ (set! c (@* a b))))
          (set! d (@+ c d))))))

  (flarek (x)
    (letrec ((f (abs (y) (@pair y (g (@- y 1)))))
             (g (abs (z) (let ((_ (set! g (abs (w) w)))) (f z)))))
      (f x)))
Exercise 17.4 Can assignment conversion be performed before globalization? Explain.

Exercise 17.5 Suppose that FLARE/V had a call-by-reference semantics rather than a call-by-value semantics for mutable variables (see Section 8.4). Modify the definition of assignment conversion so that it implements call-by-reference semantics. (Compare to Exercise 8.22 on page 439.)
(flare (a b)
  (let ((revmap (abs (f elts)
                  (let ((ans (@cell (@null))))
                    (letrec ((loop (abs (xs)
                                     (if (@null? xs)
                                         (@^ ans)
                                         (let ((_ (@:= ans (@cons (f (@car xs))
                                                                  (@^ ans)))))
                                           (loop (@cdr xs)))))))
                      (loop elts))))))
    (revmap (abs (x) (@> x b))
            (@cons a (@cons (@* a 7) (@null))))))

Figure 17.13 revmap program after assignment conversion.
Exercise 17.6 A straightforward implementation of the AC prog and AC exp functions in Figure 17.12 is inefficient because (1) it traverses the AST of every declaration node at least twice: once to determine the free mutated identifiers, and once to transform the node; and (2) it may recalculate the free mutated identifiers for the same expression many times. Describe how to modify the assignment-conversion algorithm so that it works in a single traversal of the program AST and calculates the free mutated identifiers only once at every node. Note: You may need to modify the information stored in the nodes of a FLARE/V AST.
17.6 Transformation 4: Type/Effect Reconstruction
The fourth stage of the Tortoise compiler is type and effect reconstruction. Only well-typed FLARE programs are allowed to proceed through the rest of the compiler. The details of how types and effects are reconstructed were described earlier, in Section 16.2.3. Note that assignment conversion must precede this stage because type and effect reconstruction was defined for the FLARE language, which does not include set!.

Preconditions: The input to type/effect reconstruction is a well-formed, closed kernel FLARE program that is assignment-free.

Postconditions: The output of type/effect reconstruction is a valid, closed kernel FLARE program that is assignment-free.

We will use the term valid to describe a program or expression that is well formed and is guaranteed not to encounter a dynamic type error.
17.6.1 Propagating Type and Effect Information
Although neither FLARE nor FIL (the intermediate language to be used from the next compilation stage on) has explicit types or effects, this does not mean that the type and effect information generated by the FLARE type/effect reconstruction phase is thrown away. This information can be passed through the compiler stages via a separate channel, where it is appropriately transformed by each pass. In an actual implementation, this information might be stored in abstract syntax tree nodes for FLARE and FIL expressions, in symbol tables mapping variable names to their types, or in explicit type/effect derivation trees. We assume that this type and effect information is available for later stages, where it can be used to guide the compilation process. Similarly, the results from other static analyses, such as flow information [NNH98, DWM+01], could be computed at this stage and passed along to other compiler stages.

An alternative approach used in many modern research compilers is to use so-called typed intermediate languages (TILs) that carry explicit type information (possibly including effect, flow, and other analysis information) through all stages of the compiler. In these systems, program transformations effectively transform type derivations of programs. The fact that each program manipulated by a TIL-based compiler is well typed has several advantages. The compiler can avoid generating code to check for run-time type errors, because these are provably impossible. The explicit type information carried by a TIL can be inspected to guide compilation (e.g., determining clever representations for certain types) and to implement run-time operations (such as tag-free garbage collection and checking safety properties of dynamically linked code). It also serves as an important tool for debugging a compiler implementation: if the output of a transformation doesn't type-check, the transformation must have a bug.

The reason that we do not use TILs in our presentation is to keep our compiler simple. TILs typically require a sophisticated type system with universal and existential types. Specifying each compiler stage becomes more complicated because it transforms not only expressions but their types. The explicit type information is often larger than the code it describes, which makes it impractical to show the result of compilation of even relatively simple expressions. See [MWCG99] for a presentation of using TILs to translate a high-level language all the way down into a typed assembly language.
17.6.2 Effect-based Code Optimization
The effect information reconstructed for a FLARE program is important for enabling many standard code optimizations performed by a compiler. We now discuss some of these in the context of the Tortoise compiler.
Many program transformations require knowledge about expression interference (see Section 8.3.6). In our system, two expressions interfere if they both write to the same region or if one has a read effect on a region the other has an init or write effect on. A pure expression does not interfere with any other expression because it does not depend on the store in any way. For example, if two expressions interfere, it is unsafe to reorder them relative to each other, since this could change the order of operations on the store locations manipulated by both expressions. But if two expressions do not interfere, then it may be possible to reorder them, execute them in parallel, or perform other improvements.

As a simple example of how effects can enable code optimizations, we demonstrate how the following FLARE abstraction can be improved if certain effect information is known.

  (abs (n)
    (letrec ((loop (abs (i)
                     (if (@= i 0)
                         (@^ x)
                         (begin (h (f i) (g i))
                                (@:= x (k (g (f i))))
                                (@:= x (h (g i) (k n)))
                                (loop (@- i 1)))))))
      (loop n)))
Assume that this abstraction appears in a scope where x is a cell in region rx and f, g, h, and k are procedures with the following latent effects:

  Procedure | Latent Effect
  ----------+-------------------------------
  f         | (read rx)
  g         | (maxeff (read ry) (write ry))
  h         | (maxeff (read rz) (write rz))
  k         | pure
Since the latent effects of f and g do not overlap, (f i) and (g i) do not interfere, and may be executed in parallel. This means that in a computer with multiple processing units, the expressions can be executed at the same time on different processing units. This is an improvement because it allows (f i) and (g i) to be executed in the maximum of the execution times for the two expressions rather than the sum of their times.³ If FLARE is extended with a letpar binding construct whose binding definition expressions are executed in parallel, then the begin expression in our example can be transformed to:
  (letpar ((a (f i)) (b (g i)))
    (begin (h a b)
           (@:= x (k (g (f i))))
           (@:= x (h (g i) (k n)))
           (loop (@- i 1))))

³ In practice, there is often an additional overhead associated with parallel execution; the individual execution times must be big enough to justify this overhead.
Extending FLARE with mutable arrays, each of which has an associated region, would further expand opportunities for parallelism. For example, given two arrays in distinct regions, loops to sum their elements could be executed in parallel.

If an expression occurs more than once and it does not interfere with itself or any intervening expressions, then the result of the first occurrence can be named and the name can be used for the subsequent occurrences. This is known as common subexpression elimination. For example, the only effect of (f i) is (read rx), so it does not interfere with the invocations of f, g, and h (none of which has a (write rx) effect) that appear before the second occurrence. Since the first occurrence of (f i) already has the name a, the second occurrence of (f i) can be replaced by a:
(b (g i))) (g a))) {(f i) replaced by a} (g i) (k n))) i 1))))
Although (g i) also appears twice, its second occurrence cannot be eliminated because it interferes with the first occurrence as well as with (g a). Because g both reads and writes region ry, the second (g i) may have a different value than the first one.

When an expression does not contribute to a program in its value or its effect, it may be removed via a process known as dead code elimination. For example, the second assignment expression, (@:= x (h (g i) (k n))), does not read rx before writing it, so the first assignment to x, in (@:= x (k (g a))), is unnecessary. This leaves (k (g a)), which cannot be entirely eliminated because (g a) writes to region ry, which is read later by (g i). But the invocation of k can be eliminated because it is pure and its result is not used:

  (letpar ((a (f i)) (b (g i)))
    (begin (h a b)
           (g a) {assignment to x and call to k eliminated}
           (@:= x (h (g i) (k n)))
           (loop (@- i 1))))
It might seem unlikely that a programmer would ever write dead code, but it occurs in practice for a variety of reasons. For example, the assumptions in place when the code is originally written may no longer hold when the code is later modified. In our example, perhaps g and/or h initially had a latent (read rx) effect justifying the first assignment to x, but the procedures were later changed to remove this effect, and the programmer neglected to remove the first assignment to x. Perhaps the dead code was not written by a human but was created by an automatic program generator or was the result of transforming another program. Generators and transformers can be simpler to build when they are allowed to produce code that contains inefficiencies (such as common subexpressions and dead code) that are cleaned up by later optimization phases.

When an expression in the body of a procedure or loop is guaranteed to have the same value for every invocation of the procedure or loop, it may be lifted out of the body via a transformation called code hoisting. In our example, since k is a pure procedure and n is an immutable variable defined outside the loop procedure, the invocation (k n) in the body of loop always has the same value. We can hoist it outside the definition of loop so that it is calculated only once rather than for every invocation of loop:

  (abs (n)
    (let ((c (k n))) {(k n) has been hoisted outside loop}
      (letrec ((loop (abs (i)
                       (if (@= i 0)
                           (@^ x)
                           (letpar ((a (f i)) (b (g i)))
                             (begin (h a b)
                                    (g a)
                                    (@:= x (h (g i) c)) {c replaces (k n)}
                                    (loop (@- i 1))))))))
        (loop n))))
Note that if the k in (k n) were replaced by f or g, the expression could not be hoisted. The loop body writes to regions (rx and ry) that are read by these procedures, so (f n) and (g n) are not guaranteed to be loop-invariant.

In each of the optimizations we have mentioned, effect information is critical for justifying the optimization. Without any effect information, we would need to conservatively assume that all invocations of f, g, h, and k are impure and interfere with each other and with the assignments to x. With these conservative assumptions, none of the optimizations we performed on our example would be permissible!
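The interference test that licenses all of these optimizations can be implemented directly from the definition at the start of this section. The following OCaml sketch models an effect as a set of (operation, region) pairs; this representation, under which pure is the empty set and maxeff is simply set union, is an illustrative assumption of ours.

  type op = Read | Write | Init

  type effect_set = (op * string) list  (* e.g., [(Read, "rx")]; pure = [] *)

  (* Two expressions interfere if both write the same region, or one reads
     a region the other inits or writes. *)
  let interferes (e1 : effect_set) (e2 : effect_set) : bool =
    let clash (op1, r1) (op2, r2) =
      r1 = r2
      && (match (op1, op2) with
          | Write, Write -> true
          | Read, (Write | Init) | (Write | Init), Read -> true
          | _ -> false)
    in
    List.exists (fun x -> List.exists (clash x) e2) e1

  (* The example from the text: f's (read rx) does not interfere with g's
     effects on ry, so (f i) and (g i) may be reordered or parallelized;
     g interferes with itself because it both reads and writes ry. *)
  let () =
    let f = [ (Read, "rx") ] and g = [ (Read, "ry"); (Write, "ry") ] in
    assert (not (interferes f g));
    assert (interferes g g)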
17.7 Transformation 5: Translation
In this transformation, a kernel FLARE program is translated into the FIL intermediate language. All subsequent transformations are performed on FIL programs. We first present the FIL language and then describe how to transform FLARE to FIL.
17.7.1 The Compiler Intermediate Language: FIL
The main stages of our transformation-based compiler use an intermediate language that we call FIL, for Functional Intermediate Language. Like FLARE, FIL is a stateful, call-by-value, statically scoped, function-oriented language. However, FIL is simpler than FLARE in two important ways:

1. FIL supports fewer features than FLARE. It does not have a recursion construct (letrec) or an assignment construct (set!), and it represents both cells and pairs with a single form of mutable product. So specifying FIL transformations requires fewer cases than FLARE transformations.

2. Unlike FLARE, FIL does not have a formal type system and does not support type reconstruction. Although all of the remaining transformations can be expressed in a typed framework, the type systems and transformations are rather complex to describe. Specifying these transformations in FIL is much simpler. However, we will not completely disregard type and effect information. As discussed later (page 1035), we will assume that certain type and effect information is preserved by FIL programs, but will not formally describe how this is accomplished.

The Syntax of FIL

The syntax of FIL is specified in Figure 17.14. FIL is similar to many of the stateful variants of FL that we have studied. Some notable features of FIL are:

• As in FLARE, multiargument abstractions and applications are hardwired into the kernel rather than being treated as syntactic sugar, and the abstraction keyword is abs. Unlike in FLARE, FIL applications have an explicit app keyword.

• As in FLARE, multibinding let expressions are considered kernel expressions rather than sugar for applications of explicit abstractions.
Kernel Grammar
P ∈ Prog_FIL ::= (fil (I*formal) Ebody)

E ∈ Exp_FIL ::= L | I | (error Ymessage)
             | (if Etest Ethen Eelse) | (prim Oprimop E*arg)
             | (abs (I*formal) Ebody) | (app Erator E*rand)
             | (let ((Iname Edefn)*) Ebody)

L ∈ Lit_FIL ::= #u | B | N | (sym Y)
B ∈ BoolLit = {#t, #f} as in FLARE/V.
N ∈ IntLit = as in FLARE/V.
J ∈ PosLit = {1, 2, 3, ...}
Y ∈ SymLit = as in FLARE/V.

O ∈ Primop_FIL ::= + | - | * | / | %                          ; arithmetic ops
                | < | <= | = | != | > | >= | bool=? | sym=?   ; relational ops
                | not | and | or                              ; logical ops
                | cons | car | cdr | null | null?             ; list ops
                | mprod | (mget J) | (mset! J) | mprod=?      ; mut. prod. ops
                | ... other primitives will be added as needed ...

Keyword_FIL = {abs, app, error, fil, if, let, let*, prim, sym}
I ∈ Ident_FIL = SymLit − ({Y | Y begins with @} ∪ Keyword_FIL)

Syntactic Sugar
(@mget J Emprod) ds (prim (mget J) Emprod)
(@mset! J Emprod Enew) ds (prim (mset! J) Emprod Enew)
(@Oop E^n_{i=1}) ds (prim Oop E^n_{i=1}), where Oop ∉ {(mget J), (mset! J)}
(let* () Ebody) ds Ebody
(let* ((I1 E1) (Irest Erest)*) Ebody) ds (let ((I1 E1)) (let* ((Irest Erest)*) Ebody))

Figure 17.14 Syntax of FIL, the Tortoise compiler intermediate language.
• Unlike FLARE/V, FIL does not have mutable variables (i.e., no set!). But FIL does have mutable products (also known as mutable tuples), which are created via mprod, whose component slots are accessed via mget and changed via mset!, and which are tested for equality (i.e., same location in the store) via mprod=?. We treat mget and mset! as “indexed primitives” (mget Jindex ) and (mset! Jindex ) in which the primitive operator includes the index Jindex of the manipulated component slot. If we wrote (prim mget Eindex Emp ), this would imply that the index could be calculated by an arbitrary expression Eindex when in fact it must be a positive integer literal Jindex . So we instead
write (prim (mget Jindex) Emp) (and similarly for mset!). Treating mget and mset! as primitives rather than as kernel constructs simplifies the definition of several transformations.

• Unlike FLARE, FIL does not include cells and pairs; both are implemented as mutable products.

• Unlike FLARE, FIL does not have any explicit kernel expression form (such as letrec) for recursive definitions. It is assumed that the "knot-tying" of recursion is instead performed by setting the components of mutable products. This is the approach taken in the translation from FLARE to FIL.

• Other data include integers, booleans, symbols, and immutable lists, all of which are in FLARE.

• Unlike FLARE, FIL does not support globally bound standard identifiers for procedures like +.

(fil (a b)
  (let ((revmap (abs (f elts)
                  (let* ((ans (@mprod (@null)))
                         (loop (@mprod #u))
                         (_ (@mset! 1 loop
                              (abs (xs)
                                (if (@null? xs)
                                    (@mget 1 ans)
                                    (let ((_ (@mset! 1 ans
                                               (@cons (app f (@car xs))
                                                      (@mget 1 ans)))))
                                      (app (@mget 1 loop) (@cdr xs))))))))
                    (app (@mget 1 loop) elts)))))
    (app revmap
         (abs (x) (@> x b))
         (@cons a (@cons (@* a 7) (@null))))))

Figure 17.17 revmap program after translation.
17.8 Transformation 6: Renaming

The first of the following two expressions is not uniquely named, because x is declared three times; the second is an alpha-equivalent version in which all declared names are distinct:

  ((abs (x) (x w)) (abs (x) (let ((x (* x 2))) (+ x 1))))

  ((abs (x) (x w)) (abs (y) (let ((z (* y 2))) (+ z 1))))
Some of the subsequent program transformations we will study require that programs are uniquely named to avoid problems with variable capture or otherwise simplify the transformation. Here we describe a renaming transformation whose output program is a uniquely named version of the input program.

The renaming transformation is presented in Figure 17.18. In this transformation, every bound identifier in the program is replaced by a fresh identifier. Fresh names are introduced in all declaration constructs: the fil program construct and abs and let expressions. Renaming environments in the domain RenEnv are used to associate these fresh names with the original names and communicate the renamings to all variable references. Renaming is a purely structural transformation for all other nodes.

As in many other transformations, we gloss over the mechanism for generating fresh identifiers. This mechanism can be formally specified and implemented by threading some sort of name-generation state through the transformation. For example, this state could be a natural number that is initially 0 and is incremented every time a fresh name is generated. The fresh name can combine the original name and the number in some fashion.
Renaming Environments
re ∈ RenEnv = Ident → Ident

rbind : Ident → Ident → RenEnv → RenEnv
  = λIold Inew re . λIkey . if Ikey = Iold then Inew else (re Ikey) end

(rbind Iold Inew re) is abbreviated as [Iold→Inew]re; this notation associates to
the right, i.e., [I1→I'1][I2→I'2]re = [I1→I'1]([I2→I'2]re).

Renaming Transformation
Rprog : Prog_FIL → Prog_FIL
Preconditions: The input to Rprog is a valid kernel FIL program.
Postconditions: The output of Rprog is a valid and uniquely named kernel FIL program.

Rprog[[(fil (I^n_{i=1}) Ebody)]]
  = (fil (I'^n_{i=1}) (Rexp[[Ebody]] ([I1→I'1] ... [In→I'n] (λI . I)))),
  where I'^n_{i=1} are fresh.

Rexp : Exp_FIL → RenEnv → Exp_FIL
Rexp[[I]] re = (re I)
Rexp[[(abs (I^n_{i=1}) Ebody)]] re
  = (abs (I'^n_{i=1}) (Rexp[[Ebody]] ([I1→I'1] ... [In→I'n]re))),
  where I'^n_{i=1} are fresh.
Rexp[[(let ((I_i E_i)^n_{i=1}) Ebody)]] re
  = (let ((I'_i (Rexp[[E_i]] re))^n_{i=1}) (Rexp[[Ebody]] ([I1→I'1] ... [In→I'n]re))),
  where I'^n_{i=1} are fresh.
Rexp[[E]] re = mapsub_FIL[[E]] (λEsub . Rexp[[Esub]] re), otherwise.

Figure 17.18 Renaming transformation.
In our examples, we assume that renamed identifiers have the form prefix.number, where prefix is the original identifier, number is the current name-generator state value, and "." is a special character that may appear in compiler-generated names but not in user-specified names.⁵ Later compiler stages may rename generated names from previous stages; we assume that only the prefix of the old generated name is used as the prefix for the new generated name. For example, x can be renamed to x.17, and x.17 can be renamed to x.42 (not x.17.42).

Figure 17.19 shows our running example after the renaming stage.

Exercise 17.11 What changes need to be made to Rexp to handle the FILsum language (see Exercise 17.10)?

⁵ prefix is not really necessary, since number itself is unique. But maintaining the original names helps human readers track variables through the compiler transformations.
(fil (a.0 b.1)
  (let ((revmap.2 (abs (f.3 elts.4)
                    (let* ((ans.5 (@mprod (@null)))
                           (loop.6 (@mprod #u))
                           (_ (@mset! 1 loop.6
                                (abs (xs.7)
                                  (if (@null? xs.7)
                                      (@mget 1 ans.5)
                                      (let ((_ (@mset! 1 ans.5
                                                 (@cons (app f.3 (@car xs.7))
                                                        (@mget 1 ans.5)))))
                                        (app (@mget 1 loop.6) (@cdr xs.7))))))))
                      (app (@mget 1 loop.6) elts.4)))))
    (app revmap.2
         (abs (x.8) (@> x.8 b.1))
         (@cons a.0 (@cons (@* a.0 7) (@null))))))

Figure 17.19 revmap program after renaming.
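A minimal OCaml sketch of this fresh-name convention, assuming a single global counter as the name-generation state (Exercise 17.12 below explores threading this state more explicitly):

  let gen_count = ref (-1)

  (* Strip any old ".number" suffix, so x.17 renames to x.42, not x.17.42. *)
  let prefix_of (name : string) : string =
    match String.index_opt name '.' with
    | Some i -> String.sub name 0 i
    | None -> name

  let rename (name : string) : string =
    incr gen_count;
    Printf.sprintf "%s.%d" (prefix_of name) !gen_count

  (* rename "x" = "x.0"; rename "x.0" = "x.1". The prefix is kept only so
     human readers can track variables through the transformations. *)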
Exercise 17.12 This exercise explores ways to formalize the generation of fresh names in the renaming transformation. Assume that rename is a function that renames variables according to the conventions described above, e.g., (rename x 17) = x.17 and (rename x.17 42) = x.42.

a. Suppose that the signature of Rexp is changed to accept and return a natural number that represents the state of the fresh-name generator:

   Rexp : Exp_FIL → RenEnv → Nat → (Exp_FIL × Nat)

   Give modified definitions of Rprog and Rexp in which rename is used to generate all fresh names uniquely. Define any auxiliary functions you find helpful.

b. An alternative way to thread the name-generation state through the renaming transformation is to use continuations. Suppose the signature of Rexp is changed as follows:

   Rexp : Exp_FIL → RenEnv → RenameCont → Nat → Exp_FIL

   RenameCont is a renaming continuation defined as follows:

   rc ∈ RenameCont = Exp_FIL → Nat → Exp_FIL

   Give modified definitions of Rprog and Rexp in which rename is used to generate all fresh names uniquely. Define any auxiliary functions you find helpful.

c. The mapsub_FIL function cannot be used in the above two parts because it does not thread the name-generation state through the processing of subexpressions. Develop modified versions of mapsub_FIL that would handle the purely structural cases in the above parts.
17.9 Transformation 7: CPS Conversion
Did he ever return, no he never returned
And his fate is still unlearned
— Bess Hawes and Jacqueline Steiner, "Charley on the MTA"

In Chapter 9, we saw that continuations are a powerful mathematical tool for modeling sophisticated control features like nonlocal exits, unrestricted jumps, coroutines, backtracking, and exceptions. Section 9.2 showed how such features can be simulated in any language supporting first-class procedures. The key idea in these simulations is to represent a possible future of the current computation as an explicit procedure, called a continuation. The continuation takes as its single parameter the value of the current computation. When invoked, the continuation proceeds with the rest of the computation. In these simulations, procedures no longer return to their caller when invoked. Rather, they are transformed so that they take one or more explicit continuations as arguments and invoke one of these continuations on their result instead of returning the result. A program in which every procedure invokes an explicit continuation parameter in place of returning is said to be written in continuation-passing style (CPS).

As an example of CPS, consider the FIL expression Esos in Figure 17.20. It defines a squaring procedure sqr and a sum-of-squares procedure sos and applies the latter to 3 and 4. E^cps_sos is the result of transforming Esos into CPS form. In E^cps_sos, each of the two procedures sqr and sos has been extended with a continuation parameter, which by our convention will come last in the parameter list and begin with the letter k. The sqrcps procedure invokes its continuation ksqr on the square of its input. The soscps procedure first calls sqrcps on a with a continuation that names the result asqr. This continuation then calls sqrcps on b with a second continuation that names the second result bsqr. Finally, soscps invokes its continuation ksos on the sum of these two results. The initial call (sos 3 4) must also be converted. We assume that klet* names a continuation that proceeds with the rest of the computation given the value of the let* expression.

The process of transforming a program into CPS form is called CPS conversion. Here we shall study CPS conversion as a stage in the Tortoise compiler. Whereas globalization makes explicit the meaning of standard identifiers, and assignment conversion makes explicit the implicit cells of mutable variables, CPS conversion makes explicit all control flow in a program. In addition to transforming every procedure to use an explicit continuation, our compiler's CPS transformation also makes explicit the order in which primitive operations are executed.
Esos = (let* ((sqr (abs (x) (@* x x)))
              (sos (abs (a b) (@+ (app sqr a) (app sqr b)))))
         (app sos 3 4))

E^cps_sos = (let* ((sqrcps (abs (x ksqr) (app ksqr (@* x x))))
                   (soscps (abs (a b ksos)
                             (app sqrcps a
                                  (abs (asqr)
                                    (app sqrcps b
                                         (abs (bsqr)
                                           (app ksos (@+ asqr bsqr)))))))))
              (app soscps 3 4 klet*))

Figure 17.20 E^cps_sos is a CPS version of Esos.
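The same example can be written in any language with first-class procedures. Here is a hand-converted OCaml version, shown only to illustrate the style; it is not produced by the Tortoise transformation, and the identity continuation at the end plays the role of klet*.

  (* Each procedure takes an extra continuation argument and invokes it
     instead of returning. *)
  let sqr_cps (x : int) (k : int -> 'a) : 'a = k (x * x)

  let sos_cps (a : int) (b : int) (k : int -> 'a) : 'a =
    sqr_cps a (fun asqr ->
        sqr_cps b (fun bsqr ->
            k (asqr + bsqr)))

  let () = sos_cps 3 4 (fun v -> assert (v = 25))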
Performing CPS conversion as a compiler stage has several benefits:

• Procedure-calling mechanism: A compiler must implement the mechanism for calling a procedure, which specifies: how arguments and control are passed from the caller to the procedure when it is called; how the procedure's result value and control are passed from the procedure back to the caller when the procedure returns; and how values needed by the caller after the procedure call are preserved during the procedure's execution. Continuations are an explicit representation of the stack of procedure-call invocation frames used in traditional compilers to implement the call/return mechanism of procedures. In CPS-converted code, a continuation (such as (abs (asqr) ...) above) corresponds to a pair of (1) an invocation frame that saves variables needed after the call (i.e., the free variables of the continuation, which are b and ksos in the case of (abs (asqr) ...)) and (2) a return address (i.e., a specification of the code to be executed after the call). Since CPS procedures never return, every procedure call in a CPS-converted program can be viewed as an assembly code jump that passes arguments. In particular, invoking a continuation corresponds in assembly code to jumping to a return address with a return value in a distinguished return register.

• Code linearization: CPS conversion makes explicit the order in which subexpressions are evaluated, yielding code that linearizes basic computation steps in a way similar to assembly code. For example, the body of soscps makes it clear that the square of a is calculated before the square of b. We shall see that our CPS transformation also linearizes nested primitive applications. For instance, CPS-converting the expression (@* (@+ c d) (@- c 1)) yields
code in which it is clear that the addition is performed first, followed by the subtraction, and then the multiplication.

• Sophisticated control features: Representing control explicitly in the form of continuations facilitates the implementation of advanced control features (such as nonlocal exits, exceptions, and backtracking) that can be challenging to implement in traditional stack-based approaches.

• Uniformity: Representing control features via procedures keeps intermediate program representations simple and flexible. Moreover, any optimizations that improve procedures will work on continuations as well. But this uniformity also has a drawback: because of the liberal use of procedures, the efficiency of procedure calls in CPS code is of the utmost importance, making certain optimizations almost mandatory.

We present the Tortoise CPS transformation in four stages. The structure of the CPS code produced by the CPS transformation is formalized in Section 17.9.1. A straightforward approach to CPS conversion that is easy to understand but leads to intolerable inefficiencies in the converted code is described in Section 17.9.2. Section 17.9.3 presents a more complex but considerably more efficient CPS transformation that is used in Tortoise. Finally, we consider the CPS conversion of advanced control constructs in Section 17.9.4.
17.9.1 The Structure of Tortoise CPS Code
All procedure applications can be classified according to their relationship to the innermost enclosing procedure declaration (or program). A procedure application is a tail call if its implicit continuation is the same as that of its enclosing procedure. In other words, no computational work is done between the termination of the inner tail call and the termination of its enclosing procedure. These two events can be viewed as happening simultaneously. All other procedure applications are nontail calls. These are characterized by pending computations that must take place between the termination of the nontail call and the termination of a call to its enclosing procedure.

The notion of a tail call is important in CPS conversion because every procedure call in CPS code must be a tail call. Otherwise, it would have to return to perform a pending computation. As concrete examples of tail versus nontail calls, consider the FIL abstractions in Figure 17.21.

• In Eabs1, the call to g is a tail call because a call to Eabs1 returns a value v when g returns v. But both calls to f are nontail calls because the results of these calls must be passed to g before Eabs1 returns.
Eabs1 = (abs (f g x)
          (app g (app f x) (app f (@+ x 1))))

Eabs2 = (abs (p q r s y)
          (let ((a (app p (app q y))))
            (app r a (app s a))))

Eabs3 = (abs (filter pred base zs)
          (if (@null? zs)
              (app base zs)
              (if (app pred (@car zs))
                  (@cons (@car zs) (app filter pred base (@cdr zs)))
                  (app filter pred base (@cdr zs)))))

Figure 17.21  Sample abstractions for understanding tail versus nontail calls.
• In Eabs2, only the call to r is a tail call. The results of the calls to p, q, and s must be further processed before Eabs2 returns.

• In Eabs3, there are two tail calls: the call to base, and the second call to filter. The result of the first call to filter must be processed by @cons before Eabs3 returns, so this is a nontail call. The result of pred must be checked by the if, so this is a nontail call as well. In this example, we see that (1) a procedure body may have multiple tail calls and (2) the same procedure can be invoked in both tail calls and nontail calls within the same expression.

Tail and nontail calls can be characterized syntactically. The FIL expression contexts in which tail calls can appear are defined by TC in the following grammar:

TC ∈ TailCallContext ::= □
                       | (if Etest TC E)
                       | (if Etest E TC)
                       | (let ((I E)*) TC)
                       | (abs (I*) TC)
In FIL, an application expression Eapp = (app E E*) is a tail call if and only if the enclosing program can be expressed in the form (fil (I*) TC{Eapp}) — i.e., as the result of filling a tail context in the program body with Eapp. Any application that does not appear in a tail context is a nontail call. In particular, applications occurring in (but not wrapped by abs in) if tests, let definition expressions, and app and prim arguments are nontail calls.
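For instance, consider the following constructed program, in which f, g, and h are simply procedure parameters: the calls (app f x) and (app g ...) fill tail contexts, while (app h x) sits in an operand position and is therefore a nontail call.

(fil (f g h x)              {the program body is a tail context TC}
  (if (@> x 0)
      (app f x)             {fills TC: a tail call}
      (app g (app h x))))   {(app g ...) fills TC: a tail call, but
                             (app h x) is an app argument: a nontail call}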
P  ∈ Progcps        ::= (fil (Iformal*) Ebody)

E  ∈ Expcps         ::= (app Irator Vrand*) | (if Vtest Ethen Eelse)
                      | (let ((Iname LEdefn)) Ebody) | (error Ymessage)

V  ∈ ValueExpcps    ::= L | I

LE ∈ LetableExpcps  ::= L | (abs (Iformal*) Ebody) | (prim Oprimop Varg*)

L ∈ Lit = as in full FIL        Y ∈ SymLit = as in full FIL
O ∈ Primop = as in full FIL     I ∈ Ident = as in full FIL

Figure 17.22  Kernel grammar for FILcps, the subset of FIL in CPS form. The result of CPS conversion is a FILcps program.
Understanding tail calls is essential for studying the structure of Tortoise CPS code, which is defined by the grammar for FILcps, a restricted dialect of FIL presented in Figure 17.22. The FILcps grammar requires specialized component expressions for many constructs that can have arbitrary component expressions in FIL: the rator of an app must be an identifier; the rands of an app, arguments of a prim, and test of an if must be literals or identifiers; and the definition expression of a let must be a literal, abstraction, or primitive application. As explained below, these restrictions guarantee that all FILcps procedure calls are tail calls, that all procedure calls and primitive applications are linearized, and that FILcps code resembles assembly code in many ways:

• The definition of Expcps in FILcps guarantees that app expressions appear precisely in the tail contexts TC discussed above. So every call in a FILcps program is guaranteed to be a tail call. In a continuation-based denotational semantics (see Section 9.3) of a FILcps program, the expression continuation k for every app expression is exactly the same: the top-level continuation of the program. We say that procedure calls in a CPS program “never return” because no procedure call establishes a new control point to which a value can be returned. This explains why calls in a CPS program can be viewed as assembly-language jumps (that happen to additionally pass arguments).

• Operands of app and prim must be literals or variables, so one application (of a procedure or a primitive) may not be nested within another. The test subexpression of an if must also be a literal or variable. The definition subexpression of a let can only be one of a restricted class of simple “letable
expressions,” a class that does not include apps, ifs, or other lets. These restrictions impose the straight-line nature of assembly code on the bodies of FIL abstractions and programs, which must be elements of Expcps. The only violation of the straight-line property is the if expression, which has an element of Expcps for each branch. This branching code would need to be linearized elsewhere in order to generate assembly language (see Exercise 17.16 on page 1056).

• The grammar effectively requires specifying the order of evaluation of primitive applications by forcing the result of every primitive application to be named by a let. So the CPS transformation of an expression containing nested primitive applications uses a sequence of nested single-binding let expressions to introduce names for the intermediate results returned by the primitives. For example, CPS-converting the expression

(@+ (@- 0 (@* b b)) (@* 4 (@* a c)))
in the context of an initial continuation ktop.0 yields:

(let* ((t.3 (@* b b))
       (t.2 (@- 0 t.3))
       (t.5 (@* a c))
       (t.4 (@* 4 t.5))
       (t.1 (@+ t.2 t.4)))
  (app ktop.0 t.1))

(The particular let-bound names used are irrelevant. Here and below, we show the results of CPS conversion using our implementation of the transformation described in Section 17.9.3.)
The let-bound names represent abstract registers in assembly code. Mapping these abstract registers to the actual registers of a real machine (a process known as register allocation — see Section 17.12) must be performed by a later compilation stage.

• The operator of an app must be an identifier. In classical CPS conversion, the operator of an app may be an abstraction as well. However, we require that all abstractions be named in a let binding so that certain properties of the FILcps structure are preserved by later Tortoise transformations. In particular, the subsequent closure-conversion stage will transform abstractions into applications of the mprod primitive. Such applications cannot appear in the context of CPS values V, but can appear in “letable expressions” LE.

• Every execution path through an abstraction or program body Ebody must end in either an app or an error. The Expcps grammar does not include literals, identifiers, and abstractions, because these would allow procedures and
(let* ((sqrFILcps (abs (x ksqr)
                    (let ((t1 (@* x x)))
                      (app ksqr t1))))
       (sosFILcps (abs (a b ksos)
                    (let ((k1 (abs (asqr)
                                (let ((k2 (abs (bsqr)
                                            (let ((t2 (@+ asqr bsqr)))
                                              (app ksos t2)))))
                                  (app sqrFILcps b k2)))))
                      (app sqrFILcps a k1)))))
  (app sosFILcps 3 4 klet*))

Figure 17.23  A CPS version of Esos expressed in FILcps.
programs to return values. But FILcps procedures and programs never return, so the last action in a procedure or program body must be to call a procedure or signal an error. Moreover, apps and errors can appear only as the final expressions executed in such bodies — they cannot appear in let definitions, procedure or primitive operands, or if tests. Modulo the branching allowed by if, program and abstraction bodies in FILcps are similar in structure to basic blocks in traditional compilers. A basic block is a sequence of statements such that the only control transfers into the block are at the beginning and the only control transfers out of the block are at the end.

The fact that ValueExpcps does not include abstractions or primitive applications means that the CPS expression Esos^cps in Figure 17.20 is not a legal FILcps expression. A FILcps version of the Esos expression is presented in Figure 17.23. To satisfy the syntactic constraints of FILcps, let-bound names must be introduced to name abstractions (the continuations k1 and k2) and the results of primitive applications (t1 and t2). Note that some calls (to sqrFILcps and sosFILcps) are to transformed versions of procedures in the original Esos expression. These correspond to the jump-to-subroutine idiom in assembly code. The other calls (to ksqr and ksos) are to continuation procedures introduced by CPS conversion. These model the return-from-subroutine idiom in assembly code.

We will assume that the grammar for FILcps in Figure 17.22 describes the structure of CPS code after the standard FIL simplifications in Figure 17.15 have been performed. The CPS conversion functions we study below sometimes generate expressions that are illegal according to the FILcps grammar before such simplifications are performed. However, in all these cases, simplification
yields a legal FILcps expression. For example, CPS conversion might generate (let ((a.2 b.1)) (app k.0 a.2)), which is not a FILcps expression because the variable reference b.1 is not an element of the domain LetableExpcps. However, applying the [copy-prop] simplification to this expression yields (app k.0 b.1), which is indeed a FILcps expression.

The next two sections present two different CPS transforms, each of which converts every procedure call in the program into a tail call:

Preconditions: The input to CPS conversion is a valid, uniquely named kernel FIL program.

Postconditions: The output of CPS conversion is a valid, uniquely named kernel FILcps program.
17.9.2 A Simple CPS Transformation
The first transformation we will examine, SCPS (for Simple CPS conversion), is easier to explain, but generates code that is much less efficient than that produced by the second transformation. The SCPS transformation is defined in Figure 17.24. SCPSexp transforms any given expression E to an abstraction (abs (Ik) E′) that expects as its argument Ik an explicit continuation for E and eventually calls this continuation on the value of E within E′. This explicit continuation is immediately invoked to pass along (or “return”) the values of literals, identifiers, and abstractions. Each abstraction is transformed to take as a new additional final parameter a continuation Ikcall that is passed as the explicit continuation to its transformed body. Because the grammar of FILcps does not allow abstractions to appear directly as app arguments, it is also necessary to name the transformed abstraction in a let using a fresh identifier Iabs.

In the transformation of an app expression (app E0 E1 ... En), explicit continuations specify that the rator E0 and rands E1 ... En are evaluated in left-to-right order before the invocation takes place. The fresh variables I0 ... In are introduced to name the values of the subexpressions. Since every procedure has been transformed to expect an explicit continuation as its final argument, the transformed app must supply its continuation Ik as the final rand.

The let transformation is similar, except that the let-bound names are used in place of fresh names for naming the values of the definition expressions. The unique naming requirement on input programs to SCPS guarantees that no variable capture can take place in the let transformation (see Exercise 17.15).
The transformation of prim expressions is similar to that for app and let. The syntactic constraints of FILcps require that a fresh variable (here named Ians) be introduced to name the result of a prim expression before passing it to the continuation.

In a transformed if expression, a fresh name Itest names the result of the test expression and the same continuation Ik is supplied to both transformed branches. This is the only place in SCPS where the explicit continuation Ik is referenced more than once in the transformed expression. The transformed error construct is the only place where the continuation is never referenced. All other constructs use Ik in a linear fashion — i.e., they reference it exactly once. This makes intuitive sense for regular control flow, which has only one possible “path” out of every expression other than if and error. Even in the if case, only one branch can be taken in a dynamic execution even though the continuation is mentioned twice. In Section 17.9.4 we will see how CPS conversion exposes the nonlinear nature of some sophisticated control features.

FIL programs are converted to CPS form by SCPSprog, which adds an additional parameter Iktop that is an explicit top-level continuation for the program. It is assumed that the mechanism for program invocation will supply an appropriate procedure for this argument. For example, an operating system might construct a top-level continuation that displays the result of the program on the standard output stream or in a window within a graphical user interface.

The clauses for SCPSexp contain numerous instances of the pattern

(app (SCPSexp[[E1]]) E2)
where E2 is an abstraction or variable reference. But SCPSexp is guaranteed to return an abs expression, and the FILcps grammar does not allow any subexpression of an app to be an abs. Doesn't this yield an illegal FILcps expression? The result of SCPSexp would be illegal were it not for the [implicit-let] simplification, which transforms every app of the form

(app (abs (Ik) E1) E2)

to the expression

(let ((Ik E2)) E1)

Since the grammar for letable expressions LE permits definition expressions that are abstractions, the result of SCPSexp is guaranteed to be a legal FILcps expression when E2 is an abstraction. When E2 is a variable, the [copy-prop] simplification will also be performed, eliminating the let expression.
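As a minimal constructed illustration of this interplay (the names k.1 and ktop are arbitrary):

  (app (abs (k.1) (app k.1 5)) ktop)
--simp--> (let ((k.1 ktop)) (app k.1 5))   {[implicit-let]}
--simp--> (app ktop 5)                      {[copy-prop]}

The final expression is a legal FILcps expression even though the original was not.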
SCPSprog : ProgFIL → Progcps

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)                        ; Iktop fresh
    (app (SCPSexp[[Ebody]]) Iktop))

SCPSexp : ExpFIL → Expcps

SCPSexp[[L]] = (abs (Ik) (app Ik L))            ; Ik fresh

SCPSexp[[I]] = (abs (Ik) (app Ik I))            ; Ik fresh

SCPSexp[[(abs (I1 ... In) Ebody)]]
= (abs (Ik)                                     ; Ik fresh
    (let ((Iabs                                 ; Iabs fresh
           (abs (I1 ... In Ikcall)              ; Ikcall fresh
             (app (SCPSexp[[Ebody]]) Ikcall))))
      (app Ik Iabs)))

SCPSexp[[(app E0 ... En)]]
= (abs (Ik)                                     ; Ik fresh
    (app (SCPSexp[[E0]])
      (abs (I0)                                 ; I0 fresh
        ...
        (app (SCPSexp[[En]])
          (abs (In)                             ; In fresh
            (app I0 ... In Ik))) ...)))

SCPSexp[[(let ((I1 E1) ... (In En)) Ebody)]]
= (abs (Ik)                                     ; Ik fresh
    (app (SCPSexp[[E1]])
      (abs (I1)
        ...
        (app (SCPSexp[[En]])
          (abs (In)
            (app (SCPSexp[[Ebody]]) Ik))) ...)))

SCPSexp[[(prim O E1 ... En)]]
= (abs (Ik)                                     ; Ik fresh
    (app (SCPSexp[[E1]])
      (abs (I1)                                 ; I1 fresh
        ...
        (app (SCPSexp[[En]])
          (abs (In)                             ; In fresh
            (let ((Ians (prim O I1 ... In)))    ; Ians fresh
              (app Ik Ians)))) ...)))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (Ik)                                     ; Ik fresh
    (app (SCPSexp[[Etest]])
      (abs (Itest)                              ; Itest fresh
        (if Itest
            (app (SCPSexp[[Ethen]]) Ik)
            (app (SCPSexp[[Eelse]]) Ik)))))

SCPSexp[[(error Ymsg)]] = (abs (Ik) (error Ymsg))  ; Ik fresh

Figure 17.24  A simple CPS transformation.
As a simple example of SCPS, consider the CPS conversion of the incrementing program Pinc = (fil (a) (@+ a 1)). Before any simplifications are performed, SCPSprog[[Pinc]] yields

(fil (a ktop.0)
  (app (abs (k.2)
         (app (abs (k.6) (app k.6 a))
              (abs (t.3)
                (app (abs (k.5) (app k.5 1))
                     (abs (t.4)
                       (let ((t.1 (@+ t.3 t.4)))
                         (app k.2 t.1)))))))
       ktop.0))
Three applications of [implicit-let] simplify this code to

(fil (a ktop.0)
  (let ((k.2 ktop.0))
    (let ((k.6 (abs (t.3)
                 (let ((k.5 (abs (t.4)
                              (let ((t.1 (@+ t.3 t.4)))
                                (app k.2 t.1)))))
                   (app k.5 1)))))
      (app k.6 a))))

A single [copy-prop] replaces k.2 by ktop.0 to yield the final result P′inc:
(fil (a ktop.0)
  (let ((k.6 (abs (t.3)
               (let ((k.5 (abs (t.4)
                            (let ((t.1 (@+ t.3 t.4)))
                              (app ktop.0 t.1)))))
                 (app k.5 1)))))
    (app k.6 a)))

P′inc is a legal FILcps program — go ahead and check! Its convoluted nature makes it a bit tricky to read. Here is one way to read this program:
The program is given an input a and top-level continuation ktop.0. First evaluate a and pass its value to continuation k.6, which gives it the name t.3. Then evaluate 1 and pass it to continuation k.5, which gives it the name t.4. Next, calculate the sum of t.3 and t.4 and name the result t.1. Finally, return this answer as the result of the program by invoking ktop.0 on t.1. This is a lot of work to increment a number! Even though the [implicit-let] and [copy-prop] rules have simplified the program, it could still be simpler: the
continuations k.5 and k.6 merely rename the values of a and 1 to t.3 and t.4, which is unnecessary. In larger programs, the extent of these undesirable inefficiencies becomes more apparent. For example, Figure 17.25 shows the result of using SCPS to transform a numerical program Pquad with several nested subexpressions. Try to read the transformed program as we did with P′inc. Along the way you will notice numerous unnecessary continuations and renamings. The result of performing SCPS on our revmap example is so large that it would require several pages to display. The desugared revmap program has an abstract syntax tree with 46 nodes; transforming it with SCPSprog yields a result with 314 nodes. And this is after simplification — the unsimplified transformed program has 406 nodes!

Can anything be done to automatically eliminate the inefficiencies introduced by SCPS? Yes! It is possible to define additional simplification rules that will make the CPS-converted code much more reasonable. For example, in (let ((I Edefn)) Ebody), if Edefn is a literal or abstraction, it is possible to replace the let by the substitution of Edefn for I in Ebody. This simplification is traditionally called constant propagation and (when followed by [implicit-let]) is called inlining for abstractions. For example, two applications of inlining on P′inc yield

(fil (a ktop.0)
  (let ((t.3 a))
    (let ((t.4 1))
      (let ((t.1 (@+ t.3 t.4)))
        (app ktop.0 t.1)))))
and then copy propagation and constant propagation simplify the program to

(fil (a ktop.0)
  (let ((t.1 (@+ a 1)))
    (app ktop.0 t.1)))

Performing these additional simplifications on P′quad in Figure 17.25 gives the following much improved CPS code:
(fil (a b c ktop.0)
  (let* ((t.20 (@* b b))
         (t.16 (@- 0 t.20))
         (t.9 (@* a c))
         (t.5 (@* 4 t.9))
         (t.1 (@+ t.16 t.5)))
    (app ktop.0 t.1)))
These examples underscore the inefficiency of the code generated by SCPS.
Why don’t we just modify FIL to include the constant propagation and inlining simplifications? Constant propagation of literals is not problematic (though we do not include it in our list of standard simplifications because we don’t want constants to be copied when we get to the register-allocation stage of our compiler), but inlining is a delicate transformation. In FILcps, it is legal to copy an abstraction only to certain positions (such as the rator of an app, where it can be removed by [implicit-let]). When a named abstraction is used more than once in the body of a let, copying the abstraction multiple times makes the program bigger. Unrestricted inlining can lead to code bloat, a dramatic increase in the size of a program. In the presence of recursive procedures, special care must often be taken to avoid infinitely unwinding a recursive definition. Since we insist that FIL simplifications be straightforward to implement, we do not include inlining as a simplification. Inlining issues are further explored in Exercise 17.17.

Does that mean we are stuck with an inefficient CPS transformation? No! In the next section, we study a cleverer approach to CPS conversion that avoids generating unnecessary code in the first place.

Exercise 17.13 Consider the FIL program P = (fil (x y) (@* (@+ x y) (@- x y))).

a. Show the result P1 generated by SCPSprog[[P]] without performing any simplifications.

b. Show the result P2 of simplifying P1 using the standard FIL simplifications (including [implicit-let] and [copy-prop]).

c. Show the result P3 of further simplifying P2 using inlining in addition to the standard FIL simplifications.

Exercise 17.14

a. Suppose that begin, scand, scor, and cond (from FLARE/V) were kernel FIL constructs. Give the SCPSexp clauses for these four constructs.

b. Suppose that FILcps were extended to include mutable variables by adding the assignment construct (set! I E) as an element of LE. Give the SCPSexp clause for set!.

Exercise 17.15

a. Give a concrete example of how variable capture can take place in the let clause of SCPSexp if the initial program is not uniquely named.

b. Modify the let clause of SCPSexp so that it works properly even if the initial program is not uniquely named.
Pquad = (fil (a b c) (@+ (@- 0 (@* b b)) (@* 4 (@* a c))))

SCPSprog[[Pquad]] = P′quad, where P′quad =
(fil (a b c ktop.0)
  (let* ((k.17 (abs (t.3)
                 (let* ((k.6 (abs (t.4)
                               (let ((t.1 (@+ t.3 t.4)))
                                 (app ktop.0 t.1))))
                        (k.15 (abs (t.7)
                                (let* ((k.10 (abs (t.8)
                                               (let ((t.5 (@* t.7 t.8)))
                                                 (app k.6 t.5))))
                                       (k.14 (abs (t.11)
                                               (let ((k.13 (abs (t.12)
                                                             (let ((t.9 (@* t.11 t.12)))
                                                               (app k.10 t.9)))))
                                                 (app k.13 c)))))
                                  (app k.14 a)))))
                   (app k.15 4))))
         (k.26 (abs (t.18)
                 (let* ((k.21 (abs (t.19)
                                (let ((t.16 (@- t.18 t.19)))
                                  (app k.17 t.16))))
                        (k.25 (abs (t.22)
                                (let ((k.24 (abs (t.23)
                                              (let ((t.20 (@* t.22 t.23)))
                                                (app k.21 t.20)))))
                                  (app k.24 b)))))
                   (app k.25 b)))))
    (app k.26 0)))

Figure 17.25  Simple CPS conversion of a numeric program.
Exercise 17.16 Control branches in linear assembly language code are usually provided via branch instructions that perform a jump if a certain condition holds but “drop through” to the next instruction if the condition does not hold. We can model assembly-style branch instructions in FILcps by restricting if expressions to the form

(if Vtest (app Vrator Vrand*) Eelse)
which immediately performs a subroutine jump (via app) if the test is true and otherwise drops through to Eelse. Modify the SCPSexp clause for if so that all transformed ifs have this restricted form.

Exercise 17.17 This exercise explores procedure inlining. Consider the following [copy-abs] simplification rule, where AB ranges over FIL abstractions:

(let ((I AB)) Ebody) --simp--> [AB/I]Ebody    [copy-abs]
Together, [copy-abs] and the standard FIL [implicit-let] and [copy-prop] rules implement a form of procedure inlining. For example,

(let ((inc (abs (x) (@+ x 1))))
  (@* (app inc a) (app inc b)))

can be simplified via [copy-abs] to

(@* (app (abs (x) (@+ x 1)) a)
    (app (abs (x) (@+ x 1)) b))

Two applications of [implicit-let] give

(@* (let ((x a)) (@+ x 1))
    (let ((x b)) (@+ x 1)))

and two applications of [copy-prop] yield the inlined code

(@* (@+ a 1) (@+ b 1))
a. Use inlining to remove all calls to sqr in the following FIL expression. How many multiplications does the resulting expression contain?

(let ((sqr (abs (x) (@* x x))))
  (app sqr (app sqr (app sqr a))))

b. Use inlining to remove all calls to sqr, quad, and oct in the following FIL expression. How many multiplications does the resulting expression contain?

(let* ((sqr (abs (x) (@* x x)))
       (quad (abs (y) (@* (app sqr y) (app sqr y))))
       (oct (abs (z) (@* (app quad z) (app quad z)))))
  (@* (app oct a) (app oct b)))

c. What happens if inlining is used to simplify the following FIL expression?

(let ((f (abs (g) (app g g))))
  (app f f))
d. Can expressions like the one in part c ever arise in the compilation of a FLARE/V program? Explain.
e. Using only standard FIL simplifications, the result of SCPSprog is guaranteed to be uniquely named if the input is uniquely named. This property does not hold in the presence of inlining. Write an example program Pnun such that the result of simplifying SCPSprog[[Pnun]] via inlining is not uniquely named. Hint: Where can duplication occur in a CPS-converted program?

f. Inlining multiple copies of an abstraction can lead to code bloat. Develop an example FIL program Pbloat where performing inlining on the result of SCPSprog[[Pbloat]] yields a larger transformed program rather than a smaller one. Hint: Where can duplication occur in a CPS-converted program?

Exercise 17.18 Emil P. Mentor wants to modify the CPS transformation to add a little bit of profiling information. Specifically, the modified CPS transformation should produce code that keeps a count of user procedure (not continuation) applications. Users will be able to access this information with the new construct (app-count), which is added to the grammar of kernel FILcps expressions:

E ∈ Exp ::= ... | (app-count)
Emil gives the following example (where he uses the notation ⟨x, y⟩ normally used for pair values to represent mutable products with two components):

(let ((f (abs (x) (prim mprod x (app-count))))
      (id (abs (y) y))
      (twice (abs (g) (abs (z) (app g (app g z))))))
  (prim mprod (app f (app-count))
        (prim mprod (app id (app f (app id (app-count))))
              (app f (app (app (app twice twice) id) (app-count))))))
--FIL--> ⟨⟨0, 1⟩, ⟨⟨1, 3⟩, ⟨4, 16⟩⟩⟩
In the modified SCPS transformation, all procedures (including continuations) should take as an extra argument the number of user procedure applications made so far. For example, here are Emil's new SCPS clauses for programs, literals, and conditionals:

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)                 ; Iktop fresh
    (app (SCPSexp[[Ebody]]) 0 Iktop))

SCPSexp[[L]]
= (abs (In Ik)     ; In (app count) and Ik (continuation) fresh
    (app Ik In L))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (In0 Ik)                          ; In0 and Ik fresh
    (app (SCPSexp[[Etest]]) In0
      (abs (In1 Itest)                   ; In1 and Itest fresh
        (if Itest
            (app (SCPSexp[[Ethen]]) In1 Ik)
            (app (SCPSexp[[Eelse]]) In1 Ik)))))
Write the modified SCPS clauses for abs, app, let, and app-count.
17.9.3 A More Efficient CPS Transformation
Reconsider the result of SCPS on the program (fil (a) (@+ a 1)):

(fil (a ktop.0)
  (let ((k.6 (abs (t.3)
               (let ((k.5 (abs (t.4)
                            (let ((t.1 (@+ t.3 t.4)))
                              (app ktop.0 t.1)))))
                 (app k.5 1)))))
    (app k.6 a)))
The inefficient code we eliminated by inlining in the last section is shown in gray. Our goal in developing a more efficient CPS transformation is to perform these simplifications as part of CPS conversion itself rather than waiting to do them later. Instead of eliminating unsightly gray code as an afterthought, we want to avoid generating it in the first place! Our approach is based on a diabolically simple shift of perspective: we view the gray code as part of the metalanguage specification of the transformation rather than as part of the FIL code being transformed. If we change the gray FIL lets, abss, and apps to metalanguage lets, λs, and applications, our example becomes: (fil (a ktop.0) let k6 be (λV3 . let k5 be (λV4 . (let ((t.1 (@+ V3 V4 ))) (app ktop.0 t.1))) in (k5 1)) in (k6 a))
To enhance readability, we will keep the metalanguage notation in gray and the FILcps code in black teletype font. Note that k5 and k6 name metalanguage functions whose parameters (V3 and V4 ) must be pieces of FILcps syntax — in particular, FILcps value expressions (i.e., literals and variable references). Indeed, k5 is applied to the FILcps literal 1 and k6 is applied to the FILcps identifier a. The result of evaluating the gray metalanguage expressions in our example yields (fil (a ktop.0) (let ((t.1 (@+ a 1))) (app ktop.0 t.1)))
which is exactly the simplified result we want! We have taken computation that would have been performed when executing the code generated by CPS conversion and instead performed it when the code
17.9.3 A More Efficient CPS Transformation
1059
is generated. The output of CPS conversion can now be viewed as code that is executed in two stages: the gray code is the code that can be executed as part of CPS conversion, while the black code is the residual code that can only be executed later at run time. This notion of staged computation is the key idea of an approach to optimization known as partial evaluation. The goal of partial evaluation is to evaluate at compile time all static expressions — i.e., those expressions that do not depend on information known only at run time — and leave behind a residual dynamic program that is executed at run time. In our case, the static expressions are the gray metalanguage code that is executed “for free” as part of CPS conversion, and the dynamic expressions are the black FILcps code. Our improved approach to CPS conversion will make heavy use of gray abstractions of the form (λV . . . . ) that map FILcps value expressions (i.e., literals and variable references) to other FILcps expressions. Because these abstractions play the role of continuations at the metalanguage level, we call them metacontinuations. In the above example, k5 and k6 are examples of metacontinuations. A metacontinuation can be viewed as a metalanguage representation of a special kind of context: a FILcps expression with named holes that can be filled only with FILcps value expressions. Such contexts may contain more than one hole, but a hole with a given name can appear only once. For example, here are metacontinuations that will arise in the CPS conversion of the incrementing program: Context Notation
Metalanguage Notation
(app ktop.0 21 )
λV1 . (app ktop.0 V1 )
(let ((t.1 (@+ 23 24 ))) (app ktop.0 t.1))
λV4 . (let ((t.1 (@+ V3 V4 ))) ; V3 is free (app ktop.0 t.1))
(let ((t.1 (@+ 23 1))) (app ktop.0 t.1))
λV3 . (let ((t.1 (@+ V3 1))) (app ktop.0 t.1))
Figures 17.26 and 17.27 present an efficient version of CPS conversion that is based on the notions of staged computation and metacontinuations. We call this transformation MCPS (for metaCPS conversion). The metavariable m ranges over metacontinuations in the domain MetaCont, which consists of functions that map FILcps value expressions to FILcps expressions. The MCPS functions in Figures 17.26 and 17.27 are similar to the SCPS functions in Figure 17.24 (page 1051). Indeed, except for the let and if clauses, the MCPS clauses can be derived automatically from the SCPS clauses by the following transformation process:
Domain
m ∈ MetaCont = ValueExpcps → Expcps

Conversion Functions
mc→exp : MetaCont → Expcps = (λm . (abs (Itemp) (m Itemp)))   ; Itemp fresh
id→mc  : Ident → MetaCont  = (λI . (λV . (app I V)))

MetaCPS Program Transformation
MCPSprog : ProgFIL → Progcps
MCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)                   ; Iktop fresh
    (MCPSexp[[Ebody]] (id→mc Iktop)))

Figure 17.26  An efficient CPS transformation based on metacontinuations, Part 1.
• Transform every continuation-accepting FILcps abstraction (abs (Ik) ...) into a metacontinuation-accepting metalanguage abstraction (λm . ...).

• Transform every FILcps application (app Ik V) in which Ik denotes a continuation to a metacall (i.e., metalanguage function call) of the form (m V), where m is the metacontinuation that corresponds to Ik. This makes sense because the metacontinuation m is a metalanguage function that expects a value expression V as its argument.

• Transform every FILcps application (app (SCPSexp[[E]]) (abs (I) ...)) to a metacall (MCPSexp[[E]] (λV . ...)). This transforms every FILcps continuation of the form (abs (I) ...) into a metacontinuation of the form (λV . ...), thus providing the metacontinuation-accepting function returned by MCPSexp[[E]] with the metacontinuation it expects.

• Transform every FILcps application (app (SCPSexp[[E]]) Ik) in which Ik has not already been transformed to m to a metacall (MCPSexp[[E]] (id→mc Ik)), where id→mc converts a FILcps identifier Ik denoting an unknown continuation to a metacontinuation (λV . (app Ik V)). This conversion is necessary to provide the metacontinuation-accepting function returned by MCPSexp[[E]] with the metacontinuation it expects.

• Transform every FILcps application (app I0 ... In Ik) in which I0, ..., In are the bound variables of continuations and Ik denotes the continuation bound by an SCPSexp clause to (let ((Ik (mc→exp m))) (app V0 ... Vn Ik)), where

  • Ik is a fresh name;

  • V0, ..., Vn are the bound variables of the metacontinuations that correspond to the continuations binding I0, ..., In;
MetaCPS Expression Transformation
MCPSexp : ExpFIL → MetaCont → Expcps

MCPSexp[[L]] = (λm . (m L))

MCPSexp[[I]] = (λm . (m I))

MCPSexp[[(abs (I1 ... In) Ebody)]]
= (λm . (let ((Iabs                         ; Iabs fresh
               (abs (I1 ... In Ikcall)      ; Ikcall fresh
                 (MCPSexp[[Ebody]] (id→mc Ikcall)))))
          (m Iabs)))

MCPSexp[[(app E0 ... En)]]
= (λm . (MCPSexp[[E0]]
          (λV0 .
            ...
            (MCPSexp[[En]]
              (λVn . (let ((Ik (mc→exp m)))   ; Ik fresh
                       (app V0 ... Vn Ik)))) ...)))

MCPSexp[[(let ((I1 E1) ... (In En)) Ebody)]]
= (λm . (MCPSexp[[E1]]
          (λV1 .
            ...
            (MCPSexp[[En]]
              (λVn . (let* ((I1 V1) ... (In Vn))
                       (MCPSexp[[Ebody]] m)))) ...)))

MCPSexp[[(prim O E1 ... En)]]
= (λm . (MCPSexp[[E1]]
          (λV1 .
            ...
            (MCPSexp[[En]]
              (λVn . (let ((Ians (prim O V1 ... Vn)))  ; Ians fresh
                       (m Ians)))) ...)))

MCPSexp[[(if Etest Ethen Eelse)]]
= (λm . (MCPSexp[[Etest]]
          (λVtest . (let ((Ikif (mc→exp m)))  ; Ikif fresh
                      (if Vtest
                          (MCPSexp[[Ethen]] (id→mc Ikif))
                          (MCPSexp[[Eelse]] (id→mc Ikif)))))))

MCPSexp[[(error Ymsg)]] = (λm . (error Ymsg))

Figure 17.27  An efficient CPS transformation based on metacontinuations, Part 2.
• m is the metacontinuation variable bound by the MCPSexp clause corresponding to the SCPSexp clause that binds the continuation variable Ik; and

• mc→exp is a function that converts a metacontinuation m to a FILcps continuation (abs (I) (m I)). For example:

  (mc→exp (λV3 . (let ((t.1 (@+ V3 1))) (app ktop.0 t.1))))
= (abs (t.2) (let ((t.1 (@+ t.2 1))) (app ktop.0 t.1)))
In this case, there is no metacontinuation-accepting function to process the metacontinuation m, so mc→exp is necessary to convert the gray m into a black residual FILcps abstraction. The FILcps grammar forces this abstraction to be named, which is the purpose of the (let ((Ik ...)) ...).

The MCPS clauses for let and if are based on the above transformations, but also contain some special-purpose code. The let clause contains additional code to construct a residual let* expression binding the original let-bound identifiers. To avoid potential duplication involving the metacontinuation m, the if clause gives the name Ikif to a residual version of m and uses (id→mc Ikif) in place of m for the two branches.

The key benefit of the metacontinuation approach to CPS conversion is that many beta reductions that would be left as residual run-time code in the simple approach are performed at compile time. The MCPS functions are carefully designed so that every metacontinuation-accepting function (λm . ...) that arises in the conversion process is applied to a metacontinuation of the form (λVformal . M), where M is a metalanguage expression denoting a FILcps expression. Observe that in each (λm . M′) that appears in the MCPS definition, the metacontinuation m is referenced at most once in M′. If m is referenced zero times in M′, then the metacall ((λm . M′) (λVformal . M)) simply reduces to M′. If m is referenced once in M′, then M′ can be written as ℳ{m}, where ℳ is a one-holed metalanguage expression context. By the usual metalanguage beta-reduction rule, each metacall of the form

((λm . ℳ{m}) (λVformal . M))

can be reduced to

ℳ{(λVformal . M)}

In the case where m is applied to a value expression within ℳ, the metacall

((λm . ℳ{(m Vactual)}) (λVformal . M))

reduces to

ℳ{[Vactual/Vformal]M}
via two beta reductions. Since MCPSexp[[Vactual]] = (λm . (m Vactual)), metacalls of the form

(MCPSexp[[Vactual]] (λVformal . M))

are special cases of this pattern that can be reduced to

[Vactual/Vformal]M
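As a tiny worked instance of this last pattern (a constructed example; by the literal clause in Figure 17.27, MCPSexp[[1]] = (λm . (m 1))):

  (MCPSexp[[1]] (λV . (let ((t.0 (@+ V 2))) (app k t.0))))
= ((λm . (m 1)) (λV . (let ((t.0 (@+ V 2))) (app k t.0))))
= (let ((t.0 (@+ 1 2))) (app k t.0))

Here Vactual is 1, and the residual FILcps code contains no trace of the metacall.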
The fact that m is referenced at most once in every function that accepts a metacontinuation guarantees that reducing a metacall makes the metalanguage expression smaller, and so the metacall reduction process eventually terminates. At this point, no gray code remains since all metacalls have been eliminated and there is no way other than a metacall to include gray code in an element of Expcps. So all that remains is a black residual FILcps program. Another consequence of the fact that m is referenced at most once in every metacontinuation-accepting function is that there is no specter of duplication-induced code bloat that haunts more general inlining optimizations. Using mc→exp to convert m to a FILcps abstraction named Ikif in the if clause of MCPSexp is essential for avoiding code duplication.

We illustrate compile-time beta reductions in Figure 17.28, which shows the CPS conversion of the expression (app f (@* x (if (app g y) 2 3))) relative to an initial continuation named k. The example illustrates how MCPS effectively turns the input expression “inside out.” In the input expression, the call to f is the outermost call, and (app g y) is the innermost call. But in the CPS-converted result, the call to g is the outermost call and the call to f is nested deep inside. This reorganization is necessary to make explicit the order in which operations are performed:

1. first, g is applied to y;

2. then the result t.5 of the g application is tested by if;

3. the test determines which of 2 or 3 will be named t.3 and multiplied by x;

4. then f is invoked on the result t.1 of the multiplication;

5. finally, the result of the f application is supplied to the continuation k.

Variables such as t.1, t.3, t.5 can be viewed as registers that hold the results of intermediate computations.
The example assumes that (mc→exp (id→mc k)) can be simplified to k (subsequently, (let ((t.0 k)) (app V1 V2 t.0)) is simplified to (app V1 V2 k) by an application of [copy-prop]). To see why, observe that

  (mc→exp (id→mc k))
= ((λm . (abs (Itemp) (m Itemp))) (λV . (app k V)))
= (abs (Itemp) ((λV . (app k V)) Itemp))
= (abs (Itemp) (app k Itemp))
The final expression can be simplified to k by the [eta] rule. This eta reduction eliminates an abstraction in cases where the CPS transformation would have generated a trivial continuation that simply passed its argument along to another continuation with no additional processing. This simplification is sometimes called the tail-call optimization because it guarantees that tail calls in the source program require no additional control storage in the compiled program. In particular, there is no need to push an invocation frame corresponding to a trivial continuation onto the procedure-call stack. This allows tail calls to compile to assembly code jumps that pass arguments.

A language is said to be properly tail recursive if implementations are required to compile source tail calls into jumps. Our FIL mini-language is properly tail recursive, as is the real language Scheme. Such languages can leave out iteration constructs (like while and for loops) and still express the constant-control-space iterative computations specified by such constructs using recursive procedures that invoke themselves via tail calls.

Figure 17.29 shows the result of using MCPS to CPS-convert our revmap example. Observe that the output of CPS conversion looks much closer to assembly language code than the input (Figure 17.19 on page 1041). You should study the code to convince yourself that this program has the same behavior as the original program. CPS conversion has introduced only one nontrivial continuation abstraction: k.41 names the continuation of the call to f (now called f.3) in the body of the loop. Each input abstraction has been extended with a final argument naming its continuation: revmap (which has been renamed to abs.10) takes continuation argument k.20; the looping procedure (abs.25) takes continuation argument k.29; and the greater-than-b procedure (abs.11) takes continuation k.18. Note that the looping procedure (which is not only named abs.25 but is also named t.26 and t.37 when it is extracted from the first slot of the mutable product t.23) is always invoked with the same continuation as the enclosing abstraction (k.20 when it is named t.26 and k.29 when it is named t.37). So it requires only constant control space and is thus truly iterative like loops in traditional languages.
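To see the constant-control-space point in isolation, consider a countdown loop (a constructed sketch; assume loop has been bound to this abstraction via the mutable-product technique used for revmap's loop). The recursive call is a tail call, so the CPS-converted body reuses the incoming continuation k unchanged, which is a jump rather than a stack push:

(abs (n k)
  (let ((t.1 (@= n 0)))
    (if t.1
        (app k 0)              {return: jump to the continuation}
        (let ((t.2 (@- n 1)))
          (app loop t.2 k))))) {tail call: the same k is passed along}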
  (MCPSexp[[(app f (@* x (if (app g y) 2 3)))]] (id→mc k))

= ((λm . (MCPSexp[[f]]
           (λV1 . (MCPSexp[[(@* x (if (app g y) 2 3))]]
                    (λV2 . (let ((t.0 (mc→exp m)))
                             (app V1 V2 t.0)))))))
   (id→mc k))

= (MCPSexp[[f]]
    (λV1 . (MCPSexp[[(@* x (if (app g y) 2 3))]]
             (λV2 . (let ((t.0 (mc→exp (id→mc k))))
                      (app V1 V2 t.0))))))

= (MCPSexp[[(@* x (if (app g y) 2 3))]] (λV2 . (app f V2 k)))

= ((λm . (MCPSexp[[x]]
           (λV3 . (MCPSexp[[(if (app g y) 2 3)]]
                    (λV4 . (let ((t.1 (@* V3 V4)))
                             (m t.1)))))))
   (λV2 . (app f V2 k)))

= (MCPSexp[[x]]
    (λV3 . (MCPSexp[[(if (app g y) 2 3)]]
             (λV4 . (let ((t.1 (@* V3 V4)))
                      (app f t.1 k))))))

= (MCPSexp[[(if (app g y) 2 3)]]
    (λV4 . (let ((t.1 (@* x V4)))
             (app f t.1 k))))

= ((λm . (MCPSexp[[(app g y)]]
           (λV5 . (let ((kif.2 (mc→exp m)))
                    (if V5
                        (MCPSexp[[2]] (id→mc kif.2))
                        (MCPSexp[[3]] (id→mc kif.2)))))))
   (λV4 . (let ((t.1 (@* x V4)))
            (app f t.1 k))))

= (MCPSexp[[(app g y)]]
    (λV5 . (let ((kif.2 (abs (t.3)
                           (let ((t.1 (@* x t.3)))
                             (app f t.1 k)))))
             (if V5
                 (MCPSexp[[2]] (λV6 . (app kif.2 V6)))
                 (MCPSexp[[3]] (λV7 . (app kif.2 V7)))))))

= ((λm . (MCPSexp[[g]]
           (λV8 . (MCPSexp[[y]]
                    (λV9 . (let ((t.4 (mc→exp m)))
                             (app V8 V9 t.4)))))))
   (λV5 . (let ((kif.2 (abs (t.3)
                          (let ((t.1 (@* x t.3)))
                            (app f t.1 k)))))
            (if V5 (app kif.2 2) (app kif.2 3)))))

= (let ((t.4 (abs (t.5)          ; (abs (t.5) ...) = (mc→exp (λV5 . ...))
               (let ((kif.2 (abs (t.3)
                              (let ((t.1 (@* x t.3)))
                                (app f t.1 k)))))
                 (if t.5 (app kif.2 2) (app kif.2 3))))))
    (app g y t.4))               ; substituted g for V8 and y for V9 in (app V8 V9 t.4)

Figure 17.28  An example of CPS conversion using metacontinuations.
Note that the looping procedure (abs.25) is the only nontrivial value ever stored in the first slot of the mutable product named t.23, so that all references to this slot (i.e., the values named by t.26 and t.37) denote the looping procedure. In the case of t.26, this fact can be automatically discovered by a simple peephole optimization (a local code optimization that transforms small sequences of instructions) on let* bindings:

(let* (... (I1 (@mset! J Improd V)) (I2 (@mget J Improd)) ...) Ebody)
--simp--> (let* (... (I1 (@mset! J Improd V)) (I2 V) ...) Ebody)
In conjunction with the [copy-prop] simplification, this peephole optimization can justify simplifying

(let* (... (t.24 (@mset! 1 t.23 abs.25))
           (t.26 (@mget 1 t.23)))
  (app t.26 elts.4 k.20))
to

(let* (... (t.24 (@mset! 1 t.23 abs.25)))
  (app abs.25 elts.4 k.20))
in the CPS-converted revmap code. A much more sophisticated analysis would be necessary to determine that t.37 denotes the looping procedure. However, even this knowledge cannot be used to replace t.37 by abs.25 because abs.25 is a let-bound variable whose scope does not include the body of the abstraction (abs (xs.7 k.29) ...).

The conciseness of the code in Figure 17.29 is a combination of the simplifications performed by reducing metacalls at compile time and the standard FILcps simplifications. To underscore the importance of the latter, Figure 17.30 shows the result of MCPS before any FILcps simplifications are performed. Nine applications of the [copy-prop] rule and four applications of the [eta] rule are used to simplify the code in Figure 17.30 to the code in Figure 17.29. In addition to making the code shorter, these simplifications are essential for performing the tail-call optimization. For example, the call (app t.37 t.38 k.40) in Figure 17.30 uses the trivial continuation k.40 = (abs (t.39) (app kif.30 t.39)), which itself uses the trivial continuation kif.30 = (abs (t.43) (app k.29 t.43)). This call is transformed to the new call (app t.37 t.38 k.29) by using two applications of the [eta] rule (simplifying (abs (t.43) (app k.29 t.43)) to k.29 and (abs (t.39) (app kif.30 t.39)) to kif.30) and two applications of the [copy-prop] rule (replacing kif.30 and k.40 by k.29).
17.9.3 A More Efficient CPS Transformation
1067
(fil (a.0 b.1 ktop.9)
  (let* ((abs.10 (abs (f.3 elts.4 k.20)
                   (let* ((t.22 (@null))
                          (t.21 (@mprod t.22))
                          (t.23 (@mprod #u))
                          (abs.25 (abs (xs.7 k.29)
                                    (let ((t.31 (@null? xs.7)))
                                      (if t.31
                                          (let ((t.42 (@mget 1 t.21)))
                                            (app k.29 t.42))
                                          (let* ((t.34 (@car xs.7))
                                                 (k.41 (abs (t.35)
                                                         (let* ((t.36 (@mget 1 t.21))
                                                                (t.33 (@cons t.35 t.36))
                                                                (t.32 (@mset! 1 t.21 t.33))
                                                                (t.37 (@mget 1 t.23))
                                                                (t.38 (@cdr xs.7)))
                                                           (app t.37 t.38 k.29)))))
                                            (app f.3 t.34 k.41))))))
                          (t.24 (@mset! 1 t.23 abs.25))
                          (t.26 (@mget 1 t.23)))
                     (app t.26 elts.4 k.20))))
         (abs.11 (abs (x.8 k.18)
                   (let ((t.19 (@> x.8 b.1)))
                     (app k.18 t.19))))
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13)))
    (app abs.10 abs.11 t.12 ktop.9)))

Figure 17.29  revmap program after metaCPS conversion (with simplifications).
input and output of CPS conversion. In the revmap example, [copy-prop] changes ans.5 to t.21 and loop.6 to t.23 and also replaces two occurrences of _ (by t.32 and t.24). Since techniques to avoid such renamings are complex, and the particular names used don’t affect the correctness of the resulting code, we opt to accept such renamings without complaint.
(fil (a.0 b.1 ktop.9)
  (let* ((abs.10 (abs (f.3 elts.4 k.20)
                   (let* ((t.22 (@null))
                          (t.21 (@mprod t.22))
                          (ans.5 t.21)
                          (t.23 (@mprod #u))
                          (loop.6 t.23)
                          (abs.25 (abs (xs.7 k.29)
                                    (let* ((kif.30 (abs (t.43) (app k.29 t.43)))
                                           (t.31 (@null? xs.7)))
                                      (if t.31
                                          (let ((t.42 (@mget 1 ans.5)))
                                            (app kif.30 t.42))
                                          (let* ((t.34 (@car xs.7))
                                                 (k.41 (abs (t.35)
                                                         (let* ((t.36 (@mget 1 ans.5))
                                                                (t.33 (@cons t.35 t.36))
                                                                (t.32 (@mset! 1 ans.5 t.33))
                                                                (_ t.32)
                                                                (t.37 (@mget 1 loop.6))
                                                                (t.38 (@cdr xs.7))
                                                                (k.40 (abs (t.39) (app kif.30 t.39))))
                                                           (app t.37 t.38 k.40)))))
                                            (app f.3 t.34 k.41))))))
                          (t.24 (@mset! 1 loop.6 abs.25))
                          (_ t.24)
                          (t.26 (@mget 1 loop.6))
                          (k.28 (abs (t.27) (app k.20 t.27))))
                     (app t.26 elts.4 k.28))))
         (revmap.2 abs.10)
         (abs.11 (abs (x.8 k.18)
                   (let ((t.19 (@> x.8 b.1)))
                     (app k.18 t.19))))
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13))
         (k.17 (abs (t.16) (app ktop.9 t.16))))
    (app revmap.2 abs.11 t.12 k.17)))

Figure 17.30  revmap program after metaCPS conversion (without simplifications).
Exercise 17.19 Use MCPSexp to CPS-convert the following FIL expressions relative to an initial metacontinuation (id→mc k).

a. (abs (f) (@+ 1 (app f 2)))

b. (abs (g x) (@+ 1 (app g (@* 2 x))))

c. (abs (f g h x) (app f (app g x) (app h x)))

d. (abs (f) (@* (if (app f 1) 2 3) (if (app f 4) 5 6)))

Exercise 17.20 Use MCPSprog to CPS-convert the following FIL programs:

a. The program Pquad from Figure 17.25 (page 1055).

b. (fil (x)
     (let ((fact (@mprod #u))
           (_ (@mset! 1 fact
                (abs (n) (if (@= n 0)
                             1
                             (@* n (app (@mget 1 fact) (@- n 1))))))))
       (app (@mget 1 fact) x)))
c. (fil (x)
     (let ((fib (@mprod #u))
           (_ (@mset! 1 fib
                (abs (n) (if (@<= n 1)
                             n
                             (@+ (app (@mget 1 fib) (@- n 1))
                                 (app (@mget 1 fib) (@- n 2))))))))
       (app (@mget 1 fib) x)))

Ecatchabs = (abs (n)
              (catch (abs (w) (@+ w 1))
                (let ((f (abs (y) (if (@> y n) y (throw y)))))
                  (@- (f 5) (catch (abs (z) (@* 2 z)) (f 3))))))

(app Ecatchabs 0) --FIL--> 2    {5 - 3 = 2}
(app Ecatchabs 4) --FIL--> -1   {5 - (2*3) = -1}
(app Ecatchabs 8) --FIL--> 6    {5 + 1 = 6}
a. Sam modifies the standard SCPS conversion clauses to translate every expression into a procedure taking two continuations: an exception continuation and a normal continuation. Sam's SCPS conversion clauses for programs, literals, and conditionals are:

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Ikntop)                      ; Ikntop fresh
    (let ((Iketop (abs (Iinfo)                 ; Iketop and Iinfo fresh
                    (error uncaught-exception))))
      (app (SCPSexp[[Ebody]]) Iketop Ikntop)))

SCPSexp[[L]]
= (abs (Ike Ikn)   ; Ike (exception cont.) and Ikn (normal cont.) fresh
    (app Ikn L))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (Ike Ikn)                               ; Ike and Ikn fresh
    (app (SCPSexp[[Etest]]) Ike
      (abs (Itest)                             ; Itest fresh
        (if Itest
            (app (SCPSexp[[Ethen]]) Ike Ikn)
            (app (SCPSexp[[Eelse]]) Ike Ikn)))))
Write the SCPS exp clauses for abs, app, throw, and catch.
b. For MCPS, Sam modifies MCPSexp to take an additional argument (an identifier naming the current exception continuation) before the metacontinuation argument. For example:

MCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Ikntop)                      ; Ikntop fresh
    (let ((Iketop (abs (Iinfo)                 ; Iketop and Iinfo fresh
                    (error uncaught-exception))))
      (MCPSexp[[Ebody]] Iketop (id→mc Ikntop))))

MCPSexp : ExpFIL → Ident → MetaCont → Expcps

MCPSexp[[L]] = (λIke m . (m L))

MCPSexp[[(if Etest Ethen Eelse)]]
= (λIke m . (MCPSexp[[Etest]] Ike
              (λVtest . (let ((Ikif (mc→exp m)))  ; Ikif fresh
                          (if Vtest
                              (MCPSexp[[Ethen]] Ike (id→mc Ikif))
                              (MCPSexp[[Eelse]] Ike (id→mc Ikif)))))))
Write the MCPSexp clauses for abs, app, throw, and catch.

c. Based on the metaCPS conversion of FIL+{throw, catch}, explain how to perform metaCPS conversion for FIL+{raise, handle}.
17.10 Transformation 8: Closure Conversion
In a block-structured language, code can refer to variables declared outside the current block (i.e., in an outer procedure or class declaration). As we have seen in Chapters 6–7, the meaning of such free variable references is often explained in terms of environments. Traditional interpreters and compilers have special-purpose machinery to manage environments. The Tortoise compiler avoids such machinery by making all environments explicit in the intermediate language. Each procedure is transformed into an abstract pair of code and environment, where the code explicitly accesses the environment to retrieve values formerly referenced by free variables. The resulting abstract pair is known as a closure because its code component is closed — i.e., it contains no free variables. The process of transforming all procedures into closures is traditionally called closure conversion. Because it makes all environments explicit, environment conversion is another name for this transformation.

Closure conversion transforms a program that may contain higher-order procedures into one that contains only first-order procedures: rather than passing
a procedure as a parameter or returning one as a result, a transformed program passes or returns a closure data structure. This technique is not only useful as a compiler transformation, but programmers may also apply it manually to simulate higher-order procedures in languages that support only first-order procedures (such as C, Pascal, and Ada) or objects with methods (such as Smalltalk, Java, C++, and C#). All one needs is a way to embed a procedure value (or a reference to a procedure) in a data structure (or object).

In the Tortoise compiler, closure conversion has the following specification:

Preconditions: The input to closure conversion is a valid kernel FIL program.

Postconditions: The output of closure conversion is a valid kernel FIL program in which all abstractions are closed.

Other properties: If the input program is in FILcps, so is the output program.

In the Tortoise compiler, the closure conversion stage follows the renaming and CPS conversion stages, but closure conversion can be performed on any FIL program, even ones that are not uniquely named or in FILcps. The reason that Tortoise performs closure conversion after CPS conversion is so that closure conversion will be performed on the continuation procedures introduced by CPS conversion as well as on the user-defined procedures already in the program. The Tortoise closure conversion specification requires that any FILcps program will be transformed to another FILcps program, so the output of the closure conversion stage of the compiler is guaranteed to be in FILcps.

There are numerous approaches to closure conversion that differ in their representations of environments and closures. We shall focus on one class of representations, flat closures, and then briefly discuss some alternatives.
17.10.1 Flat Closures
Consider the following example:

(let ((linear (abs (a b) (abs (x) (@+ (@* a x) b)))))
  (let ((f (app linear 4 5))
        (g (app linear 6 7)))
    (@+ (app f 8) (app g 9))))
Given a and b, the linear procedure returns a procedural representation of a line with slope a and y-intercept b. The f and g procedures represent two such lines, each of which is associated with the abstraction (abs (x) . . . ), which has free variables a and b. In the case of f, these variables have the bindings 4 and 5, respectively, while for g they have the bindings 6 and 7. We will convert this example by hand and then develop an automatic closure conversion transformation. One way to represent f and g as closed procedures is shown below: (let ((fgcode (abs (env x) (let ((a (@mget 1 env)) (b (@mget 2 env))) (@+ (@* a x) b)))) (fenv (@mprod 4 5)) (genv (@mprod 6 7))) (let ((fclopair (@mprod fgcode fenv )) (gclopair (@mprod fgcode genv ))) (@+ (app (@mget 1 fclopair ) (@mget 2 fclopair ) 8) (app (@mget 1 gclopair ) (@mget 2 gclopair ) 9))))
In this approach, the two procedures share the same code component, fgcode , which takes an explicit environment argument env in addition to the original argument x. The env argument is assumed to be a tuple (product) whose two components are the values of the former free variables a and b. These values are extracted from the environment and given their former names in a wrapper around the body expression (@+ (@* a x) b). Note that fgcode has no free variables and so is a closed procedure. The environments fenv and genv are tuples holding the free variable values. The closures fclopair and gclopair are formed by making explicit code/environment pairs, each combining the shared code component with a specific environment for the closure. To handle the change in procedure representation, each call of the form (app f E ) must be transformed to (app (@mget 1 fclopair ) (@mget 2 fclopair ) E ) (and similarly for g) in order to pass the environment component as the first argument to the code component. Closure conversion can be viewed as an exercise in abstract data type implementation. The abstraction being considered is the procedure, whose interface has two operations: abs, which creates procedures, and app, which applies procedures. The goal of closure conversion is to find an implementation of this interface that behaves the same, but in which procedure creation requires no free variables. As in traditional data structure problems, we’re keen to design correct implementations that are as efficient as possible.
(let ((linear
       (@mprod {this closure (clo.1) has only a code component}
        (abs (clo.1 a b) {the parameter clo.1 is not referenced}
          (@mprod {this closure (clo.2) has code + vars {a, b}}
           (abs (clo.2 x)
             (let ((a (@mget 2 clo.2))
                   (b (@mget 3 clo.2)))
               (@+ (@* a x) b)))
           a b)) {vars used by clo.2 = {a, b}}
        ))) {clo.1 has no vars}
  (let ((f (app (@mget 1 linear) linear 4 5))
        (g (app (@mget 1 linear) linear 6 7)))
    (@+ (app (@mget 1 f) f 8)
        (app (@mget 1 g) g 9))))

Figure 17.33  Result of closure-converting the linear example.
For example, a more efficient approach to using explicit code/environment pairs is to collect the code and free variable values into a single tuple, as shown below:

(let ((fgcode (abs (clo x)
                (let ((a (@mget 2 clo))
                      (b (@mget 3 clo)))
                  (@+ (@* a x) b)))))
  (let ((fclo (@mprod fgcode 4 5))
        (gclo (@mprod fgcode 6 7)))
    (@+ (app (@mget 1 fclo) fclo 8)
        (app (@mget 1 gclo) gclo 9))))
This approach, known as closure-passing style, avoids creating a separate environment tuple every time a closure is created, and avoids extracting this tuple from the code/environment pair every time the closure is invoked. If we systematically use closure-passing style to transform every abstraction and application site in the original linear example, we get the result shown in Figure 17.33. The inner abs has been transformed into a tuple that combines fgcode with the values of the free variables a and b from the outer abs. For consistency, the outer abs has also been transformed; its tuple has only a code component since the original abs has no free variables. By convention, we will refer to a closure tuple by the name of the first argument of its code component. In this example, the code comments refer to the outer closure tuple as clo.1 and the inner closure tuple as clo.2.
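In general, closure-passing style rewrites every call site by the same idiom; schematically (a sketch anticipating the app clause of Figure 17.35, where f′ and E′ stand for the converted rator and operand, and the names clo.0 and code.0 are arbitrary):

(app f E)
==> (let* ((clo.0 f′)                  {name the converted rator}
           (code.0 (@mget 1 clo.0)))   {fetch the code component}
      (app code.0 clo.0 E′))           {pass the closure as the first argument}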
Figure 17.34 shows an example involving nested open procedures and unreferenced variables. In the unconverted clotest, the outermost abstraction, (abs (c d) ...), is closed; the middle abstraction, (abs (r s t) ...), has c as its only free variable (d is never used); and the innermost abstraction, (abs (y) ...), has {c, r, t} as its free variables (d and s are never used). In the converted clotest, each abstraction has been transformed into a tuple that combines a closed code component with all the free variables of the original abstraction. The resulting tuples are called flat closures because all the environment information has been condensed into a single tuple that does not reflect any of the original nesting structure. Note that unreferenced variables from an enclosing scope are ignored. For example, the innermost body does not reference d and s, so these variables are not extracted from clo.3 and are not included in the innermost closure tuple.

A formal specification of the flat closure conversion transformation is presented in Figure 17.35. The transformation is specified via the CLexp function on FIL expressions. The only nontrivial cases for CLexp are abs and app. CLexp converts an abs to a tuple containing a closed code component and all the free variables of the abstraction. The code component is derived from the original abs by adding a closure argument Iclo and extracting the free variables from this argument in a wrapper around the body. The order of the free variables is irrelevant as long as it is consistent between tuple creation and projection. An app is converted to another app that applies the code component of the converted rator closure to the closure and the converted operands.

Certain parts of the CLexp definition are written in a somewhat unnatural way to guarantee that an input expression in FILcps will be translated to an output expression in FILcps. This is the purpose of the separate clause for converting an abs that occurs in a let binding and of the let* bindings in the app and abs conversions. We will ignore these details in our informal examples of closure conversion. Note that the unique naming property is not preserved by CLexp. The names Ifvi declared in the body of the closed abstraction stand for variables that are logically distinct from variables with the same names in the mprod application that creates the closure tuple.

Figure 17.36 shows the revmap example after closure conversion. In addition to transforming procedures present in the original code in Figure 17.4 on page 1011 (clo.56 is revmap, clo.52 is loop, and clo.60 is the greater-than-b procedure), closure conversion also transforms the continuation procedures introduced by CPS conversion (clo.48 is the continuation for the f call — compare Figure 17.29 on page 1067). The free variables in converted continuations correspond
1080
Chapter 17
Compilation
Unconverted Expression (let ((clotest (abs (c d) (abs (r s t) (abs (y) (@+ (@/ (@* r y) t) (@- r c))))))) (let ((p (app clotest 4 5))) (let ((q1 (app p 6 7 8)) (q2 (app p 9 10 11))) (@+ (app q1 12) (app q2 13))))) Converted Expression (let ((clotest (@mprod {this closure (clo.1) has only a code component} (abs (clo.1 c d) {the parameter clo.1 is never referenced} (@mprod {this closure (clo.2) has code + var {c}} (abs (clo.2 r s t) (let ((c (@mget 2 clo.2))) (@mprod {this closure (clo.3) has code + vars {c, r, t}} (abs (clo.3 y) (let ((c (@mget 2 clo.3)) (r (@mget 3 clo.3)) (t (@mget 4 clo.3))) (@+ (@/ (@* r y) t) (@- r c)))) c r t))) {vars used by clo.3 = {c, r, t}} c)) {vars used by clo.2 = {c}} ))) {clo.1 has no vars} (let ((p (app (@mget 1 clotest) clotest 4 5))) (let ((q1 (app (@mget 1 p) p 6 7 8)) (q2 (app (@mget 1 p) p 9 10 11))) (@+ (app (@mget 1 q1) q1 12) (app (@mget 1 q2) q2 13))))) Figure 17.34
Flat closure conversion on an example with nested open procedures.
spond to the caller-saved register values that a traditional implementation would save on the stack during a subroutine call that returns to the control point represented by the continuation. In the Tortoise compiler, this saving behavior is automatically implemented by performing closure conversion after CPS conversion, but the saved values are stored in the continuation closure rather than on an explicit stack. For example, continuation closure clo.48 includes the values needed by the loop after a call to f: the cell t.21 resulting from the assignment
17.10.1 Flat Closures
1081
CLexp : ExpFIL → ExpFIL n CLexp [[(abs (I i=1 ) Ebody )]] n ) Ebody )]] = let {Ifv1 , . . . , Ifvk } be FrIds[[(abs (I i=1 ; assume an appropriate definition of FrIds for FIL n ) ; Iclo fresh in (@mprod (abs (Iclo I i=1 (let* ((Ifvj (@mget j + 1 Iclo ))kj=1 ); N [[n]] = n for n ∈ Nat CLexp [[Ebody ]])) Ifv1 . . . Ifvk ) n CLexp [[(let ((Iabs (abs (I i=1 ) Eabsbody ))) Eletbody )]] ; special case of abs conversion that preserves FILcps n = let (@mprod Ecode Ifv1 . . . Ifvk ) be CLexp [[(abs (I i=1 ) Eabsbody )]] in (let* ((Icode Ecode ) ; Icode fresh (Iabs (@mprod Icode Ifv1 . . . Ifvk ))) CLexp [[Eletbody ]])
CLexp [[(app Erator E ni=1 )]] = (let* ((Iclo CLexp [[Erator ]]) ; Iclo fresh (Icode (@mget 1 Iclo ))) ; Icode fresh (app Icode Iclo CLexp [[Ei ]]ni=1 )) CLexp [[E ]] = mapsubFIL [[E ]] CLexp , otherwise. Figure 17.35
The flat closure conversion transformation CLexp .
conversion of ans, the cell holding the looping procedure t.23, the loop state variable xs.7, and the end-of-loop continuation k.29. In Figure 17.36, we assume that the top-level continuation ktop.9 supplied by the operating system is consistent with the calling convention used by the closureconverted code. I.e., ktop.9 must be a closure tuple whose first slot contains an abstraction with two parameters: (1) the closure tuple and (2) the argument expected by the system’s unary continuation procedure. Alternatively, for the case where closure conversion is known to follow CPS conversion, we could define a special program-level closure conversion function CLprog that assumes that the final argument in the input FILcps program is an unconverted unary continuation procedure (see Exercise 17.30). In order to work properly, CLexp requires that the input expression contain no assignments (set!). This is necessarily true in FIL, which does not support set!, but would be an issue in extensions to FIL that include set! (e.g., see Exercises 17.14 and 17.23 in Sections 17.9.2 and 17.9.3). The reason for this restriction is that the copying of free variable values by CLexp in the abs clause
1082
Chapter 17
Compilation
does not preserve the semantics of mutable variables. Consider the following example of a nullary procedure that increments a counter every time it is called in FIL+{set!}: (let ((count 0)) (abs () (let* ((new-count (@+ count 1)) (_ (set! count new-count))) new-count)))
Closure-converting this example yields: (let ((count 0)) (@mprod (abs (clo) (let* ((count (@mget 2 clo))) (let* ((new-count (@+ count 1)) (_ (set! count new-count))) new-count))) count))
The set! in the transformed code changes the local variable count within the abstraction, which is always initially bound to the value 0. So the closure-converted procedure always returns 1, which is not the correct behavior. Performing assignment conversion before closure conversion fixes this problem, since count will then name a sharable mutable cell rather than a number, and the set! will be transformed to an mset! on this cell. The interaction between mutable variables and closure conversion arises in practice in Java. Java’s anonymous inner classes allow the programmer to create an instance of an unnamed class (the inner class) within the method of another class (the outer class). Because it is possible for the inner class instance to refer to parameters and local variables of the enclosing method, the inner class instance is effectively a closure over these variables. For example, Figure 17.37 shows how an inner class can be used to express the linear example from page 1076 in Java. The IntFun interface is a specification for a class providing an app method that takes a single integer argument and returns an integer result. The linear method of the Linear class takes integers a and b and returns an instance of an anonymous class satisfying the IntFun specification whose app method maps an argument x to the result (a*x)+b. This instance corresponds to the first-class procedure (abs (x) (+ (* a x) b)) in FIL. Java requires any enclosing local variables mentioned in the inner class (a and b in this example) to be declared immutable (using the keyword final). This restriction allows the Java compiler to copy the values of these variables into instance variables of the anonymous inner class instance rather than attempting to share the locations of these variables (which would require some form of assignment conversion).
17.10.1 Flat Closures
1083
(fil (a.0 b.1 ktop.9) (let* ((code.57 {code of clo.56} (abs (clo.56 f.3 elts.4 k.20) (let* ((t.22 (@null)) (t.21 (@mprod t.22)) (t.23 (@mprod #u)) (code.53 {code of clo.52} (abs (clo.52 xs.7 k.29) (let* ((t.21 (@mget 2 clo.52)) (t.23 (@mget 3 clo.52)) (f.3 (@mget 4 clo.52)) (t.31 (@null? xs.7))) (if t.31 (let* ((t.42 (@mget 1 t.21)) (code.44 (@mget 1 k.29))) (app code.44 k.29 t.42)) (let* ((t.34 (@car xs.7)) (code.49 {code of clo.48} (abs (clo.48 t.35) (let* ((t.21 (@mget 2 clo.48)) (t.23 (@mget 3 clo.48)) (xs.7 (@mget 4 clo.48)) (k.29 (@mget 5 clo.48)) (t.36 (@mget 1 t.21)) (t.33 (@cons t.35 t.36)) (t.32 (@mset! 1 t.21 t.33)) (t.37 (@mget 1 t.23)) (t.38 (@cdr xs.7)) (code.46 (@mget 1 t.37))) (app code.46 t.37 t.38 k.29)))) (k.41 (@mprod code.49 t.21 t.23 xs.7 k.29)) {clo.48} (code.50 (@mget 1 f.3))) (app code.50 f.3 t.34 k.41)))))) (abs.25 (@mprod code.53 t.21 t.23 f.3)) {clo.52} (t.24 (@mset! 1 t.23 abs.25)) (t.26 (@mget 1 t.23)) (code.54 (@mget 1 t.26))) (app code.54 t.26 elts.4 k.20)))) (abs.10 (@mprod code.57)) {clo.56} (code.61 (abs (clo.60 x.8 k.18) {code of clo.60} (let* ((b.1 (@mget 2 clo.60)) (t.19 (@> x.8 b.1)) (code.58 (@mget 1 k.18))) (app code.58 k.18 t.19)))) (abs.11 (@mprod code.61 b.1)) {clo.60} (t.14 (@* a.0 7)) (t.15 (@null)) (t.13 (@cons t.14 t.15)) (t.12 (@cons a.0 t.13)) (code.62 (@mget 1 abs.10))) (app code.62 abs.10 abs.11 t.12 ktop.9)))
Figure 17.36
revmap program after closure conversion.
1084
Chapter 17
Compilation
Exercise 17.28 a. A function f is idempotent iff (f (f x)) = (f x) for all x ∈ dom(f ). CLexp is not idempotent. Explain why. Can any closure conversion transformation be idempotent? n b. In the abs clause for CLexp , suppose FrIds[[(abs (I i=1 ) Ebody )]] is replaced by the set of all variables in scope at that point. Is this a meaning-preserving change? What are the advantages and disadvantages of such a change?
c. In a FIL-based compiler, CLexp must necessarily be performed after an assignment conversion pass. Could we perform it before a renaming pass? A globalization pass? A CPS-conversion pass? Explain. Exercise 17.29 In the abs clause, the CLexp function uses a wrapping strategy to wrap the body of the original abs in a let* that extracts and names each free variable value in the closure. An alternative substitution strategy is to replace each free reference in the original abs by a closure access. Here is a modified version of the clo.2 code component from Figure 17.33 that uses the substitution strategy: (abs (clo.2 x) (@+ (@* (@mget 2 clo.2) x) (@mget 3 clo.2)))
Neither strategy is best in all situations. Describe situations in which the wrapping strategy is superior and in which the substitution strategy is superior. State all of the assumptions of your argument. Exercise 17.30 a. Define a program-level closure conversion function CLprog that expects a FILcps program: CLprog : Progcps → Progcps
In both the input and output programs, the final program argument Iktop is expected to be the top-level unary continuation procedure. CLprog must handle Iktop specially so that it is applied directly to its single argument rather than via the closure application convention. It is not necessary to modify CLexp . b. Show the result of using your CLprog function to closure-convert the following program: (fil (a b ktop) (let ((add-a (abs (x k) (let ((t (@+ x a))) (app k t))))) (if b (app add-a a ktop) (app ktop a))))
Exercise 17.31 Using anonymous inner classes, complete the following translation of the clotest example from Figure 17.34 into Java by filling in the hole in the following code with a single Java expression:
17.10.2 Variations on Flat Closure Conversion
1085
interface IntFun { public int app (int x); } public class Linear { public static IntFun linear (final int a, final int b) { return new IntFun() { public int app (int x) {return (a*x)+b;} }; } public static int example () { IntFun f = linear(4,5); IntFun g = linear(6,7); return f.app(8) + g.app(9); } } Figure 17.37 Using anonymous inner classes to express the linear example from page 1076 in Java.
interface IntFun1 { public int app (int x); } interface IntFun2 { public IntFun3 app (int x, int y); } interface IntFun3 { public IntFun1 app (int x, int y, int z); } public class Clotest { public static int example () { IntFun2 clotest = 2; IntFun3 p = clotest.app(4,5); IntFun1 q1 = p.app(6,7,8); IntFun1 q2 = p.app(9,10,11); return q1.app(12) + q2.app(13); } }
17.10.2
Variations on Flat Closure Conversion
Now we consider several variations on flat closure conversion. We begin with an optimization to CLexp . Why does CLexp transform an already closed abs into a closure tuple? This strategy simplifies the transformation by enabling all procedure applications to be transformed uniformly to “expect” such a tuple. But it is also possible to use nonuniform transformations on abstractions and applications as long as the correct behavior is maintained. Given a control flow analysis (see page 995) that indicates which procedures flow to which call sites (application expressions that use the procedures in their rator positions), we can do a better job via so-called selective closure conversion [WS94].
1086
Chapter 17
Compilation
(let ((linear (abs (a b) {this closed abstraction is not transformed} (@mprod {this is the closure tuple for an open abstraction} (abs (clo.2 x) (let* ((a (@mget 2 clo.2)) (b (@mget 3 clo.2))) (@+ (@* a x) b))) a b)))) {free vars of clo.2} (let ((f (app linear 4 5)) {this application is not transformed} (g (app linear 6 7))) {this application is not transformed} (@+ (app (@mget 1 f) f 8) (app (@mget 1 g) g 9)))) Figure 17.38
Result of selective closure conversion in the linear example.
In this approach, originally closed procedures that flow only to call sites where only originally closed procedures are called are left unchanged by the closure conversion process, as are their call sites. This avoids unnecessary tuple creation and projection. The result of selective closure conversion for the linear example is presented in Figure 17.38 (compare Figure 17.33 on page 1078). Because the linear procedure is closed, its abstraction and the calls to linear are not transformed. But the procedure returned by invoking linear has free variables (a and b), and so must be converted to a closure tuple. In selective closure conversion, a closed procedure pclosed cannot be optimized when it is called at the same call site s as an open procedure popen in the original program. The call site must be transformed to expect for its rator a closure tuple for popen , and so pclosed must also be represented as a closure tuple since it flows to the rator position of s. This representation constraint can similarly force other closed procedures that share call sites with pclosed to be converted, leading to a contagious phenomenon called representation pollution [DWM+ 01]. In the following example, although f is closed, selective closure conversion must still convert f to a closure tuple because it flows to the same call site (app (if b f g) 3) as the open procedure g: Epolluted = (abs (b c) (let ((f (abs (x) (@+ x 1))) (g (let ((a (if b 4 5))) (abs (y) (@+ (@* a y) c))))) (@+ (app f 2) (app (if b f g) 3))))
17.10.2 Variations on Flat Closure Conversion
1087
Representation pollution can sometimes be avoided by duplicating a closed procedure and using different representations for the two copies. For instance, if we split f in Epolluted into two copies, then the copy that flows to the call site (app f 2) need not be converted to a tuple in the closure-converted code: (abs (b c) {assume the outer abstraction need not be converted to a tuple} (let ((f1 (abs (x) (@+ x 1))) {this copy is not converted to a tuple} (f2 (@mprod (abs (clo.1 x) (@+ x 1)))) {this copy is converted to a tuple} (g (let ((a (if b 4 5))) (@mprod (abs (clo.2 y) {this must be converted to a tuple} (let ((a (@mget 1 clo.2)) (c (@mget 2 clo.2))) (@+ (@* a y) c))) a c)))) (@+ (app f1 2) {this is an unconverted call site} (let ((clo.3 (if b f2 g))) {this is a converted call site} (app (@mget 1 clo.3) clo.3 3)))))
When closed and open procedures flow to the same call site (e.g., f2 and g above), we can force the closed procedure to have the same representation as the open one (i.e., a closure tuple). Another way to handle heterogeneous procedure representations is to affix tags to procedures to indicate their representation. Call sites where different representations flow together perform a dynamic dispatch on the tagged value. For example, using the oneof notation introduced in Section 10.2, we can use code to tag a closed procedure and closure to tag a closure tuple, as in the following conversion of Epolluted : (abs (b c) {assume the outer abstraction need not be converted to a tuple} (let ((f1 (abs (x) (@+ x 1))) {this copy is not converted to a tuple} (f2 (one code (abs (x) (@+ x 1)))) {tagged as a closed procedure} (g (let ((a (if b 4 5))) (one closure {tagged as a closure} (@mprod (abs (clo y) (let ((a (@mget 2 clo)) (c (@mget 3 clo))) (@+ (@* a y) c))) a c))))) (@+ (app f1 2) {this is an unconverted call site} (app-generic (if b f2 g) 3))))
Here, (app-generic Erator E ni=1 ) is assumed to desugar to n (let ((Ii Ei )ni=1 ) ; I i=1 are fresh (tagcase Erator Irator n (code (app Irator I i=1 )) n )))) (closure (app (@mget 1 Irator ) Irator I i=1
1088
Chapter 17
Compilation
This tagging strategy is not necessarily a good idea. Analyzing and converting programs to handle tags is complex, and the overhead of tag manipulation can offset the gains made by reducing representation pollution [DWM+ 01]. In an extreme version of the tagging strategy, all procedures that flow to a given call site are viewed as members of a sum-of-products data type. Each element in this data type is a tagged environment tuple. The tag indicates which abstraction created the procedure, and the environment tuple holds the free variable values of the procedure. A procedure call can then be converted to a dispatch on the environment tag that calls an associated closed procedure. Using this strategy on Epolluted yields (abs (b c) (let ((fcode (abs (x) (@+ x 1))) {code for f} (fenv (one abs1 (@mprod))) {tagged environment for f} (gcode (abs (y a c) (@+ (@* a y) c))) {code for g} (genv (let ((a (if b 4 5))) (one abs2 (@mprod a c))))) {tagged environment for g} (@+ (app fcode 2) (app-env (if b fenv genv) 3))))
where (app-env Eenv Erand ) is an abbreviation for (let ((Irand Erand )) (tagcase Eenv Ienv (abs1 (app fcode Irand )) (abs2 (app gcode Irand (@mget 1 Ienv ) (@mget 2 Ienv )))))
The procedure call overhead in the dispatch can often be reduced by an inlining process that replaces some calls by appropriately rewritten copies of their bodies. E.g., app-env could be rewritten as (let ((Irand E1 )) (tagcase Eenv Ienv (abs1 (@+ Irand 1)) (abs2 (@+ (@* (@mget 1 Ienv ) Irand ) (@mget 2 Ienv )))))
This example uses only a single app-env procedure, but in the worst case a different environment application procedure might be needed at every call site. This environment-tagging strategy is known as defunctionalization [Rey72] because it removes all higher-order functions from a program. Defunctionalization is an important closure conversion technique for languages (such as Ada and Pascal) in which function pointers cannot be stored in data structures — a feature required in all the previous techniques we have studied. Some drawbacks of defunctionalization are that it requires the whole program (it cannot be performed on individual modules) and that environment application procedures like app-env might need to dispatch on all abstractions in the entire program. In
17.10.2 Variations on Flat Closure Conversion
1089
practice, type and control flow information can be used to significantly narrow the set of abstractions that need to be considered at a given call site, making defunctionalization a surprisingly efficient approach to closure conversion [CJW00]. A closure need not carry with it the value of a free variable if that variable is available in all contexts where the closure is invoked. This observation is the key idea behind so-called lightweight closure conversion [WS94, SW97], which can decrease the number of free variables in a procedure by adding extra arguments to the procedure if those arguments are always dynamically available at all call sites for the procedure. In our example, the lightweight optimization is realized by rewriting the Epolluted as follows before performing other closure conversion techniques: (abs (b c) (let ((f (abs (x c) (@+ x 1))) {(3) By 2, need param c here.} (g (let ((a (if b 4 5))) (abs (y c) (@+ (@* a y) c))))) {(1) Add c as param.} (@+ (app f 2 c) {(4) By 3, must add c as an arg here, too.} (app (if b f g) 3 c)))) {(2) By 1, need arg c here.}
Since g’s free variable c is available at the one site where g is called, we should be able to pass it as an argument at the site rather than storing it in the closure for g. But representation constraints also force us to add c as an argument to f, since f shares a call site with g. If f were called in some context outside the scope of c, this fact would invalidate the proposed optimization. This example only hints at the sophistication of the analysis required to perform lightweight closure conversion in practice. Exercise 17.32 Consider the following FIL abstraction Eabs : (abs (b) (let ((f (abs (x) (@+ x 1))) (g (abs (y) (@* y 2))) (h (abs (a) (abs (z) (@/ z a)))) (p (abs (r) (app r 3)))) (@+ (app (if b f g) 4) (@* (app p (app h 5)) (app p (app h 6))))))
a. Show the result of applying flat closure conversion to Eabs . b. The transformation can be improved if we use selective closure conversion instead. Show the result of selective closure conversion on Eabs . c. Suppose we replace (app h 6) by g in Eabs to give Eabs . Then selective closure conversion on Eabs does not yield an improvement over regular closure conversion on Explain why. Eabs d. Describe a simple meaning-preserving change to Eabs after which selective closure conversion will be an improvement over regular closure conversion.
1090
Chapter 17
Compilation
Exercise 17.33 Using the flat closure conversion techniques presented so far, translate the following FIL program into C, Java, and Pascal. The program has the property that equality, remainder, division, and subtraction operations are performed only when p is called, not when q is called. Your translated programs should also have this property (fil (n) (let* ((p (abs (w) (if (@= 0 w) (abs (x) x) (if (@= 0 (@% w 2)) (let ((p1 (p (@/ (abs (y) (@* 2 (let ((p2 (p (@(abs (z) (@+ 1 (let ((q (app p n))) (@+ (app q 1) (app q n)))))
17.10.3
w 2)))) (app p1 y)))) w 1)))) (app p2 z)))))))))
Linked Environments
Thus far we have assumed that all free variable values of a procedure are stored in a single flat environment or closure. This strategy minimizes the information carried in a particular closure. However, it is often the case that a free variable is referenced by several closures. Setting aside a slot for (a pointer to) the value of this variable in several closures/environments increases the space requirements of the program. For example, in the flat clotest example of Figure 17.34, closures p, q1, and q2 all contain a slot for the value of free variable c. An alternative approach is to structure closures to enhance sharing and reduce copying. In a code/environment model, a high degree of sharing is achieved when every call site bundles the environment of the called procedure (the parent environment) together with the argument values to create the environment for the body of the called procedure. In this approach, each closed abstraction takes a single argument, its environment, and all variables are accessed through this environment. This is called a linked environment approach because environments are represented as chains of linked components called frames. Figure 17.39 shows this approach for the clotest example. Note that the first slot of environment frames env1, env2, and env3 contains (a pointer to) its parent frame. Variables declared by the closest enclosing abs are accessed directly from the current frame, but variables declared in outer abstractions require one or more indirections through parent frames. For instance, in the body of the innermost abs, variable y, which is the first argument of the current frame, env3, is accessed via (@mget 2 env3); the variable r, which is the first argument one frame back, is accessed via (@mget 2 (@mget 1 env3)); and the variable c, which is the first argument two frames back, is accessed via (@mget 2 (@mget 1 (@mget 1 env3))).
17.10.3 Linked Environments
1091
(let ((clotest (@mprod (abs (env1) {env1 = env0, c, d} (@mprod (abs (env2) {env2 = env1, r, s, t} (@mprod (abs (env3) {env3 = env2, y} (@+ (@/ (@* (@mget 2 (@mget 1 env3)) {get r} (@mget 2 env3)) {get y} (@mget 4 (@mget 1 env3))) {get t} (@- (@mget 2 (@mget 1 env3)) {get r} (@mget 2 (@mget 1 (@mget 1 env3)))))) {get c} env2)) env1)) (@mprod)))) {This is env0 = the empty environment} (let ((p (app (@mget 1 clotest) (@mprod (@mget 2 clotest) 4 5)))) (let ((q1 (app (@mget 1 p) (@mprod (@mget 2 p) 6 7 8))) (q2 (app (@mget 1 p) (@mprod (@mget 2 p) 9 10 11)))) (@+ (app (@mget 1 q1) (@mprod (@mget 2 q1) 12)) (app (@mget 1 q2) (@mprod (@mget 2 q2) 13)))))) Figure 17.39
A version of the clotest example with linked environments.
In general, each variable has a lexical address back , over , where back indicates how many frames back the variable is located and over indicates its position in the resulting frame. A variable with lexical address b, o9 is translated to (@mget o (@mgetb 1 e)), where e is the current lexical environment frame and (@mgetb 1 e) stands for the b-fold composition of the first projection starting with e. Traditional compilers often use such lexical addresses to locate variables on a stack, where so-called static links are used to model chains of frames stored on the stack. Linked environments are also commonly used in interpreters for block-structured languages. For example, the environment model interpreter in [ASS96] represents procedures as closures whose environments are linked frames. Figure 17.40 depicts the shared environment structure in the clotest example with linked environments. Note how the environment of p is shared as the parent environment of q1’s environment and q2’s environment. In contrast with the flat environment case, p, q1, and q2 all share the same slot holding c, so less slot space is needed for c. Another advantage of sharing is that the linked environment approach to closure conversion can support set! directly without the need for assignment conversion (see Exercise 17.35). 9
Assume that back indices b start at 0 and over indices o start at 2.
1092
Chapter 17
Compilation
However, there are several downsides to linked environments. First, variable access is slower than for flat closures because of the indirections through parent environment links. Second, environment slots hold values (such as d and s) that are never referenced, so space is wasted on these slots. A final subtle point is that shared slots can hold onto values longer than they are actually needed by a program, leading to space leaks in the storage manager (see Section 18.1). Some of these points and some alternative linked strategies are explored in the exercises. Exercise 17.34 a. In the context of closure-converting the following FIL expression, discuss the issues involved in converting let expressions in the linked environment approach described above: (abs (a) (let ((b (@+ a 1)) (c (@* a a))) (let ((f (abs (d) (@+ a (@* c d))))) (@mprod (app f b) (app f c)))))
let-bound names (such as b and f) that do not appear as the free variables of an abstraction should not be put in environment frames. b. Formally define a closure conversion transformation on FIL expressions that implements the linked environment approach. Do not worry about preserving the CPS form of input programs. Exercise 17.35 Use the linked environment approach to closure-convert the following FIL+{set!} expression. A set! in the input expression should be converted to an mset! on an environment tuple in the converted expression. (let ((r (abs (x) (abs (y) (let ((z (@+ x y))) (let ((_ (set! x z))) z)))))) (let ((s1 (app r 1)) (s2 (app r 2))) (@mprod (app s1 3) (app s2 4) (app s1 5))))
Exercise 17.36 The linked environment approach illustrated by the code in Figure 17.39 constructs a mutable tuple representing an environment frame at every call site. An alternative approach, which we shall call the code/linked-env representation, is to construct new linked environment frames only when building the closure tuple. This way, the procedure calling convention looks exactly like that for code/environment pairs; the only difference from the code/environment approach studied earlier is that the environments are not flat but are composed of linked frames.
17.10.3 Linked Environments
1093 clotest:
•
(abs (env1) . . .)
p:
4 c
5 d
(abs (env2) . . .)
q1:
6 r
(abs (env3) . . .)
Figure 17.40
12 y
7 s
8 t
q2:
(abs (env3) . . .)
9 r
10 s
11 t
13 y
Depiction of the links in the linked clotest example.
a. Show the code/linked-env approach for the clotest example by fleshing out the hole in the following code: (let ((clotest 2)) (let ((p (app (@mget 1 clotest) (@mget 2 clotest) 4 5))) (let ((q1 (app (@mget 1 p) (@mget 2 p) 6 7 8)) (q2 (app (@mget 1 p) (@mget 2 p) 9 10 11))) (@+ (app (@mget 1 q1) (@mget 2 q1) 12) (app (@mget 1 q2) (@mget 2 q2) 13)))))
b. Compare the code/linked-env approach with the linked environment approach discussed in the text on the following points: number of tuples created, efficiency of accessing variables, omitting variables from environment frames, converting let expressions, and handling set!. c. Formally define a closure conversion transformation on FIL expressions that implements the code/linked-env strategy. Do not worry about preserving the CPS form of input programs.
1094
17.11
Chapter 17
Compilation
Transformation 9: Lifting
Programmers nest procedures when an inner procedure needs to use variables that are declared in an outer procedure. The free variables in such an inner procedure are bound by the outer procedure. We have seen that closure conversion eliminates free variables in every procedure. However, because it leaves abstractions in place, it does not eliminate procedure nesting. A procedure is global when it is declared at top level — i.e., in the outermost scope of a program. Lifting (also called lambda lifting10 ) is the process of eliminating procedure nesting by collecting all procedure abstractions and declaring them as global procedures. All procedure abstractions must be closed before lifting is performed — otherwise, lifting them to top level would break the fundamental connection between free variable references and their associated declarations. Once all of the procedures in a program are declared at top level, each one can be compiled into straight-line code (modulo branches for any if expressions in its body) and given a global name.11 In the analogy with assembly code, such a name corresponds to an assembly code label for the first instruction in the subroutine corresponding to the procedure. In the Tortoise compiler, the result of the lifting phase is a program in FILlift (Figure 17.41), a variant of the FILcps language. The key difference between FILlift and FILcps is that abstractions may appear only at the top level of a program in new declaration constructs having the form (def S AB ), where AB is an abstraction and S is special kind of identifier called a subroutine name. Each subroutine name subr0, subr1, subr2, . . . is the concatenation of the name subr and a natural number literal. For n ∈ Nat, we use both subrn and subrn to stand for the result of concatenating the name subr with the digits of the numeral for n. E.g., subr17 = subr17 = subr17. The definition of Proglift requires that subr0 be used for the first subroutine, subr1 be used for the second subroutine, etc. This requirement makes it possible to refer to procedures by number rather than by name. Every subroutine name is a legal identifier and so may be used as a variable reference elsewhere in a program. As in 10 In the literature on compiling functional programming languages (e.g., [Joh85, Pey87]), “lambda lifting” often refers to a process that not only lifts all functions to top level, but also serves as a closure conversion transformation in which closures are represented as partially applied curried functions. 11 It is possible to compile a procedure with nested internal procedures directly to assembly code by placing unconditional branch instructions around the code for the internal procedures. Avoiding unnecessary unconditional branches is important for modern processors with instruction caches, instruction prefetching, and pipelined architectures.
17.11 Transformation 9: Lifting
1095
∗ P ∈ Proglift ::= (fil (Iformal ) Ebody (def subri AB i )ni=0 ) ∗ ) Ebody ) AB ∈ Abstractionlift ::= (abs (Iformal ∗ ) | (if Vtest Ethen Eelse ) E ∈ Explift ::= (app Irator Vrand | (let ((Iname LE defn )) Ebody ) | (error Ymessage )
V ∈ ValueExplift ::= L | I ∗ LE ∈ LetableExplift ::= L | (prim Oprimop Varg )
L ∈ Lit = as in full FIL Y ∈ SymLit = as in full FIL O ∈ Primop = as in full FIL Keywordlift = {abs, app, def, error, fil, if, let, let*, prim, sym} I ∈ Identlift = SymLit − {Y | Y begins with @} ∪ Keywordlift NT ∈ NatLit = {0, 1, 2, . . .} S ∈ Subr = identifiers of the form subrn ; subrn is shorthand for subrn ; For n ∈ Nat, the notation I n stands for the identifier that ; results from concatenating the characters of the name I with ; the digit characters of the numeral in NatLit that denotes n. Figure 17.41
Grammar for FILlift , the result of the Tortoise lifting stage.
other FL-like languages we have studied, the names declared by def have global scope — they may be referenced in all expressions in the program, including the bodies of all def declarations. The Tortoise lifting transformation LI prog has the following specification: Preconditions: The input to LI prog is a valid kernel FILcps program in which every abstraction is closed. Postconditions: The output of LI prog is a program in which every abstraction is globally defined via def at the top level of a program, as specified in the FILlift grammar in Figure 17.41. The free identifiers in each abstraction must be a subset of the subroutine names bound by defs in the program. Although abstractions are required to be closed before lifting, abstractions after lifting are not necessarily closed. This is because each nested abstraction is replaced by a def-bound subroutine name that is necessarily free in the immediately enclosing abstraction. But the def-bound subroutine names are the only names that can be free in the abstractions that result from the lifting transformation.
1096
Chapter 17
Compilation
Here is a sketch of the algorithm employed by LI prog for a program containing n abstractions: 1. Associate with each abstraction AB i (0 ≤ i ≤ n) in the program the subroutine name subri . 2. Replace the abstraction AB i in the program by a reference to its associated name, subri . 3. Return a program of the form ∗ (fil (Ifml ) Ebody (def subr0 AB 0 ) . . . (def subrn AB n ))
where AB 0 , . . ., AB n are the transformed versions of all the abstractions AB 0 , . . ., AB n in the original program, and Ebody is the transformed body. For example, Figure 17.42 shows the revmap example after lambda lifting. subr0 is the code for the revmap procedure, subr1 is the code for the loop procedure, subr2 is the code for the continuation of the call to f within the body of the loop procedure, and subr3 is the code for the greater-than-b procedure. The example shows how replacing each abstraction with its unique subroutine name can introduce free variables into otherwise closed abstractions. For instance, the body of the abstraction named subr0 contains a reference to subr1 and the body of the abstraction named subr1 contains a reference to subr2. In the revmap example, code.62 always denotes the subroutine named subr0, code.46 and code.54 always denote subr1, code.58 always denotes subr2, and code.50 always denotes subr3. In all these cases, it would be safe to replace these code references (and eliminate their associated (mget 1) operations) by the subroutine names. In assembly code, this optimization corresponds to replacing an indirect jump to a subroutine by a direct jump. It is possible for the compiler to perform this optimization automatically, but a sophisticated analysis that tracks control-flow and store-effect information would be required to determine when the optimization can be safely applied. Exercise 17.37 Formally define the LI prog function sketched above. You will also need to define appropriate functions on other FILcps syntactic domains. For simplicity, you may assume that fresh subroutine names are generated in the order subr0, subr1, . . .; i.e., you need not thread a subroutine name counter through your functions.
17.11 Transformation 9: Lifting (fil (a.0 b.1 ktop.9) (let* ((abs.10 (@mprod subr0)) (abs.11 (@mprod subr3 b.1)) (t.14 (@* a.0 7)) (t.15 (@null)) (t.13 (@cons t.14 t.15)) (t.12 (@cons a.0 t.13)) (code.62 (@mget 1 abs.10))) (app code.62 abs.10 abs.11 t.12 ktop.9)) (def subr0 (abs (clo.56 f.3 elts.4 k.20) (let* ((t.22 (@null)) (t.21 (@mprod t.22)) (t.23 (@mprod #u)) (abs.25 (@mprod subr1 t.21 t.23 f.3)) (t.24 (@mset! 1 t.23 abs.25)) (t.26 (@mget 1 t.23)) (code.54 (@mget 1 t.26))) (app code.54 t.26 elts.4 k.20)))) (def subr1 (abs (clo.52 xs.7 k.29) (let* ((t.21 (@mget 2 clo.52)) (t.23 (@mget 3 clo.52)) (f.3 (@mget 4 clo.52)) (t.31 (@null? xs.7))) (if t.31 (let* ((t.42 (@mget 1 t.21)) (code.44 (@mget 1 k.29))) (app code.44 k.29 t.42)) (let* ((t.34 (@car xs.7)) (k.41 (@mprod subr2 t.21 t.23 xs.7 k.29)) (code.50 (@mget 1 f.3))) (app code.50 f.3 t.34 k.41)))))) (def subr2 (abs (clo.48 t.35) (let* ((t.21 (@mget 2 clo.48)) (t.23 (@mget 3 clo.48)) (xs.7 (@mget 4 clo.48)) (k.29 (@mget 5 clo.48)) (t.36 (@mget 1 t.21)) (t.33 (@cons t.35 t.36)) (t.32 (@mset! 1 t.21 t.33)) (t.37 (@mget 1 t.23)) (t.38 (@cdr xs.7)) (code.46 (@mget 1 t.37))) (app code.46 t.37 t.38 k.29)))) (def subr3 (abs (clo.60 x.8 k.18) (let* ((b.1 (@mget 2 clo.60)) (t.19 (@> x.8 b.1)) (code.58 (@mget 1 k.18))) (app code.58 k.18 t.19)))))
Figure 17.42
revmap program after lambda lifting.
1097
1098
17.12
Chapter 17
Compilation
Transformation 10: Register Allocation
The goal of the Tortoise compiler is to translate high-level programs to code that can be executed on a register machine. A register machine provides two kinds of storage locations for values: a small number of registers with fast access times and a large number of memory locations with slow access times. It typically has instructions for loading values into registers from memory, storing values from registers to memory, and performing operations whose arguments and results are in registers. The code generated by the Lifting stage of the Tortoise compiler resembles assembly code for a register machine except for its handling of variable names. Intuitively, each identifier in a FILlift program that is not a subroutine name can be viewed as an abstract register. Because fresh identifiers are introduced by many transformations, there is no bound on the number of abstract registers that a program may use. But any register machine executing the program employs a relatively small number of actual registers. The process of mapping the abstract registers of a program to the actual registers of a register machine is known as register allocation. Register allocation makes the storage locations represented by variable names explicit. Tortoise also uses registers to pass procedure arguments, so register allocation makes the argument-passing mechanism explicit. We will study a simple approach to register allocation in the context of transforming FILlift to FILreg , the target language of the Tortoise compiler. In Section 17.12.1, we describe FILreg and explain how to view it as the instruction set for a register machine. We then describe how to convert FILlift to FILreg in Sections 17.12.2–17.12.5.
17.12.1
The FILreg Language
FILreg (Figure 17.43) is a language that is designed to be viewed in two very different ways: 1. FILreg is basically a restricted subset of FILlift . A FILreg program can be executed like any other FILlift program. 2. FILreg is the instruction set for a simple register machine. This machine, FRM, is discussed in Section 18.2. Remarkably, FILreg programs have the same behavior whether we view them as FILlift programs or as register machine programs. This section summarizes
17.12.1 The FILreg Language
1099
the features of the syntax of FILreg and describes how to view FILreg programs and expressions in terms of the underlying register machine operations they are intended to represent. Later (Section 18.2) we will sketch how FILreg programs are executed on the FRM register machine. (A full description of FRM program execution can be found in the Web Supplement.) The general identifiers of FILlift have been replaced by a restricted domain Identreg containing only (1) subroutine names S (as in FILlift ) and (2) register names R (new in FILreg ). Each register name r0, r1, r2, . . . is the concatenation of the name r and a numeral for a natural number between 0 and nmax , where nmax + 1 is the number nreg of registers in the machine. For n ∈ Nat, we use rn to stand for rn. In FILreg , the formal parameter sequences of programs and abstractions and the operand sequences of applications must be prefixes RS of the register sequence [r0, r1, r2, . . . , rnmax ]. That is, abstractions and applications must have the following form: Number of params/rands 0 1 2 3 .. .
Abstraction
Application
(abs () E ) (abs (r0) E ) (abs (r0 r1) E ) (abs (r0 r1 r2) E ) .. .
(app I ) (app I r0) (app I r0 r1) (app I r0 r1 r2) .. .
These restricted forms represent a decision to pass program and procedure arguments in specific registers: the first argument is always passed in register r0, the second argument is always passed in register r1, etc. An abstraction definition (def S (abs (RS ) E )) represents the entry point to a subroutine; an application (app S RS ) represents a direct jump to the subroutine labeled S ; and an application of the form (app R RS ) is an indirect jump to the subroutine whose label (address) is stored in register R. From the register machine’s point of view, the formal parameter names and argument names are superfluous: The arguments are in the registers and both the caller and the callee know how many arguments there are. The names appear in the syntax so that we can continue to interpret our code from the FIL perspective as well. In FILreg , all if tests must be register names. (if R Ethen Eelse ) is thus an instruction that tests the content of register R and continues with the instructions in Ethen if R contains true and with the instructions in Eelse if R contains false. The FILreg expression (error Ymsg ) terminates program execution in an error state that includes the error message Ymsg . The new (halt NT R) ex-
1100
Chapter 17
Compilation
pression terminates program execution with a return code specified by NT ; for some return codes, the result of the program is the value in register R. This expression is used in the register-machine implementation of FILreg (see the Web Supplement for details). The FILreg expression (let ((Rdst LE )) E ) loads the value of LE into the destination register Rdst and then proceeds with the instructions in E . The nature of this load depends on the structure of the letable expression LE : • The case where LE is a literal corresponds to loading the literal value into Rdst . ∗ ) corresponds to • The case where LE is a primitive application (prim Oop Rsrc ∗ and code that performs an operation on the contents of the source registers Rsrc stores the result in the destination register Rdst . Note that the operand registers of primitive applications, unlike those of procedure applications, needn’t be a specific sequence, because register machines let you specify arbitrary registers for primitive operations.
• The case where LE is an application (prim copy Rsrc ) of the new primitive operator copy, which acts as an identity, represents code that copies the content of register Rsrc to the register Rdst . This cannot be accomplished by just having a register Rsrc as the letable expression, because the [copy-prop] rule will always eliminate a let expression of the form (let ((R1 R2 )) E ) by substituting R2 for R1 in E . • The case where LE is the new letable expression (addr S ) represents a load of a subroutine address into Rdst . This cannot be accomplished by just using a subroutine name S as the letable expression, because the [copy-prop] rule will always eliminate a let expression of the form (let ((R S )) E ) by substituting S for R in E . This case is slightly different from the copy case above: addr, which acts like the identity on subroutine names, cannot be a primitive operator, because all prim operands must be register names, and S is not a register name. Registers are used to store values that are needed later in the computation. Sometimes the number of values needed by the rest of the computation exceeds the number of registers. In this case, the extra values must be stored in the register machine’s memory, a process known as spilling. The spget and spset! primitives are used for spilling. They are explained in Section 17.12.5. To model the simplicity of register operations and facilitate spilling, all FILreg primitive operators take zero, one, or two arguments. The one primitive in pre-
17.12.1 The FILreg Language
1101
Kernel Grammar P ∈ Progreg ::= (fil (RS formals ) Ebody (def subri AB i )ni=0 ) AB ∈ Abstractionreg ::= (abs (RS formals ) Ebody ) E ∈ Expreg ::= (app Irator RS rands ) | (if Rtest Ethen Eelse ) | (let ((Rdst LE defn )) Ebody ) | (error Ymessage ) | (halt NT returnCode Rresult ) I ∈ Identreg ::= R | S ∗ LE ∈ LetableExpreg ::= L | (addr S ) | (prim Oprimop Rsrc ) L ∈ Lit = as in full FIL Y ∈ SymLit = as in full FIL O ∈ Primopreg ::= | | |
. . . FIL primops except mprod . . . copy ; register copy (mnew NT ) ; mutable tuple allocation (spget NT ) | (spset! NT ) ; spill get and set
NT ∈ NatLit = {0, 1, 2, . . .} R ∈ Reg = {r0, r1, . . . , rnmax } ; rn is shorthand for rn RS ∈ RegSeq = any prefix of [r0, r1, . . . , rnmax ] ; nmax + 1 = nreg S ∈ Subr = identifiers of the form subrn ; subrn is shorthand for subrn ; For n ∈ Nat, the notation I n stands for the identifier that ; results from concatenating the characters of the name I with ; the digit characters of the numeral in NatLit that denotes n. New Syntactic Sugar (@mnew NT ) ds (prim (mnew NT )) (@spget NT ) ds (prim (spget NT )) (@spset! NT R) ds (prim (spset! NT ) R) Figure 17.43 Grammar for FILreg , the result of the register allocation transformation and the language of FRM, the virtual register machine discussed in Section 18.2.
vious FIL dialects that took an arbitrary number of arguments — mprod — is replaced by a combination of the new primitive operator (mnew NT ) (which creates a mutable tuple with N [[NT ]] slots) and a sequence of mset! operations for filling the slots. For example, the FILlift expression (let ((Improd (@mprod S Iarg1 17))) Ebody )
can be expressed in FILreg as follows (where Rtemp , Rmprod , and Rarg1 are three distinct registers):
1102
Chapter 17
(let* ((Rmprod (@mnew 3)) (Rtemp (addr S )) (Rtemp (@mset! 1 Rmprod (Rtemp (@mset! 2 Rmprod (Rtemp 17) (Rtemp (@mset! 3 Rmprod ) {the translation of Ebody , Ebody
Compilation
Rtemp )) Rarg1 )) {assume Rarg1 corresponds to Iarg1 } Rtemp ))) in which Rmprod corresponds to Improd }
Because the operands of a primitive application must be register names, the integer literal 17 and the subroutine label (addr S ) must be stored in temporary registers before they can be used in applications of the primitive operator mset!.
17.12.2
A Register Allocation Algorithm
The Tortoise register allocation transformation RAprog has the following specification: Preconditions: The input to RAprog is a valid kernel FILlift program in which the only free identifiers of any abstraction are subroutine names. Postconditions: The output of RAprog is a valid kernel FILreg program in which the only free identifiers of any abstraction are subroutine names. Register allocation is largely the process of renaming fil-bound, abs-bound, and let-bound identifiers in FILlift to the register names r0, . . . , rnmax in FILreg . In Tortoise, register allocation must also ensure that the resulting program respects the other syntactic restrictions of FILreg by naming literals and subroutine names and expanding each mprod into mnew followed by a sequence of mset!s. Register allocation has been studied intensively, and it is the subject of many elegant and efficient algorithms. (The notes at the end of this chapter provide some references.) Tortoise uses a simple register allocation algorithm that is not particularly efficient but is easy to describe. The algorithm has three phases: 1. The expansion phase takes a FILlift program, ensures that all literals and subroutine names are bound to identifiers in a let before they are used, and converts instances of mprod into sequences of mnew and mset!s. The output is in a language called FILregId , a version of FILreg in which R ∈ Reg is redefined to be any nonsubroutine identifier and RS ∈ RegSeq is redefined to be any sequence of nonsubroutine identifiers.
17.12.2 A Register Allocation Algorithm
1103
Domains Proglift = as defined in Figure 17.41 ProgregId = programs in FILregId , a version of FILreg in which nonsubroutine identifiers are used in place of registers Progreg∞ = programs in FILreg∞ , a version of FILreg supporting an unbounded number of registers Progreg = as defined in Figure 17.43 Register Allocation Functions EX prog : Proglift → ProgregId ; described in Section 17.12.3 RC prog : ProgregId → Progreg∞ ; described in Section 17.12.4 SP prog : Progreg∞ → Progreg
; described in Section 17.12.5
RAprog : Proglift → Progreg RAprog [[P ]] = SP prog [[RC prog [[EX prog [[P ]]]]]] Figure 17.44 The Tortoise register allocation transformation RAprog is the composition of the expansion transformation EX prog , the register conversion transformation RC prog , and the spilling transformation SP prog .
2. The register conversion phase takes a FILregId program, renames all nonsubroutine identifiers to be register names, and ensures that all formal parameter sequences and operand sequences of procedure applications are prefixes of [r0, r1, r2, . . . ]. It introduces appropriate register moves (via the copy primitive) to satisfy this requirement. The output is in a language called FILreg∞ , a version of FILreg in which R ∈ Reg is redefined to include an unbounded number of register names of the form rn . This phase greedily reuses register names in an attempt to reduce the number of registers needed by the program, but that number may still exceed the fixed number nreg of registers provided by the register machine. 3. The spilling phase guarantees that only nreg registers are used in the final code by moving the contents of some registers to memory if necessary. Figure 17.44 shows how these three phases are composed to implement the register allocation function RAprog . In the following three sections, we sketch each of these phases by providing an English description of how they work along with some examples. The formal details of each phase are fleshed out in the Web Supplement.
1104
17.12.3
Chapter 17
Compilation
The Expansion Phase
The expansion phase of the Tortoise register allocator converts FILlift programs to FILregId programs by performing two transformations: 1. It introduces let-bound names for all literals and subroutine names that appear in if tests and in the operands of procedure and primitive applications. 2. It expands each primitive application of mprod into a primitive application of mnew to allocate the mutable tuple followed by a sequence of primitive applications of mset! to fill the slots of the new tuple. Figure 17.45 illustrates the expansion phase on the body of the revmap program after the Lifting stage. Both mprods in the input are expanded to the mnew/mset! idiom, and new lets are introduced to name the literals and subroutine names in the input.
17.12.4
The Register Conversion Phase
The register conversion phase of the Tortoise register allocator converts FILregId programs to FILreg∞ programs by performing three transformations: 1. It converts every formal parameter sequence I ni=0 of the program or its abstractions to an ordered register sequence r ni=0 . 2. It renames every let-bound name to a register name. 3. It guarantees that the operand sequence I ni=0 of every app expression is an ordered register sequence r ni=0 . We will illustrate each of these transformations in the context of registerconverting the following abstraction: AB 0 = (abs (clo.7 x.8 k.9) (let* ((t.10 (@mget 2 clo.7)) (t.11 (@mget 3 clo.7)) (t.12 (@* x.8 x.8)) (t.13 (@* t.11 t.12)) (t.14 (@+ x.8 t.12)) (code.15 (@mget 1 t.10))) (app code.15 t.10 t.14 t.13 t.11 k.9)))
The first transformation renames the formal parameters clo.7, x.8, and k.9 to r0, r1, and r2, respectively:
17.12.4 The Register Conversion Phase
1105
Body of Lifted revmap Before Expansion Phase (let* ((abs.10 (@mprod subr0)) (abs.11 (@mprod subr3 b.1)) (t.14 (@* a.0 7)) (t.15 (@null)) (t.13 (@cons t.14 t.15)) (t.12 (@cons a.0 t.13)) (code.62 (@mget 1 abs.10))) (app code.62 abs.10 abs.11 t.12 ktop.9)) Body of revmap After Expansion Phase (let* ((abs.10 (@mnew 1)) (t.79 (addr subr0)) (t.78 (@mset! 1 abs.10 t.79)) (abs.11 (@mnew 2)) (t.82 (addr subr3)) (t.80 (@mset! 1 abs.11 t.82)) (t.81 (@mset! 2 abs.11 b.1)) (t.83 7) (t.14 (@* a.0 t.83)) (t.15 (@null)) (t.13 (@cons t.14 t.15)) (t.12 (@cons a.0 t.13)) (code.62 (@mget 1 abs.10))) (app code.62 abs.10 abs.11 t.12 ktop.9)) Figure 17.45 program.
Illustration of the expansion phase on the body of the lifted revmap
AB 1 = (abs (r0 r1 r2) (let* ((t.10 (@mget 2 r0)) (t.11 (@mget 3 r0)) (t.12 (@* r1 r1)) (t.13 (@* t.11 t.12)) (t.14 (@+ r1 t.12)) (code.15 (@mget 1 t.10))) (app code.15 t.10 t.14 t.13 t.11 r2)))
We assume that there are enough registers to handle the longest formal parameter sequence. The later spilling phase will handle the case where this assumption is false. The second transformation renames each identifier I declared in a let expression to a register name R that does not appear free in the body of the let
1106
Chapter 17
Compilation
expression. Although it would be safe to use any nonfree register name, the algorithm chooses the “least” one according to the order ri ≤ rj if and only if i ≤ j. This greedy strategy attempts to reduce register usage by reusing low-numbered registers whose values are no longer needed. For example, renaming let-bound identifiers transforms AB 1 to AB 2 = (abs (r0 r1 r2) (let* ((r3 (@mget 2 r0)) {r0,r1,r2 used later, so use r3 for t.10} (r0 (@mget 3 r0)) {r0=clo.7 not used later, so reuse r0 for t.11} (r4 (@* r1 r1)) {r0–r3 used later, so use r4 for t.12} (r5 (@* r0 r4)) {r0–r4 used later, so use r5 for t.13} (r1 (@+ r1 r4)) {r1=x.8 not used later, so reuse r1 for t.14} (r4 (@mget 1 r3))) {r4=t.12 not used later, so reuse r4 for code.15} (app r4 r3 r1 r5 r0 r2)))
Note how r0, r1 and r4 are reused when they are no longer mentioned in the rest of the computation. After the first two transformations are performed, the program satisfies the grammar of FILreg∞ except for app expressions (app Irator R n−1 i=0 ). Although the first two transformations guarantee that all app operands are registers, they are not necessarily the sequence r n−1 i=0 required by the FILreg∞ grammar. This form can be achieved by a register shuffling process that uses a sequence of copy applications to move the contents of the registers in the source operand sequence n−1 R n−1 i=0 to the corresponding registers in the destination operand sequence r i=0 . For example, (app subr5 r2 r0) can be transformed to (let* ((r1 (@copy r0)) (r0 (@copy r2))) (app subr5 r0 r1))
A simple but very inefficient implementation of shuffling would copy the first n−1 {r } ∪ ∪ n operands to n fresh registers not mentioned in ∪n−1 i i=0 i=0 {Ri } ∪ {Irator }, and then copy the operands from the fresh registers to r n−1 i=0 . This is expensive both in the number of additional registers used (n) and the number of copy operations performed (2n). Using more registers also increases the need for spilling. We now sketch a register shuffling algorithm that uses at most two registers in addition to the ones already mentioned in the source and destination register sets. (See the Web Supplement for full details.) The register shuffling algorithm begins by testing whether the operator of the app is a register in the difference of the destination and source register sets. If so, it must be renamed to a name not
17.12.4 The Register Conversion Phase
1107
in the union of these sets to avoid blocking later copy operations. This is the first additional register that the algorithm may use. For example, in the application (app r4 r3 r1 r5 r0 r2) of AB 2 , the operator r4 is a destination register not mentioned in the source registers, and so is renamed to the least register not appearing in either set (r6): (let ((r6 (@copy r4))) (app r6 r3 r1 r5 r0 r2))
The rest of the register shuffling algorithm transforms a FILregId application Eapp = (app Irator R n−1 i=0 ) to a FILreg∞ expression of the form (let* ((Rdst (@copy Rsrc ))kj=1 ) j j n−1 (app Irator r i=0 ))
that has the same meaning as Eapp . This transformation is guided by a register dependence graph (RDG) that keeps track of the copy operations that still need to be performed. An RDG is a set of edges, where each edge is a pair of register names, written Rdst Rsrc , that associates a destination register Rdst with a source register Rsrc . Such an edge indicates that the value in Rsrc must be moved into Rdst , and so corresponds to the let binding (Rdst (@copy Rsrc )). The direction of the arrow indicates that the final content of the Rdst depends on the current content of Rsrc . For the application (app r6 r3 r1 r5 r0 r2), this graph can be depicted as r4 r2 r5 r0 r3
There is no edge involving r1 because it is already in the correct position. There are two connected components in this graph: the acyclic component involving r4, r2, and r5, and the cyclic component involving r0 and r3. The copy associated with the edge rdst rsrc can be performed only if the destination register rdst will not be the source of a later copy operation — i.e., only if there is no edge of the form R rdst in the RDG. Another way of phrasing this condition is that the number of directed edges going into vertex rdst (its in-degree) must be 0. We will call an edge rdst rsrc a root edge of an RDG if the in-degree of rdst is 0. A root edge appears as an initial edge of an acyclic component of the RDG. The fundamental strategy of the register shuffling algorithm is to find a root edge EG root = rdst rsrc in the RDG (there may be more than one) and perform its corresponding copy operation via the let binding (rdst (@copy rsrc )). The shuffling process then continues after removing EG root from the RDG be-
For example, processing the first two root edges in the component r4 ← r2 ← r5 of the RDG for (app r6 r3 r1 r5 r0 r2) yields
(let* ((r4 (@copy r2))   {move r2 to r4 in (app r6 r3 r1 r5 r0 r2)}
       (r2 (@copy r5)))  {move r5 to r2 in (app r6 r3 r1 r5 r0 r4)}
  (app r6 r3 r1 r2 r0 r4))
When processing root edge rdst ← rsrc, if the operator is named rsrc, it is necessary to rename it to rdst. This does not happen in our example.
The RDG for the residual application (app r6 r3 r1 r2 r0 r4) is the cyclic graph r0 ⇄ r3, which contains no root edge. To handle this situation, a temporary register Rtemp is used to break one of the cycles, converting it to an acyclic component. An arbitrary edge EGarb = rdst ← rsrc is chosen from one of the cyclic components, and the content of rsrc is stored in Rtemp by the let binding (Rtemp (@copy rsrc)). Replacing EGarb in the RDG by rdst ← Rtemp yields an acyclic component rsrc ← ... ← rdst ← Rtemp that allows the root-edge-finding strategy of the algorithm to proceed. The temporary register Rtemp can be the least register that is different from Irator and is not a member of the final destination registers. In the case where an RDG contains multiple cyclic components, a single temporary register can be used to break all of the components. This is the second of the two registers that may be required to perform the register shuffling. (The first additional register was used for potentially renaming the operator of an application.)
In our example, suppose the edge r0 ← r3 is chosen to break the cycle r0 ⇄ r3. Since r0 through r4 are reserved for the final operands and the operator is r6, r5 is chosen as the temporary register. The residual application (app r6 r3 r1 r2 r0 r4) is now transformed to

(let* ((r5 (@copy r3))) {break cycle in (app r6 r3 r1 r2 r0 r4) with r5}
  (app r6 r5 r1 r2 r0 r4))
where the new RDG r3 ← r0 ← r5 consists of a single acyclic component. Processing the remaining two edges leads to two more let bindings:

(let* ((r3 (@copy r0))   {move r0 to r3 in (app r6 r5 r1 r2 r0 r4)}
       (r0 (@copy r5)))  {move r5 to r0 in (app r6 r5 r1 r2 r3 r4)}
  (app r6 r0 r1 r2 r3 r4))
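The complete strategy (perform a root edge's copy when one exists, otherwise break a cycle with the temporary register) is compact enough to sketch in code. The following Python function is our own illustration of the algorithm just described, not the Web Supplement's implementation; it assumes the operator has already been renamed out of the destination-minus-source set, and that temp is neither the operator nor a final destination register:

    def shuffle(operands, rator, temp):
        # operands[i] is the source register for destination register i.
        # Returns the copy bindings and the (possibly renamed) operator.
        edges = {dst: src for dst, src in enumerate(operands) if dst != src}
        bindings = []
        while edges:
            roots = [d for d in edges if d not in edges.values()]
            if roots:                            # perform a root edge's copy
                dst = roots[0]
                src = edges.pop(dst)
                bindings.append((dst, src))
                if rator == src:                 # operator now lives in dst
                    rator = dst
            else:                                # only cycles remain: break one
                dst, src = next(iter(edges.items()))
                bindings.append((temp, src))     # stash the source in temp
                edges[dst] = temp
        return bindings, rator

On the running example, shuffle([3, 1, 5, 0, 2], rator=6, temp=5) produces exactly the five bindings derived above: r4 ← r2, r2 ← r5, r5 ← r3, r3 ← r0, r0 ← r5.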
So the final abstraction AB3 that results from applying our register shuffling algorithm to AB2 is:

AB3 = (abs (r0 r1 r2)
        (let* ((r3 (@mget 2 r0))
               (r0 (@mget 3 r0))
               (r4 (@* r1 r1))
               (r5 (@* r0 r4))
               (r1 (@+ r1 r4))
               (r4 (@mget 1 r3))
               (r6 (@copy r4))
               (r4 (@copy r2))
               (r2 (@copy r5))
               (r5 (@copy r3))
               (r3 (@copy r0))
               (r0 (@copy r5)))
          (app r6 r0 r1 r2 r3 r4)))
Applying the expansion and register conversion phases to the revmap example yields the FILreg∞ program in Figure 17.46. Such a program is clearly very close to register machine language; it leaves very little to the imagination! The program uses eight registers (r0 through r7), so no spilling is required as long as the number of machine registers nreg is at least eight.
Although our algorithm is simple and tends to use a small number of registers, it can use more registers and/or perform more copy operations than necessary. For example, here is a register-converted version of AB0 that uses only five registers and one copy operation:

(abs (r0 r1 r2)
  (let* ((r4 (@copy r2))    {moving r2 to r4 right away frees up r2.}
         (r3 (@mget 3 r0))
         (r0 (@mget 2 r0))  {this @mget moved later so r0 free for result.}
         (r5 (@* r1 r1))
         (r2 (@* r3 r5))
         (r1 (@+ r1 r5))
         (r5 (@mget 1 r0)))
    (app r5 r0 r1 r2 r3 r4)))
This version avoids many copy operations by (1) storing results in registers chosen according to their operand position in the app expression and (2) reordering the (@mget 2 r0) and (@mget 3 r0) bindings so that the result of (@mget 2 r0) can be stored directly in r0.
Code using fewer registers or register moves (i.e., copy operations) than our algorithm can be obtained with other register allocation algorithms from the literature.
(fil (r0 r1 r2)
  (let* ((r3 (@mnew 1))
         (r4 (addr subr0))
         (r4 (@mset! 1 r3 r4))
         (r4 (@mnew 2))
         (r5 (addr subr3))
         (r5 (@mset! 1 r4 r5))
         (r1 (@mset! 2 r4 r1))
         (r1 7)
         (r1 (@* r0 r1))
         (r5 (@null))
         (r1 (@cons r1 r5))
         (r0 (@cons r0 r1))
         (r1 (@mget 1 r3))
         (r5 (@copy r1))
         (r1 (@copy r4))
         (r4 (@copy r3))
         (r3 (@copy r2))
         (r2 (@copy r0))
         (r0 (@copy r4)))
    (app r5 r0 r1 r2 r3))
  (def subr0
    (abs (r0 r1 r2 r3)
      (let* ((r0 (@null))
             (r4 (@mnew 1))
             (r0 (@mset! 1 r4 r0))
             (r0 (@mnew 1))
             (r5 #u)
             (r5 (@mset! 1 r0 r5))
             (r5 (@mnew 4))
             (r6 (addr subr1))
             (r6 (@mset! 1 r5 r6))
             (r4 (@mset! 2 r5 r4))
             (r4 (@mset! 3 r5 r0))
             (r1 (@mset! 4 r5 r1))
             (r1 (@mset! 1 r0 r5))
             (r0 (@mget 1 r0))
             (r1 (@mget 1 r0))
             (r4 (@copy r1))
             (r1 (@copy r2))
             (r2 (@copy r3)))
        (app r4 r0 r1 r2))))
  (def subr1
    (abs (r0 r1 r2)
      (let* ((r3 (@mget 2 r0))
             (r4 (@mget 3 r0))
             (r0 (@mget 4 r0))
             (r5 (@null? r1)))
        (if r5
            (let* ((r0 (@mget 1 r3))
                   (r1 (@mget 1 r2))
                   (r3 (@copy r1))
                   (r1 (@copy r0))
                   (r0 (@copy r2)))
              (app r3 r0 r1))
            (let* ((r5 (@car r1))
                   (r6 (@mnew 5))
                   (r7 (addr subr2))
                   (r7 (@mset! 1 r6 r7))
                   (r3 (@mset! 2 r6 r3))
                   (r3 (@mset! 3 r6 r4))
                   (r1 (@mset! 4 r6 r1))
                   (r1 (@mset! 5 r6 r2))
                   (r1 (@mget 1 r0))
                   (r3 (@copy r1))
                   (r1 (@copy r5))
                   (r2 (@copy r6)))
              (app r3 r0 r1 r2))))))
  (def subr2
    (abs (r0 r1)
      (let* ((r2 (@mget 2 r0))
             (r3 (@mget 3 r0))
             (r4 (@mget 4 r0))
             (r0 (@mget 5 r0))
             (r5 (@mget 1 r2))
             (r1 (@cons r1 r5))
             (r1 (@mset! 1 r2 r1))
             (r1 (@mget 1 r3))
             (r2 (@cdr r4))
             (r3 (@mget 1 r1))
             (r4 (@copy r1))
             (r1 (@copy r2))
             (r2 (@copy r0))
             (r0 (@copy r4)))
        (app r3 r0 r1 r2))))
  (def subr3
    (abs (r0 r1 r2)
      (let* ((r0 (@mget 2 r0))
             (r0 (@> r1 r0))
             (r1 (@mget 1 r2))
             (r3 (@copy r1))
             (r1 (@copy r0))
             (r0 (@copy r2)))
        (app r3 r0 r1)))))

Figure 17.46  revmap program after expansion and register conversion.
Many of these are based on a classic register-coloring algorithm that uses registers to “color” an interference graph whose vertices are abstract register names and whose edges connect vertices that cannot be the same actual register [CAC+81, Cha82]. These algorithms can be adapted to pass procedure arguments in registers, as required by our approach.
The assumption that all n-argument procedures take their arguments in registers r0 ... rn−1 simplifies our algorithm, but is too restrictive. Algorithms for interprocedural register allocation (e.g., [BWD95]) can reduce the number of copy operations by using different argument registers for different procedures. For example, before register shuffling is performed, the top-level call to the revmap procedure is transformed to (app r1 r3 r4 r0 r2). Since this is the only call to revmap in the program, the register shuffling operations that transform the operand sequence [r3, r4, r0, r2] to [r0, r1, r2, r3] can be eliminated if the subroutine corresponding to the revmap procedure (subr0) is simply modified to expect its arguments in the unshuffled registers:

(def subr0 (abs (r3 r4 r0 r2) . . . ))
Of course, in order to use specialized argument registers for a particular procedure, the compiler must have access to its definition and all its calls.

Exercise 17.38
a. Write a six-operand application (app Irator R0 ... R5) whose RDG has two cyclic components, one acyclic component, and one vertex with in-degree 2.
b. Show the result of using the register shuffling algorithm described in the text on your example.

Exercise 17.39 For an application with n operands, what is the number of copy operations needed by the register shuffling algorithm described in the text in the best case? In the worst case? Write a six-operand application (app Irator R0 ... R5) that requires the worst-case number of copies.

Exercise 17.40
a. Consider the following abstraction ABa:

(abs (clo.0 a.1 b.2 k.3)
  (let* ((t.4 (@mget 2 clo.0))
         (t.5 (@mget 3 clo.0))
         (t.6 (@- a.1 t.4))
         (t.7 (@/ b.2 t.4))
         (code.8 (@mget 1 t.5)))
    (app code.8 t.5 t.6 t.7 k.3)))
What is the result of register-converting this abstraction using the algorithm described in the text?
b. Consider the abstraction ABb obtained from ABa by changing the application expression to

(app code.8 t.5 t.4 t.6 t.7 k.3) {new argument t.4 added before t.6}
What is the result of register-converting ABb?
c. Consider the abstraction ABc obtained from ABb by changing the application expression to

(app code.8 t.5 t.4 t.4 t.6 t.7 k.3) {second t.4 added before t.6}
What is the result of register-converting ABc?
d. The results of part b and part c use more registers and copy operations than necessary. Show this by register-converting ABb and ABc to FILreg∞ abstractions by hand to use both the minimal number of registers and the minimal number of copy operations. You may reorder let bindings and interleave copy bindings with the existing let bindings as long as you do not change the meaning of the abstractions.
17.12.5 The Spilling Phase
A FILreg∞ register is live if its current value may be accessed from the register later in the computation. When the number of live FILreg∞ registers exceeds the number nreg of registers in the machine, some of the register values must be stored elsewhere in memory. The process of moving values that would otherwise be stored in registers to memory is called spilling. In the Tortoise compiler, we use the name spill memory for the area of memory used to store spilled values. We treat spill memory like a zero-indexed mutable array of slots manipulated via two FILreg primitive operations:

• (prim (spset! NT) R) stores the content of R in the slot at index N[[NT]] in spill memory and returns #u. This is abbreviated (@spset! NT R).

• (prim (spget NT)) returns the value stored in the slot at index N[[NT]] in spill memory. This is abbreviated (@spget NT).

Tortoise uses a simple spilling algorithm that assumes nreg ≥ 2. Given a FILreg∞ program P, the algorithm first determines the largest register rtop used in P. If top < nreg, then P is already a FILreg program, so it is returned. But if top ≥ nreg, then all references to registers of the form ri such that i ≥ nreg must be eliminated to convert the program to FILreg. This is accomplished by dedicating the top two registers, rsp = r(nreg−2) and r(sp+1) = r(nreg−1), to the spilling process and storing the content of every register rj as follows:

• If j < sp, the content of rj continues to be stored in register rj.
• If j ≥ sp, the content of rj is stored in slot (j − sp) of spill memory. In this case we say that rj is a spilled register. We assume that spill memory is large enough to hold the values of all spilled registers.

The spilling phase performs the following spill conversion transformations on the FILreg∞ program, which are illustrated in Figure 17.47.

• The program formal parameter sequence, all abstraction formal parameter sequences, and all application operand sequences are truncated to contain no register larger than r(sp−1). This is because we pass the first sp arguments in registers and any arguments beyond this in spill memory. We assume that the program-invoking mechanism of the operating system “knows” that any arguments beyond those mentioned in the program formal parameters must be passed in spill memory.

• A let expression (let ((Rdst LE)) Ebody) in which rdst is a spilled register is converted to

(let* ((rsp LE′)
       (r(sp+1) (@spset! dst−sp rsp)))
  E′body)

where LE′ is the spill-converted version of LE, E′body is the spill-converted version of Ebody, and dst−sp is a natural number literal NT such that N[[NT]] = (dst − sp). This takes the value that would have been stored in rdst and instead (1) stores it in the dedicated register rsp and (2) uses spset! to move it from rsp to spill memory at index (dst − sp). Storing the unit value resulting from spset! in r(sp+1) rather than in rsp allows the value in rsp to be used later in improved versions of the spilling algorithm. (A small code sketch of this case appears after this list.)
• Any reference to a spilled register rsrc that appears as a conditional test, as an operator of a procedure application, or as the first argument of a primitive application is converted to a reference to rsp in a context where rsp is let-bound to (@spget src − sp). This takes the value that would have been retrieved directly from rsrc , and instead (1) uses spget to retrieve it from spill memory at index (src − sp), (2) stores it in the dedicated register rsp , and (3) retrieves it from rsp . Similarly, any reference to a spilled register rsrc that appears as the second argument of a primitive application is converted to a reference to r(sp+1 ) in a context where r(sp+1 ) is let-bound to (@spget src − sp). A spilled register in the second argument position is stored in a different register than one in the first position to handle the case where both argument registers are spilled registers.
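The let-binding case of the transformation is mechanical. Here is a minimal Python sketch of it — our own illustration, not the book's formalization — with FILreg expressions modeled as nested tuples and registers as integers:

    def convert_let_binding(dst, defn, body, sp):
        # Spill-convert (let ((r<dst> defn)) body), where defn and body
        # are assumed to be spill-converted already.
        if dst < sp:                # rdst is not spilled: keep the binding
            return ('let', [(dst, defn)], body)
        return ('let*',
                [(sp, defn),        # the value lands in the dedicated rsp ...
                 (sp + 1, ('@spset!', dst - sp, ('reg', sp)))],  # ... then memory
                body)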
In the spilling example in Figure 17.47, where sp = 2, the formal parameter registers r2, r3, and r4 of the abstraction are stored in spill memory and are accessed via (@spget 0), (@spget 1), and (@spget 2), respectively. The global spill conversion transformation guarantees that any invocation of this (or any other) five-parameter subroutine will use spset! to store the third, fourth, and fifth operands in spill memory locations 0, 1, and 2 before control is passed to the subroutine. The example illustrates this parameter spilling for subroutine calls in the application of the six-parameter subroutine stored in r1. The converted code uses spset! to store the value of parameters r2 and r5 in spill memory locations 0 and 3. No explicit spset!s are needed for spilling r3 and r4 to locations 1 and 2 because these values were already placed in spill memory by the caller of the converted abstraction and are not changed in its body.
Our simple spilling algorithm can generate code with some obvious inefficiencies. For example, if sp = 2, it transforms

(let* ((r2 (@* r4 r4))
       (r3 (@< r1 r2)))
  (if r3 (app r1 r0) (error wrong)))
to

(let* ((r2 (@spget 2))      {move content of spilled r4 into r2}
       (r3 (@spget 2))      {move content of spilled r4 into r3}
       (r2 (@* r2 r3))      {calculate spilled r4 times spilled r4}
       (r3 (@spset! 0 r2))  {store content of spilled r2 into memory}
       (r2 (@spget 0))      {move content of spilled r2 into r2}
       (r2 (@< r1 r2))      {calculate r1 less than spilled r2}
       (r3 (@spset! 1 r2))  {move content of spilled r3 into memory}
       (r2 (@spget 1)))     {move content of spilled r3 into r2}
  (if r2 (app r1 r0) (error wrong)))
when the following much simpler code would work:

(let* ((r2 (@spget 2))   {move content of spilled r4 into r2}
       (r2 (@* r2 r2))   {use r2 for both args and for result; no need to}
                         { spill r2 to memory since only used in next binding}
       (r3 (@< r1 r2)))  {use r2 directly and store result directly in r3; no need}
                         { to spill r3 to memory since only used in if test}
  (if r3 (app r1 r0) (error wrong)))  {use r3 directly}
The Web Supplement explores these inefficiencies and how they can be eliminated. Some of the simplifications can be made by a peephole optimization phase that performs local transformations on the result of the spilling phase. Other improvements require modifying the spilling algorithm itself.
Abstraction before Spilling

(abs (r0 r1 r2 r3 r4)
  (let* ((r5 (@< r4 r2))
         (r0 (@+ r0 r4)))
    (if r5
        (app r3 r0 r1 r2)
        (let ((r2 (@* r0 r0)))
          (app r1 r0 r1 r2 r3 r4 r5)))))

Abstraction after Spilling (where sp = 2)

(abs (r0 r1)                  {truncate formal parameters}
  (let* ((r2 (@spget 2))      {move content of spilled r4 into r2}
         (r3 (@spget 0))      {move content of spilled r2 into r3}
         (r2 (@< r2 r3))      {calculate spilled r4 less than spilled r2}
         (r3 (@spset! 3 r2))  {store content of spilled r5 into memory}
         (r3 (@spget 2))      {move content of spilled r4 into r3}
         (r0 (@+ r0 r3))      {use r3 for spilled r4}
         (r2 (@spget 3)))     {move content of spilled r5 into r2}
    (if r2                    {use r2 for spilled r5}
        (let ((r2 (@spget 1)))   {move content of spilled r3 into r2}
          (app r2 r0 r1))        {use r2 for spilled r3 and truncate operands}
        (let* ((r2 (@* r0 r0))       {calculate content of spilled r2}
               (r3 (@spset! 0 r2)))  {store content of spilled r2 into memory}
          (app r1 r0 r1)))))         {truncate operands}

Figure 17.47  A spilling example.
Any approach to spilling based purely on an index threshold is rather crude.12 It would be better to estimate the frequency of register usage and spill the less frequently used registers.
But index thresholds for spilling have an interesting precedent. All machines in the IBM 360 line executed uniform machine code assuming the same number of virtual registers. Since hardware registers were expensive, cheaper machines in the line used a small number of hardware registers for the low-numbered virtual registers and main memory locations for high-numbered virtual registers. These machines employed a threshold-based spilling mechanism implemented in hardware!
Notes

The literature on traditional compiler technology is vast. A classic text is the “Dragon book” [ASU86]. More modern treatments are provided by Cooper and Torczon [CT03] and by Appel's textbooks [App98b, App98a, AP02]. Comprehensive coverage of advanced compilation topics, especially optimizations, can be found in Muchnick's text [Muc97]. Inlining is a particularly important but subtle optimization — see especially [CHT91, ASG97, DC00, JM02]. Issues in functional-language compilation are considered by Peyton Jones in [Pey87].
Compiling programs via transformations on an intermediate, lambda calculus–based language was pioneered in the Scheme community through a series of compilers that started with Steele's Rabbit [Ste78] and was followed by many others [Roz84, KKR+86, Cli84, KH89, FL92, CH94]. An extreme version of this idea is the nanopass compiler for Scheme, which is composed of fifty simple transformation stages [SWD04]. The idea (embodied in FILreg) that the final intermediate-language program can also be interpreted directly as a register-machine program is due to Kelsey [Kel89, KH89]. He showed that realistic compiler features like register allocation, instruction selection, and stack-based allocation could be modeled in such a framework and demonstrated that the transformational technique was viable for compiling traditional languages like Pascal and Basic.
The next major innovation along these lines was developing transformation-oriented compilers based on explicitly typed intermediate languages (e.g., [Mor95, TMC+96, Pey96, PM97, Sha97, BKR98, TO98, MWCG99, FKR+00, CJW00, DWM+01]). The type information guides program analyses and transformations, supports run-time operations such as garbage collection, and is an important debugging aid in the compiler development process. In [TMC+96], Tarditi and others explored how to express classical optimizations within a typed intermediate language framework. In some compilers (e.g., [MWCG99]) type information is carried all the way through to a typed assembly language, where types can be used to verify certain safety properties of the code. The notion that untrusted low-level code should carry information that allows safety properties to be verified is the main idea behind proof-carrying code [NL98, AF00].
Early transformation-based compilers typically included a stage converting the program to CPS form. The view that procedure calls can be viewed as jumps that pass arguments was championed by Steele, who observed that a stack discipline in compilation is not implied by the procedure-call mechanism but rather by the evaluation of nested subexpressions [SS76, Ste77].
The Tortoise MCPS transformation is based on a study of CPS conversion by Danvy and Filinski [DF92]. They distinguish so-called static continuations (what we call “metacontinuations”) from dynamic continuations and used these notions to derive an efficient form of CPS conversion from the simple but inefficient definition. Appel studied the use of continuations for compiler optimizations in [App92]. In [FSDF93], Flanagan et al. argued that explicit CPS form was not necessary for such optimizations. They showed that transformations performed on CPS code could be expressed directly in a non-CPS form they called A-normal form. Although modern transformation-based compilers tend to use something like A-normal form, we adopted a CPS form in the Tortoise compiler. It is an important illustration of the theme of making implicit structures explicit, and it simplifies the compilation of complex control constructs like nonlocal exits, exceptions, and backtracking. The observation that these constructs use continuations in a nonlinear way is discussed in [Baw93]. Closure conversion is an important stage in a transformation-based compiler. Johnsson’s lambda-lifting transformation [Joh85] lifts abstractions to top level after they have been extended with initial parameters for free variables. It uses curried functions that are partially applied to these initial parameters to represent closures. This closure representation is standard in compilers for combinator reduction machines [Hug82, Pey87]. The Tortoise lifting stage also lifts closed abstractions to top level, but uses a different representation for closures: the closure-passing style invented by Appel and Jim in [AJ88]. Defunctionalization (a notion due to Reynolds [Rey72]) has been used as the basis for closure conversion in some ML compilers [TO98, CJW00]. Selective and lightweight closure conversion were studied by Steckler and Wand [WS94, SW97]. The notion of representation pollution was studied by Dimock et al. [DWM+ 01] in a compiler that chooses the representation of a closure depending on how it is used in a program. Sophisticated closure conversion systems rely on a control flow analysis to determine how procedures are used in a program. In [NNH98], Nielson, Nielson, and Hankin provide excellent coverage of control flow analysis and other program analyses. [BCT94] summarizes work on register allocation and spilling. The classic approach to register allocation and spilling involves graph-coloring algorithms [CAC+ 81, Cha82]. See [BWD95] for one approach to managing registers across procedure calls.
18 Garbage Collection

Be you the mean hombre that's a-hankerin' for a heap of trouble, stranger? Well, be ya?
— Yosemite Sam, in “Hare Trigger”
18.1 Why Garbage Collection?
Programming without some form of automatic memory management is dangerous and may lead to run-time type errors. Here is why: A programmer forced to manage memory manually may inadvertently release a block of memory for reuse, yet retain the ability to access the block by holding on to a pointer to the block (a so-called dangling pointer). When this memory block is reused by the system, the value in it will be accessible by two independent pointers, perhaps even under two independent types (the type expected by the logically invalid pointer and that expected by the new pointer). Modifying the value via one pointer will unexpectedly cause the value accessible via the other to be changed as well, leading to insidious bugs that are notoriously difficult to catch. The program is incorrect, and in some cases, type safety can be lost!1
Thus a critical run-time service in type-safe programming language implementations is the safe allocation and deallocation of memory for compound values such as tuples, arrays, lists, and oneofs. Such values are stored in units called blocks in a region of memory called the heap. As described in Chapter 17, the Tortoise compiler generates FILreg code for a simple register machine that uses the primitive operator mnew to allocate a mutable product value and the primitives mget and mset! to manipulate the contents of such products. The job of this chapter is to demonstrate how to implement primitives like these.
This chapter describes a safe storage management system based on a technique for automatic heap deallocation called garbage collection.
The same problem arises in languages that do not do array bounds checking, a deficiency exploited by countless security attacks.
The implementations of almost all type-safe languages (e.g., Java, C#, Lisp, SmallTalk, ML, Haskell) use garbage collection. In a system with manual heap deallocation, where programmers must explicitly declare when heap blocks may be reused, it is possible for a sophisticated type checker that models the state of memory to guarantee that there are no dangling pointers [ZX05]. But any such type system necessarily limits expressiveness by rejecting some programs that are actually safe. In contrast, garbage collection guarantees value integrity and type safety without limiting expressiveness.
Garbage collection (GC) is a process that identifies memory blocks that will not be used again and makes their storage available for reuse. A heap block is live in a program state if it will be accessed later in the program execution, and otherwise the block is dead. It is not in general possible to prove which blocks are live in a program state, and thus a garbage collector must identify and reuse only blocks that it can prove are dead. The engineering challenge is to design a garbage collector that efficiently preserves live memory blocks and a minimum of dead blocks.
Garbage collection also reduces memory leaks that arise when a programmer does not deallocate dead blocks so they can be used for something else. Memory leaks can cause a program to abort with an out-of-memory error that could have been avoided if the dead blocks were reused. It is in fact common for long-running programs to crash because of slow memory leaks that exhaust available storage. Memory leaks are notoriously difficult to find and fix, especially in a large and complex program like an operating system. Garbage collectors can also exhibit memory leaks, but they are better equipped than human beings to reason about block liveness, and typically do a better job of efficiently reclaiming dead blocks.
In manual deallocation systems, the programmer is caught between two dangers: Deallocating blocks too early creates dangling pointers, whereas deallocating them too late causes memory leaks. Yet it is often difficult, if not impossible, for the programmer to know when a heap-allocated data structure can no longer be accessed by the program. For example, consider a graphics application in which two-dimensional points are represented as pairs of x and y coordinates and lines are represented as pairs of points. A single point might be shared by many lines. When a line is deleted by the application, it may be safe to deallocate the pair associated with the line, but it is not safe to deallocate the pairs associated with the line's endpoints, since these might be shared by other lines. Without explicitly tracking how many lines a point participates in, the programmer has no idea when to deallocate a point in such a system. In contrast, because a garbage collector “knows” the exact pointer wiring structure for heap blocks in memory, it can determine properties that are difficult for a programmer to keep track of,
such as how many references there are to a particular point. If the answer is 0, the point may be reclaimed.
Manual deallocation also complicates the implementation of data structures and data abstractions. When a compound data structure becomes dead, many of its components become dead as well. The programmer must carefully free all dead components (often recursively) before freeing the storage for the compound structure itself. Manual deallocation complicates data abstractions because allocation and deallocation responsibilities must become part of the interface. Implementers of an abstraction must often provide a mechanism for deallocating abstract data structures. C++ provides this functionality for objects via a destructor function that is called whenever the storage for the object is deallocated. A destructor function typically deallocates storage for components of the object. But the problem is more complex still: Only the client of the data abstraction knows when abstract data values and many of the components used to create them are dead; but only the implementer knows the actual structure of abstract values, including their components, data-sharing properties, and invariants. Choreographing allocations and deallocations for even relatively simple and common abstractions, such as generic linked lists, can prove extremely complex and error-prone.
In the mid-1990s garbage collection came into the mainstream when the implementers of the first widely adopted type-safe programming language, Java, chose to use garbage collection for their implementation of safe storage. Although garbage collection has a rich history in languages like Lisp and SmallTalk, until recently it was considered too inefficient to support in mainstream programming languages like C, Pascal, and Ada, which opted for manual storage deallocation instead. (In fact, the Ada specification allows implementations to perform garbage collection but does not require it: Programmers are effectively required to manually deallocate storage in the many implementations that do not support garbage collection.) Java's type safety was sufficient inducement for programmers to accept a system that uses garbage collection.
The remainder of this chapter explores garbage collection in the context of FRM, the FIL Register Machine that executes the FILreg code generated by the Tortoise compiler presented in the previous chapter. FRM allocates heap blocks for mutable tuples, list nodes, and symbols, and garbage collection will allow reuse of such blocks when it determines that they will never be accessed again. Section 18.2 presents the relevant details of FRM, especially how compound values are laid out in memory. Section 18.3 discusses approximations for block liveness. Section 18.4 lays out a complete design for an FRM garbage collector. Section 18.5 sketches some other approaches to garbage collection, including a conservative GC technique that can be used for languages that traditionally
rely on manual deallocation. Garbage collection is a dynamic approach to automatic heap deallocation; Section 18.6 briefly discusses some static approaches to automatic heap deallocation. To keep our discussions at a high level, we will explain implementation issues and algorithms using a combination of English and pictures. A complete metalanguage formalization of FRM and several heap management strategies can be found in the Web Supplement.
18.2 FRM: The FIL Register Machine
In order to explain heap management strategies for FRM, we first need to give an overview of the FRM architecture and explain how FRM values are represented.
18.2.1 The FRM Architecture
The fundamental unit of information is the binary digit, commonly called a bit. A bit can be one of two values, 0 or 1. Every FRM value is encoded as a single word, which is a fixed-size sequence of bits. A value that is too big to fit into a single word (such as a mutable tuple, nonempty list node, or symbol) is represented as a single address word (or pointer) that is the address of a block of words in the heap.
Uniformly representing all values in a single-sized word datum greatly simplifies many aspects of the FRM implementation. For example, a word-sized register can hold any value, the ith component of any heap block can be stored in its ith word, and polymorphic functions require no special implementation techniques. Many practical language implementations use nonuniform value sizes (e.g., single-precision floating point numbers requiring one word and double-precision floating point numbers requiring two words) for efficiency reasons, but they would create needless complexity here.
The state of a program running on FRM has four components:

1. The current FILreg expression E being executed. As noted in Section 17.12.1, each FILreg expression can be viewed as a register machine instruction whose execution updates the state of the machine and specifies the next instruction. (See the Web Supplement for an SOS that specifies the action of each FILreg expression as an FRM instruction.) This component corresponds to the program counter, the address of the currently executing instruction, in traditional architectures. An FRM program executes until it terminates with a halt or error expression.
2. The subroutine memory S, where the definitions of all the program's subroutines are stored. This is the code segment in a traditional architecture. Rather than worry about the address of the start of a subroutine in memory, we will simply refer to each subroutine by an integer index. That is, subroutine i stands for Ebody in the definition (def subrn (abs (RS) Ebody)) in the FILreg program being executed. As observed on page 1099, FRM can ignore the register parameters in an abstraction, because the actual argument values are passed in the registers of the machine.

3. The register memory R, where the contents of the FRM registers are stored. As in FILreg, we assume that there are nreg registers. Each register holds one word. The notation R[n] stands for the word stored in register rn.

4. The heap memory H, where the contents of memory blocks are stored. We assume that the heap is a part of main memory, an array M of words indexed by addresses that are natural numbers. The notation M[naddr] denotes the word stored at address naddr in M. Some of the main memory may be reserved for purposes other than the heap, such as a program's subroutine and/or spill memory.2 We assume that the portion of main memory reserved for the heap uses indices in the range [0 .. (nsize − 1)], where nsize is the number of words reserved for the heap.
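As a concrete (and highly simplified) model, the four components above can be bundled into a single structure. The following Python sketch is our own illustration, not the book's SOS formalization:

    from dataclasses import dataclass

    @dataclass
    class FRMState:
        expr: object        # the current FILreg expression (the program counter)
        subroutines: list   # S: subroutines[i] is the body of subroutine i
        registers: list     # R: registers[n] is the word stored in register rn
        memory: list        # M: memory[naddr] is the word at heap address naddr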
18.2.2 FRM Descriptors
We have seen that all FRM values are single words, where some words are pointers to blocks of words in the heap. We will now explore how to represent words and blocks on a typical register machine. This allows us to discuss some of the low-level representation choices that are important in programming language implementations.
A word is represented as an n-tuple of bits. We can define FRM for a machine of any word size, but for concreteness we shall assume that all words consist of 32 bits. Suppose B ranges over bits. Then we abbreviate the word tuple B1,B2,...,B31,B32 by the juxtaposition B1B2 · · · B31B32 of its bits. The notation B^n represents n consecutive copies of B. For example, 0^20 1010 1^8 stands for the word that has 20 0s followed by 1010 followed by eight 1s. There are standard ways to represent natural numbers and signed integers using bits, and standard ways to perform arithmetic on these representations. For more information, consult the Web Supplement.
The FRM SOS in the Web Supplement shows how to handle spill memory, which we ignore here.
Each FRM value can be represented as a single 32-bit word, which we shall call its descriptor. A value is said to be unboxed when all information about the value fits into the descriptor, in which case its descriptor is said to be immediate. A value is said to be boxed when some of its information is stored in a heap block, in which case its descriptor contains the address of the block and is said to be nonimmediate. We assume that word addresses are specified by 30 bits. This is consistent with the word-alignment restrictions in many 32-bit architectures.3

Descriptors with Type Tags

Each FRM value is encoded as a single word with an unambiguous representation. This unambiguous representation encodes both the type of the value and the value itself. Thus, we can examine the FRM value stored in a register and decode its type and value without additional information. Such explicit type information is necessary for descriptor representations in a dynamically typed language, where it is necessary to check the type of a value at run time. Such type information can also be helpful in a statically typed language, where it can be used by run-time processes (such as garbage collectors, debuggers, and value displayers) to parse memory into values.
The left-hand column of Figure 18.1 shows the way we have chosen to encode type and value information in a descriptor. A descriptor is divided into a type tag — the lower-order bits that specify the type of the value — and the value representation — the remaining bits that distinguish values of a given type. In this particular representation, the lowest-order bit is 0 for immediate values and 1 for nonimmediate values. Since nonimmediate values have only 30 bits of address information, the next-to-last bit is arbitrary; we assume that it is 0. So all pointers have a 30-bit address followed by the type tag 01.
For immediate values, the next-to-last bit distinguishes integers (0) from nonintegers (1). This leaves 30 bits of information to represent a signed integer. For simplicity, we will assume that FRM supports only 30-bit signed integers in this representation. It is possible to represent integers with more bits if we box them in a heap block.4
The third-to-last bit distinguishes subroutine indices (for which this bit is 0) from other values (for which this bit is 1). This leaves 29 bits available to express the subroutine index itself (as an unsigned integer). Two additional type bits are used to distinguish the remaining four types of immediate values: unit (00), null
In many 32-bit architectures, a 32-bit address word specifies a byte address. But data one word wide must be aligned to a word boundary, i.e., its address must have 00 as its lowermost bits. So the information content of a word address is limited to its upper 30 bits.
This technique can be used to represent arbitrary-sized integers, known as bignums.
Descriptor with type tags          Value         Descriptor with GC tags only

[30-bit signed integer] 00         integer       [31-bit signed integer] 0
[29-bit subroutine index] 010      subroutine    [31-bit subroutine index] 0
0^27 00110                         unit          0^31 0
0^27 01110                         null          0^31 0
0^26 0 10110                       false         0^30 0 0
0^26 1 10110                       true          0^30 1 0
0^19 [8-bit ASCII code] 11110      character     0^23 [8-bit ASCII code] 0
[30-bit address] 01                pointer       [30-bit address] 01
Figure 18.1 Two layouts for FRM descriptors: one with full type tags and one with garbage collection (GC) tags.
list (01), boolean (10), and character5 (11). The unit and null list types have only one value each, so the remaining 27 bits are arbitrary; we assume they are all 0. The boolean type has two values, which are distinguished by the 27th bit: 0 for false and 1 for true. In a character descriptor, the remaining 27 bits can be used to encode the particular character — e.g., as an 8-bit ASCII code or a 16-bit unicode representation.
From the perspective of encoding words as bits, the placement and content of the type tags are arbitrary. For example, we could have put the type tags in the leftmost bits rather than the rightmost bits, and we could have made the integer type tag 01 and the pointer type tag 00. However, the particular choices made for type tags in Figure 18.1 have practical benefits on real architectures:

• Using a rightmost tag of 00 for integers simplifies integer arithmetic. Each 30-bit signed integer i is represented by a word that denotes 4i. Addition,
Although FIL does not have character literals, FRM uses character values to represent symbols as boxed values whose components are characters.
subtraction, and remainder can be performed on these descriptors simply using the standard 32-bit arithmetic operations without using any other bit-level operations, because these operations preserve the rightmost 00 bits. Multiplication and division take slightly more work: one of the multiplication operands must be shifted right (arithmetically) by two bits to eliminate its 00 tag before performing the multiplication; and the result of division must be shifted left by two bits to add the 00 tag. Arithmetic would require more work if leftmost tags were used or a rightmost tag other than 00 were used (see Exercise 18.1).

• Using a nonzero rightmost tag for pointers is efficient on most architectures via an offset addressing mode, which allows direct access to memory at a fixed offset (a small signed integer) from a 32-bit byte address stored in a register or memory location. An address with a 01 pointer tag can effectively be converted into a word-aligned byte address by specifying a −1 offset.

Exercise 18.1 Assuming the following alternative placement and/or content of type tags, describe how to perform (1) integer arithmetic (+, −, ×, ÷, and %) and (2) accesses for memory addresses.
a. Type tags are the rightmost two bits of a descriptor, 11 is the integer type tag, and 00 is the pointer type tag.
b. Type tags are the leftmost two bits of a descriptor, 00 is the integer type tag, and 01 is the pointer type tag.
c. Type tags are the leftmost two bits of a descriptor, 01 is the integer type tag, and 00 is the pointer type tag.
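Returning to the arithmetic on tagged integers described in the first bullet above, the tag manipulations are easy to make concrete. Here is a small Python sketch (our own illustration; Python integers are unbounded, so we mask to 32 bits to mimic the machine) of encoding, decoding, adding, and multiplying descriptors in the type-tag layout of Figure 18.1:

    MASK32 = 0xFFFFFFFF

    def tag_int(i):
        # Encode a 30-bit signed integer: the word denotes 4i (rightmost 00 tag).
        return (i << 2) & MASK32

    def untag_int(w):
        # Decode: arithmetic shift right by 2, sign-extending from bit 31.
        if w & 0x80000000:
            w -= 1 << 32          # reinterpret the word as a signed 32-bit value
        return w >> 2

    def add_tagged(w1, w2):
        return (w1 + w2) & MASK32 # plain 32-bit addition preserves the 00 tag

    def mul_tagged(w1, w2):
        # Shift one operand's tag away first: 4*i1*i2 is the tagged product.
        return (untag_int(w1) * w2) & MASK32

    assert untag_int(add_tagged(tag_int(3), tag_int(4))) == 7
    assert untag_int(mul_tagged(tag_int(3), tag_int(-4))) == -12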
Descriptors with Garbage Collection (GC) Tags

In the run-time system for a statically typed language, descriptors need not carry complete type information, because dynamic type checking is unnecessary. This is true for FILreg programs that are produced by the Tortoise compiler. However, it is still helpful for descriptors to carry information that aids other run-time services, like garbage collection. As we shall see later, a garbage collector needs to distinguish between pointers and nonpointers. This can be accomplished with a one-bit GC tag. The right-hand column of Figure 18.1 shows a descriptor layout that uses the low-order bit as the GC tag. A 0 indicates a nonpointer, a 1 indicates a pointer. Since pointers have only 30 bits of address information, we will use a two-bit 01 tag for pointers, reserving the 11 tag for header words in heap blocks (see Section 18.2.3). The choice of the placement and values of the GC tags is guided by the same logic used for type tags. Using a rightmost 0 bit for immediate descriptors
simplifies integer arithmetic, and the 01 pointer tag can be processed at little or no cost by offset addressing. This layout yields an extra bit of integer precision.
Note that because immediate descriptors do not include distinguishing type bits, many different values can have the same bit pattern. For example, the bit pattern 0^32 is used for the integer 0, the unit value, the null list, the boolean false, and the character whose ASCII code is 0.

Tagless Descriptors

Implementations can support garbage collection without using GC tags in descriptors. In these tag-free GC systems (Section 18.5.2), descriptors need not carry any type or GC tags, so all bits can be used for the value representation. For example, a 32-bit descriptor can encode a 32-bit integer or a 32-bit byte address. Tagless descriptors are essential in conservative GC systems (Section 18.5.3) for languages like C/C++ and Pascal, which cannot tolerate GC bits in their descriptors.

naddr            Encoding of nslots | (optional) type | 11    ; header word
naddr + 1        W1                                            ; content of slot 1
naddr + 2        W2                                            ; content of slot 2
 ...              ...
naddr + nslots   Wnslots                                       ; content of slot nslots

Figure 18.2  The layout of a heap block with contents W1, . . . , Wnslots at word address naddr.
18.2.3 FRM Blocks
FRM blocks are allocated from the heap, an area of main memory that is indexed by 30-bit addresses. An FRM block is described by a single word FRM descriptor that includes its 30-bit address. This address naddr points to the header word of the FRM block, which is followed in memory by a variable number of slots, each of which can hold a single word (Figure 18.2). The header word at address naddr indicates the size of the block in words (excluding the header word itself) and possibly the type of the block. To aid in parsing the heap into blocks, header words have a two-bit tag of 11, which distinguishes them from immediate and nonimmediate descriptors (see Figure 18.1). This tag is not strictly necessary, but convenient. The words at addresses [(naddr + 1) .. (naddr + nslots )] are the descriptors for the contents of the nslots slots of the block.
Header with type and size      Header Type           Header with size only

[28-bit size] 0011             mutable tuple         [30-bit size] 11
0^26 10 0111                   list node (size 2)    0^28 10 11
[28-bit size] 1011             symbol                [30-bit size] 11
[28-bit size] 1111             closure               [30-bit size] 11
Figure 18.3 Two layouts for FRM header words: one with size and type information and one with size information only.
For a statically typed language with garbage collection (the right-hand column of Figure 18.3), the type of every block is statically known, but the garbage collector needs to know the size of each block. The first 30 bits of the header are used to encode this size. For a dynamically typed language, the header may encode type information in addition to size information. The choices in the left-hand column of Figure 18.3 indicate that there are four types of FILreg values represented as blocks: mutable tuples (created by mnew), nonempty list nodes, symbols, and closures.6 The types of these four values can be distinguished by two type bits in addition to the 11 header tag. More type bits would be needed if FILreg were extended to support additional compound values, such as strings and arrays.
For example, here are two heap block representations for the result of compiling and executing the FLARE/V expression (@pair #t 42):

Block with type info                         Block with GC info
0^26 10 0011    {mutable tuple header}       0^28 10 11
0^26 1 10110    {true}                       0^30 1 0
0^24 101010 00  {42}                         0^25 101010 0

In this block representation, accessing or changing slot i (1-indexed) of a block at 30-bit address naddr with nslots slots simply manipulates the location at address naddr + i in main memory.

Although closures are represented as mutable tuples in FILreg, it is helpful to distinguish the types of closure tuples from the types of other mutable tuples. In a compiler that maintains implicit type annotations for all expressions, closure types would be introduced by the closure conversion stage.
In the case where the block size is not known at compile time (e.g., for an array of unknown size) it is first necessary to check at run time that 0 < i ≤ nslots, where nslots is determined from the header word at address naddr. Failure to pass this check leads to an out-of-bounds index error. But for data whose size is known statically (e.g., FIL's mutable products), no dynamic index check is necessary.
The simple heap-block layout depicted in Figure 18.2 does not make efficient use of space for heap-allocated products with a small, statically known number of components (in FILreg, list nodes and tuples/closures with a small number of components). Using a header word to encode the size (and possibly type) of these products has a high space overhead. One way to avoid the header word in these cases is to encode the size/type of the block in the pointer to the block rather than in the block itself. For example, reserving three right-hand bits of the pointer next to the 01 tag for this purpose would allow distinguishing eight size/type possibilities, one of which would indicate the standard block-with-header but the other seven of which would indicate headerless blocks. In addition to eliminating the header word for small blocks, this technique allows the type of a block in a dynamically typed system to be tested without a potentially expensive memory access. An obvious drawback of this approach is that the extra size/type bits reduce the range of the address space that can be expressed with the remaining address bits. Moreover, extra bit-diddling is required to turn these modified pointers into recognizable memory addresses.
Size/type information can be encoded in the pointer without these drawbacks using the Big Bag of Pages (BIBOP) scheme. This is based on decomposing memory into pages by viewing the higher-order bits of a word address as a page address and the lower-order bits of a word address as a location within a particular page. In BIBOP, all blocks allocated on a page must have the same size/type information. In the simplest incarnation of BIBOP, each type of object is stored in its own single (large) page. In this case, the page address is the block type tag. It is also possible to decompose memory into many smaller pages and store the size/type information in a table indexed by the page address. BIBOP saves space by effectively using a single header per page rather than per block.
There are other inefficiencies that can be addressed with clever block layouts. For example, the straightforward way to represent an n-character symbol or string as a block is to have a header word with size/type information followed by n character words. But using a 32-bit word to represent a single 8-bit ASCII character or 16-bit unicode character is wasteful of space. It is possible to employ packed representations in which 4 ASCII characters or 2 unicode characters are stored in a 32-bit word within a block.
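To make the header and slot conventions concrete, here is a small Python sketch (our own model, not FRM code; memory is a list of words, using the size-only header layout of Figure 18.3) of header encoding and checked slot access:

    def make_header(nslots):
        # Size-only header layout: 30-bit size followed by the 11 tag.
        return (nslots << 2) | 0b11

    def mget(mem, naddr, i):
        # Fetch slot i (1-indexed) of the block whose header word is at naddr.
        nslots = mem[naddr] >> 2       # recover the block size from the header
        if not (0 < i <= nslots):      # dynamic check, needed only when the
            raise IndexError("out-of-bounds index")  # size is not known statically
        return mem[naddr + i]

    # The (@pair #t 42) block of the example above, in the GC-tag layout:
    mem = [make_header(2), 0b10, 42 << 1]   # header; true = ...1 0; 42 with 0 tag
    assert mget(mem, 0, 1) == 0b10          # slot 1 holds true
    assert mget(mem, 0, 2) >> 1 == 42       # slot 2 holds tagged 42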
Exercise 18.2 C. Hacker doesn't like reserving a bit of every FRM descriptor for a GC tag because then it isn't possible to have full 32-bit integers. Observing that every descriptor is either in the heap or in a register, he proposes an alternative way to store GC tags:

• In a heap block, the header word is followed by one or more GC-tag words that precede the content words of the block and store the GC tags of these content words. For example, the ith bit (1-indexed) of the first GC-tag word is the GC tag of the ith content word, where 1 ≤ i ≤ 32; the ith bit of the second GC-tag word is the GC tag of the (32 + i)th content word; and so on.

• For every 32 registers, a 32-bit GC-tag register is reserved to store the GC tags of the register contents.

a. Describe the benefits and drawbacks of C. Hacker's idea.
b. If FRM were extended to include homogeneous arrays, what would be an efficient way to extend C. Hacker's approach to store the GC tags for the array components?
18.3 A Block Is Dead if It Is Unreachable
A storage system may reuse any dead block. Recall that a heap block is live in an FRM state if it will be accessed later in the program execution and is dead if it will not be accessed later. Unfortunately, this property is uncomputable in general, since there is no way to prove whether a program will or will not use a pointer to an arbitrary memory block. Therefore, a garbage collector must approximate liveness by reusing only provably dead blocks. A sound garbage collector may classify a dead block as live, but not vice versa. The worst sound approximation is that all blocks are live, in which case GC degenerates to a simple heap manager that allocates blocks but never deallocates them, an approach viable for programs with small storage needs but not suitable for serious programming.
Intuitively, a block is dead if it cannot be reached by following a chain of pointers from the data values currently accessible to the program. Since there is no way the program can access the block, it is provably dead. (This assumes that programs cannot generate new pointers themselves by, for example, performing arbitrary pointer arithmetic.) As we shall see, there are different algorithms for determining which blocks are reachable.
GC algorithms are evaluated over many dimensions: the accuracy of their identification of live and dead blocks, how much time and space they require, whether they maintain locality (i.e., keep blocks that refer to each other close together in memory), whether they can be performed in a separate thread from
the executing program, and how long a pause may be needed to perform GC. Real-time computer systems often cannot tolerate long GC-induced pauses, and thus require incremental GC algorithms that perform only a small amount of work every time the garbage collector is invoked.
18.3.1 Reference Counting
One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.” Moon patiently told the student the following story: “One day a student came to Moon and said: ‘I understand how to make a better garbage collector...’ ”
— MIT AI Koan about David Moon, attributed to Danny Hillis

There are two basic techniques for approximating the liveness of a block. The first is reference counting, in which each block has associated with it a reference count that indicates the number of pointers pointing to the block. When the reference count falls to zero, the block is provably dead and can be immediately reclaimed, e.g., by inserting it into a free list of blocks used for allocation.
Reference counting is conceptually simple and is easy to adapt to an incremental algorithm suitable for real-time systems. However, it suffers from numerous drawbacks. The run-time system executing a program must carefully increment a block's reference count whenever a pointer to it is copied and decrement its reference count whenever a register or memory slot containing a pointer to it is overwritten, and the time overhead for this bookkeeping can be substantial. Storage must be set aside for the reference counts; e.g., a certain number of bits in the block header word can be reserved for this purpose. When reference counts are modeled by a fixed number of bits, the maximal count must be treated as “infinity” — incrementing or decrementing this count must yield the same count, and blocks with this count can never be reclaimed even when they are actually dead. Dead cyclic structures can never be deallocated by reference counting alone, since each element in the structure has at least one pointer to it.
Like any heap manager that maintains a free list of blocks (regardless of whether it uses manual or automatic deallocation), reference-counting garbage collectors can suffer from memory fragmentation, where unallocated storage consists of many small blocks, none of which are contiguous. This happens when the heap manager reuses the storage of a deallocated block, but needs only part of it. Although a fragmented memory may contain a large amount of unallocated
storage, the largest block that can be allocated may be small, causing programs to abort prematurely because of out-of-memory errors. Fragmentation can be fixed by memory compaction, a process that moves all live blocks to the beginning of memory to yield a single large unallocated block. Compaction requires rewiring pointers and may change the contents of registers as well as the contents of heap blocks. We shall study a form of compaction in the context of the stop-and-copy garbage collection algorithm presented in Section 18.4.
Reference counting is used in practice in the allocation and deallocation of disk blocks in Unix-like operating systems (including Linux). File deletion actually just removes a pointer, or hard link, and the operating system eventually collects all blocks with zero reference counts. Users are not permitted to make hard links to a directory in these systems, because this can create unreclaimable, cyclic structures on disk.
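The bookkeeping discipline described above is easy to state as code. The following Python sketch is our own illustration (blocks are modeled as objects rather than FRM heap words, and root-set updates are omitted):

    free_list = []                    # dead blocks available for reuse

    class Block:
        def __init__(self, slots):
            self.refcount = 0
            self.slots = list(slots)  # pointers held by this block
            for p in self.slots:
                p.refcount += 1       # copying a pointer bumps its target's count

    def overwrite(block, i, new):
        # Overwrite a pointer slot, maintaining both reference counts.
        new.refcount += 1
        old = block.slots[i]
        block.slots[i] = new
        release(old)

    def release(block):
        block.refcount -= 1
        if block.refcount == 0:       # provably dead: reclaim it and its children
            for child in block.slots:
                release(child)
            free_list.append(block)

Note that every member of a dead cycle keeps a nonzero count, so this sketch, like any pure reference counter, never reclaims cyclic garbage.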
18.3.2 Memory Tracing
The second basic technique for approximating the liveness of a block is memory tracing, in which a block is considered live if it can be reached by a sequence of pointer-following steps from a root set of descriptors. In FRM, the root set for any machine state consists of the set of live registers (i.e., the registers that are mentioned in the current expression).7 In a given machine state, any block that is not reachable from the root set cannot be accessed in a future state and thus is dead and may be safely collected as garbage.
If we imagine that pointers to heap blocks are strings connecting physical boxes, then tracing-based GC may be viewed as a process in which the root-set descriptors are anchored down while a vacuum cleaner is applied to the heap. Any blocks that are not connected by some sequence of strings to the root set are untethered and will be sucked up by the vacuum cleaner.
Memory tracing is a better approximation to reachability than reference counting, because it classifies cyclic structures unreachable from the root set as garbage. Memory tracing can also collect blocks that a reference counting scheme with fixed-size counts would render uncollectible with a count of “infinity.”
Tracing-based GC imposes two requirements on a language implementation. In order to traverse all reachable blocks, it must be possible at run time to (1) distinguish block pointers from nonpointers (to know which descriptors to follow)

The root set also includes spill memory, which we are ignoring here (but see the Web Supplement for details). In language implementations with a run-time stack of procedure invocation frames, all descriptors in the stack are also in the root set. FRM does not have a run-time stack; instead, stack frames are encoded as heap-based continuation closures.
18.4 Stop-and-copy GC
1133
and (2) determine the size of a block (in order to process its components). In the FRM implementation we have discussed, the GC tag of a descriptor satisfies requirement 1, and the size information in a block header word satisfies requirement 2. But there are other ways to satisfy these requirements. For example, the discussion starting on page 1129 shows ways to encode size information in the pointer word itself, and Exercise 18.4 on page 1138 explores how a single header bit per word can be used to encode the size of a heap block without a header word. Some systems specially mark blocks containing no pointers so that GC does not have to examine the components of such a block.
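As a sketch of the tracing idea (not FRM's actual collector), the following C fragment marks every block reachable from a root set using an explicit worklist. The helpers is_pointer and block_size stand in for requirements (1) and (2); they, and the mark operations, are assumed rather than defined here.

    #include <stddef.h>
    #include <stdbool.h>

    typedef unsigned long Word;

    /* Assumed helpers corresponding to requirements (1) and (2). */
    extern bool   is_pointer(Word w);       /* (1) pointer vs. nonpointer  */
    extern size_t block_size(Word *block);  /* (2) number of content slots */
    extern bool   is_marked(Word *block);
    extern void   set_mark(Word *block);

    /* Mark everything reachable from the roots; by the tracing
       approximation, any block left unmarked is dead. (Worklist
       overflow checks are omitted for brevity.) */
    void trace(Word *roots[], size_t nroots) {
        Word *worklist[4096];
        size_t top = 0;
        for (size_t i = 0; i < nroots; i++)
            if (roots[i] != NULL) worklist[top++] = roots[i];
        while (top > 0) {
            Word *b = worklist[--top];
            if (is_marked(b)) continue;
            set_mark(b);
            for (size_t i = 0; i < block_size(b); i++)
                if (is_pointer(b[i]))       /* follow only block pointers */
                    worklist[top++] = (Word *)b[i];
        }
    }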
18.4
Stop-and-copy GC
Memory tracing is the basis for a wide variety of GC strategies, including a relatively simple and effective one known as stop-and-copy. The essential idea is familiar to anyone who has moved from one dwelling to another: put everything you want to keep into a moving van, and anything not on the van at the end is garbage. Stop-and-copy garbage collection reclaims memory by copying all live data to a new area of memory and declaring everything left in the old memory space to be garbage. We will first sketch the stop-and-copy algorithm and then describe the details for FRM below.

To distinguish new and old memory spaces, a heap memory of size nsize is divided into two equal-sized areas called semispaces: a lower semispace covering addresses in the range [0 .. ((nsize ÷ 2) − 1)] and an upper semispace covering addresses in the range [(nsize ÷ 2) .. (nsize − 1)].8 At any time, one semispace is active and the other is inactive. The active semispace is used for all allocations, using the simplest possible strategy: allocations start at the beginning (or symmetrically the end) of the active semispace, and each successive allocation request is satisfied with the memory block after the last one allocated. This continues until there is an allocation request for more memory than remains in the active semispace. The inactive semispace is like a field lying fallow; it contains no live blocks and is not used for allocation.

When a request is made to allocate a block that cannot fit at the top of the active space, the program is stopped and the garbage collector is invoked. At this point, the active semispace is called from-space and the inactive semispace is called to-space. Garbage collection begins by copying the root set (the contents of the registers) to the bottom of to-space. It then enters a copy phase in which it copies into to-space all blocks in from-space that are reachable from the root set. It must preserve the pointer relationships between blocks, so that the graph structure of the copied blocks is isomorphic to the original one. It must also update any pointers in the root set to point to the appropriate copied blocks in to-space. Once the copy phase is complete, all live blocks are in to-space, and the algorithm installs the updated root-set descriptors in the machine state (because all pointers have moved to to-space). At this point, the semispaces are flipped: to-space becomes the new active semispace and from-space becomes the new inactive semispace. An attempt is now made to retry the failed allocation request in the new active space: if it succeeds, the program continues normally; otherwise, program execution fails with an out-of-memory error.

8 For simplicity, assume that nsize is even.
A Stop-and-copy GC for FRM

Implementing the allocation strategy above requires only a free pointer nfree, which points to the first free word in the active semispace. If the lower semispace is active first, nfree is initially 0, and the semispace is completely full when nfree = (nsize ÷ 2). If the addresses of the active semispace are in the range [nlo .. nhi], then the free pointer partitions the active semispace into two parts: allocated blocks stored in the address range [nlo .. (nfree − 1)] and free memory available for future allocation in the address range [nfree .. nhi].

A request to allocate an n-slot block is handled as follows:

1. Calculate n′free = nfree + n + 1 (the 1 accounts for the header word).

2. If there is enough room to allocate the block (i.e., if n′free ≤ nhi + 1): (a) store a header word for size n in slot M[nfree]; (b) save the value of nfree as nresult; (c) update the free pointer nfree to n′free; and (d) indicate that allocation has succeeded and that nresult is the address of the newly allocated block.

If there is not enough room to perform the allocation, then do a garbage collection (see below) and attempt the allocation again. If it fails a second time, then fail with an out-of-memory error.
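In C-like form the steps look as follows; this is a sketch of the strategy just described, not FRM's actual code. make_header is an assumed encoding of a size-n header word, and the caller must run a GC and retry once when −1 is returned.

    typedef long Addr;

    extern long make_header(long nslots);  /* assumed header-word encoding */

    /* Bump allocation in the active semispace, whose addresses run up
       to hi; *n_free indexes the first free word. Returns the block's
       address, or -1 to signal that a garbage collection is needed. */
    Addr allocate(long M[], Addr hi, Addr *n_free, long n) {
        Addr new_free = *n_free + n + 1;   /* step 1: n slots plus header */
        if (new_free > hi + 1)             /* step 2: not enough room     */
            return -1;
        M[*n_free] = make_header(n);       /* (a) header word for size n  */
        Addr result = *n_free;             /* (b) remember block address  */
        *n_free = new_free;                /* (c) bump the free pointer   */
        return result;                     /* (d) allocation succeeded    */
    }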
[Figure 18.4: Depictions of initial, intermediate, and final states of the copy-phase iteration in the stop-and-copy garbage collection algorithm. Initially, to-space holds the root set R[0] .. R[nmax], with nscan at its first slot and nfree just past it, followed by unallocated space. In an intermediate state, nscan and nfree partition to-space into scanned, unscanned, and unallocated regions. In the final state, nscan = nfree and only scanned and unallocated regions remain.]

In FRM, the copy-phase algorithm is an iteration in three state variables: (1) a scan pointer nscan that keeps track of the blocks that need to be copied from from-space to to-space; (2) a free pointer nfree used to allocate storage in to-space for blocks being copied from from-space; and (3) the main memory M whose heap component is partitioned into from-space and to-space. Figure 18.4 shows initial, intermediate, and final states of the copy phase. The copy phase begins by installing the root set (the contents of all the registers)9 into the first nreg = nmax + 1 slots of to-space, setting nscan to point to the first slot of the root set, and setting nfree to point to the first slot after the root set. If to-space spans the memory addresses [nlo .. nhi], then every step of the copy-phase iteration maintains the following invariant:

nlo ≤ nscan ≤ nfree ≤ nhi + 1    (18.1)
Indeed, the nscan and nfree pointers partition to-space into three regions:

• The bottom region of to-space, occupying the address range [nlo .. (nscan − 1)], is the scanned region, which contains words that have been successfully processed by the copy phase, so that pointers in the scanned region point to blocks in to-space.

• The middle region of to-space, occupying the address range [nscan .. (nfree − 1)], is the unscanned region, which contains words still to be processed by the copy phase. This region effectively serves as a first-in first-out queue of words to be processed by the copy phase; when a word at the bottom of this region is processed, new words may be added to the top of this region.

• The top region of to-space, occupying the address range [nfree .. nhi], is the unallocated region into which blocks from from-space will be copied.

Two additional invariants hold at each step of the copy-phase iteration:

Pointers in the scanned region point to blocks in to-space    (18.2)

Pointers in the unscanned region point to blocks in from-space    (18.3)

9 For simplicity, the algorithm includes all registers in the root set. A more precise tracing-based approximation to block liveness would be achieved by including only the live registers — i.e., those registers actually mentioned in the current expression of the FRM state. See Exercise 18.3.
The copy-phase invariants hold in the initial state of the copy phase iteration: invariant (18.1) clearly holds; the scanned region is initially empty, so invariant (18.2) trivially holds; and the unscanned region initially contains the root set, whose pointers all point to from-space, so invariant (18.3) holds.

Each step of the copy phase is described by one of the pictorial rewrite rules in Figure 18.5. Each rule processes the word at nscan, increments nscan to yield n′scan, and updates nfree to yield n′free. The rules are distinguished by the type of the first element (the element at nscan) in the queue of unscanned words. If this word is a nonpointer — i.e., it is an immediate descriptor or a header word — then the iteration simply skips it and moves on to the next word in the queue. If the descriptor is a pointer word, by invariant (18.3) it must specify an address nfrom of a from-space block. The first time the copy phase visits the block, it copies the contents of the block (including its header word) into to-space starting at address nfree and updates the free pointer accordingly; since all pointers in the block refer to from-space, invariant (18.3) is preserved for the next iteration. It also changes the descriptor at nscan to point to the new to-space address (nfree) rather than the old from-space address (nfrom), thus preserving invariant (18.2) for the next iteration. Finally, it replaces the header word of the original from-space block with its forwarding address, the new to-space address (nfree) of the block, to indicate that the block has already been moved to to-space. If the copy phase encounters a from-space block with a forwarding address nfwd (distinguishable from a block header by having a tag of 01 instead of 11), it means that the block has already been copied to to-space, and it is only necessary to convert the block pointer to its forwarding address (in order to preserve invariant (18.2)).

The copy phase eventually processes all block pointers that can be reached from the root set, thus performing a memory trace that approximates liveness by reachability from the root set. Because new blocks are copied to the end of the queue in the unscanned region, blocks are traversed in a breadth-first manner. The copy phase ends when the unscanned region queue becomes empty — i.e., when the scan pointer catches up to the free pointer. At this point, all objects reachable from the root set have been copied from from-space to to-space, and invariant (18.2) guarantees that all pointer descriptors in to-space now point into to-space.
[Figure 18.5: A pictorial description of the stop-and-copy garbage collection algorithm, given as three rewrite rules on to-space. Process a Nonpointer: the word at nscan is simply skipped (n′scan = nscan + 1; n′free = nfree). Process a Pointer to a Not-Yet-Copied Block: the block of nslots slots at from-space address nfrom (header [nslots] 11) is copied to to-space at nfree, the header at nfrom is overwritten with the forwarding address [nfree] 01, the descriptor at nscan becomes [nfree] 01, and n′scan = nscan + 1, n′free = nfree + nslots + 1. Process a Pointer to an Already Copied Block: the descriptor [nfrom] 01 at nscan is replaced by the forwarding address [nfwd] 01 found at nfrom, and n′scan = nscan + 1, n′free = nfree.]
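The three rewrite rules translate directly into code. Below is a compact C sketch of the copy phase (essentially Cheney's algorithm); the word-encoding helpers (is_pointer, is_forward, header_size, and the conversion functions) are assumed stand-ins for FRM's tag manipulations, not definitions from the text.

    typedef unsigned long Word;

    /* Assumed helpers for the FRM word encodings. */
    extern int  is_pointer(Word w);    /* pointer descriptor (tag 01)      */
    extern int  is_forward(Word w);    /* forwarding address (tag 01) found
                                          where a header (tag 11) would be */
    extern long header_size(Word w);   /* slot count encoded in a header   */
    extern Word make_pointer(long a);
    extern Word make_forward(long a);
    extern long pointer_addr(Word w);
    extern long forward_addr(Word w);

    /* Copy the from-space block at 'from' into to-space if it has not
       been copied yet; either way, return its to-space address. */
    static long forward(Word M[], long from, long *n_free) {
        if (is_forward(M[from]))             /* rule 3: already copied      */
            return forward_addr(M[from]);
        long to = *n_free;
        long n  = header_size(M[from]);
        for (long i = 0; i <= n; i++)        /* rule 2: copy header + slots */
            M[to + i] = M[from + i];
        *n_free = to + n + 1;
        M[from] = make_forward(to);          /* leave a forwarding address  */
        return to;
    }

    /* Scan until the unscanned queue empties (n_scan catches n_free). */
    void copy_phase(Word M[], long *n_scan, long *n_free) {
        while (*n_scan < *n_free) {
            Word w = M[*n_scan];
            if (is_pointer(w))               /* rules 2 and 3               */
                M[*n_scan] = make_pointer(forward(M, pointer_addr(w), n_free));
            (*n_scan)++;                     /* rule 1: skip nonpointers    */
        }
    }

Because forwarded blocks are appended at n_free and then scanned in address order, the unscanned region behaves exactly as the first-in first-out queue described above, which is what makes the traversal breadth-first.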
At the termination of the copy phase, the updated register contents in the first nreg slots of to-space are copied back into the registers, yielding the new register memory R′. Additionally, a semispace flip is performed by making to-space the new active semispace. Subsequent allocations then take place starting at nfree in this new active semispace.

The stop-and-copy algorithm has several nice properties. Unlike reference counting, it can collect cyclic garbage. Stop-and-copy GC compacts live blocks at the bottom of to-space; this avoids memory fragmentation and simplifies block allocation. The time to perform a stop-and-copy GC is proportional to the total size of reachable blocks, so if most of from-space is garbage, very little work is needed to perform a stop-and-copy GC.

However, stop-and-copy has some serious drawbacks as well. Reserving half of heap memory for the inactive semispace wastes a large chunk of potential storage space. The breadth-first nature of the memory trace performed by stop-and-copy does not preserve the locality of blocks, which can seriously degrade memory performance. The block movement of the copy phase causes significantly more memory traffic than in-place approaches like reference counting and the mark-sweep strategy discussed below.

Exercise 18.3 The stop-and-copy GC algorithm presented above has a root set that includes the contents of all registers. However, if a dead register (one that is not in the free variables of the currently executing expression) contains a pointer to a block, then this block will be treated as live by any tracing-based GC algorithm, even though it may be provably dead. Because of this, GC may not collect as much garbage as it could. Fix this problem by making a simple change to the GC algorithm that prevents it from following pointers stored in dead registers.

Exercise 18.4 In a system that requires only GC information (not types), Ben Bitdiddle thinks that encoding the size of a block in a header word wastes too much space. He observes that it is possible to dispense with all block header words if an additional bit (which he calls the header bit) is reserved in every descriptor to indicate whether it is the first word of a block in the heap. So two tag bits are necessary in every descriptor: the header bit and the GC tag. Here is one possible tag encoding:

00  immediate descriptor that is not the first word in a heap block
10  immediate descriptor that is the first word in a heap block
01  nonimmediate descriptor that is not the first word in a heap block
11  nonimmediate descriptor that is the first word in a heap block
The header bit should be 1 only for descriptors stored in the first word of a block in the heap. Any other descriptor (including those stored in registers) should have a header bit of 0.
Exercise 18.5 Suppose that the first 20 words of main memory M in an executing FRM program have the contents shown below. (Assume the FRM GC-tag-only descriptor and size-only block representations presented in Sections 18.2.2 and 18.2.3 are being used. The number to the left of each slot is its address. In each slot, a bracketed decimal integer followed by tag bits stands for the binary representation of the integer concatenated with the tag bits.)

 0  [5] 01      5  [1] 11     10  [3] 11     15  [2] 01
 1  [5] 0       6  [10] 01    11  [3] 0      16  [7] 01
 2  [2] 11      7  [2] 11     12  [5] 01     17  [2] 11
 3  [7] 01      8  [2] 0      13  [14] 01    18  [10] 01
 4  [2] 01      9  [2] 01     14  [2] 11     19  [5] 01
Suppose that the program uses only the first two registers (i.e., nreg = 2), where R[0] = [17] 01 and R[1] = [14] 01, and the program does not spill any registers. Finally, suppose that the currently executing FILreg expression has the form

(let* ((r0 0)             {Set register 0 to 0}
       (r0 (@mnew NT)))   {Set register 0 to the address of a new block with NT slots}
  Erest)
where FrIds[[Erest]] = {r0, r1} (i.e., it refers to both registers).

a. Draw a box-and-pointer diagram depicting the two registers and all the heap blocks in M. You should draw a register as a one-slot box. You should draw a heap block with n content slots as an n-slot box. A slot containing an immediate value should show that value. A slot containing an address should be the source of an arrow that points at the box representing the heap block at that address.

b. Based on your diagram in part a, indicate which heap blocks are live and which are dead when the mnew primitive is executed.

c. Assume that heap memory has 40 slots (so that the first 20 fill one semispace). Show the contents of heap memory after performing the stop-and-copy GC algorithm initiated when the mnew primitive is executed. What is the largest value of NT for which the program will not encounter an out-of-memory error?

Exercise 18.6 Ben Bitdiddle has been hired by the Analog Equipment Corporation to consult on a memory management problem. Analog uses Balsa, a programming language in which heap storage is explicitly managed by programmers using the following two expression constructs:
(malloc E): If the value of E is a positive integer n, returns a pointer to a block of storage that is n + 1 words long. The first word of the returned block is a size header; the other n words are uninitialized. An out-of-memory error is generated if there is insufficient storage to allocate a block of the requested size. An error is generated if the value of E is not a positive integer.

(free E): If the value of E is a pointer to a block of storage, deallocates the storage of that block (allowing it to be reused by malloc) and returns unit. Otherwise, an error is generated.

Analog is having problems with a very large Balsa application (dubbed The Titanic by the development staff) that eventually either mysteriously crashes or runs out of heap space. Ben suspects that the programmers who wrote the application are not properly deallocating storage.

In order to debug Analog's problem, Ben decides to implement a standard stop-and-copy garbage collector for Balsa. He modifies malloc and free to keep track of the total amount of "busy" storage — malloc increments a global *busy* counter with the number of words in the block it creates and free decrements the *busy* counter by the number of words in the block it frees. In Ben's system, free just decrements the *busy* counter and does not actually free any storage. Instead, when storage is exhausted, the garbage collector runs and copies live storage from the old active semispace into the new one.

a. Let live be the number of words copied during a garbage collection and busy be the value of the *busy* counter at the time of the garbage collection. In each of the following situations encountered while executing a Balsa program in Ben's system with garbage collection, describe the implications for executing the same program in the original system without garbage collection:

i.   live < busy
ii.  live > busy
iii. live = busy
b. How can Ben modify his garbage collector to detect dangling pointers?

c. Ben tests his garbage collector on another very large AEC program called The Brittanic.10 The program uses malloc and free for explicit memory management and works fine with one megabyte of available memory. Ben installs enough extra memory to support two semispaces, each of which has one megabyte of storage beyond the space needed for the garbage collector itself. Ben turns on the garbage collector, and, full of hope, runs the program. To his surprise, The Brittanic encounters an out-of-memory error.

i.  How can you explain this behavior?
ii. How can Ben fix the problem?

10 The Brittanic (1914–1916) was an identical sister ship of the Titanic.
18.5
Garbage Collection Variants
Stop-and-copy is just one of many approaches to garbage collection. Here we review some other approaches.
18.5.1
Mark-sweep GC
Another popular tracing-based GC algorithm is mark-sweep, an approach to GC that takes place in two phases. First, the mark phase traverses memory from the root set, marking each reachable block along the way (e.g., by setting a mark bit associated with each block it visits). Then the sweep phase linearly scans through memory and collects all unmarked blocks into a free list. The mark-sweep collector is invoked whenever an allocation request is made and the free list does not have a big enough block.

Mark-sweep has several benefits compared to stop-and-copy. Unlike stop-and-copy, which uses only half of heap memory for allocating blocks, mark-sweep can allocate blocks in all of heap memory. Like reference counting, mark-sweep is an in-place algorithm that does not move blocks, and so it can be used in situations (such as conservative GC, discussed later) where blocks cannot be moved. In-placeness also helps to preserve block locality and reduce memory traffic during GC. But in-placeness has a big downside as well — it implies using a free list for allocation, which leads to memory fragmentation.

There are other drawbacks to mark-sweep. There is space overhead for the mark bits and time overhead for manipulating them. There is also space overhead for controlling the mark-phase traversal, although this can be eliminated using the clever pointer-reversal technique described in [SW67]. Finally, the sweep phase takes time proportional to the size of the heap rather than to the size of live memory. In contrast, stop-and-copy GC takes time proportional to the size of live memory.
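A small C sketch of the sweep phase is shown below. It assumes blocks are laid out back to back with size-bearing headers, that the mark bit lives in the header word, and that every block has at least one content slot to hold the free-list link; the helper names are illustrative.

    typedef unsigned long Word;

    extern long header_size(Word header); /* slot count n (block: n+1 words) */
    extern int  is_marked(Word header);
    extern Word clear_mark(Word header);

    /* Linear scan over the whole heap [heap_lo..heap_hi]: every unmarked
       block is linked onto the free list through its first content slot.
       The scan visits live and dead blocks alike, which is why sweeping
       takes time proportional to the size of the heap. */
    void sweep(Word M[], long heap_lo, long heap_hi, long *free_list) {
        for (long addr = heap_lo; addr <= heap_hi; ) {
            long n = header_size(M[addr]);
            if (is_marked(M[addr])) {
                M[addr] = clear_mark(M[addr]);  /* live: reset for next GC  */
            } else {
                M[addr + 1] = (Word)*free_list; /* dead: push on free list  */
                *free_list = addr;
            }
            addr += n + 1;                      /* advance to the next block */
        }
    }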
18.5.2
Tag-free GC
A GC algorithm is said to be tag-free if GC descriptors do not require tags. For a statically typed language, it is possible to implement a tag-free GC that can also eliminate header words for blocks whose sizes are statically known. The basic idea is to design the implementation so that the garbage collector is provided with (or can find) run-time type information for every word in the root set. This type information can be used as a “map” that guides GC by distinguishing pointers from nonpointers and indicating the sizes of blocks whose size is statically known.
(Size-bearing header words are still necessary for blocks whose size is known only dynamically, such as arrays.) For example, a descriptor with type (prodof int bool (listof int)) is a pointer to a block with three slots, the first two of which are nonpointers, but the third of which (if nonnull) is a pointer to a two-slot block with one nonpointer and one pointer. Because it has compact descriptions for complex specifications (e.g., (listof int) describes the layout of integer lists of any length), such a type "map" generally requires far less storage than that needed to explicitly annotate every word and block with GC information. But tag-free GC complicates the compiler (which must supply type information to the run-time system), the run-time system (which must preserve the type information), and the GC algorithm (which must find and follow the type "map").
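To sketch how such a type "map" can drive tag-free tracing, here is an invented recursive layout descriptor in C; the slot table records which slots are pointers and what layout they point to. This data structure is our illustration, not the book's representation.

    #include <stddef.h>

    typedef unsigned long Word;

    /* Hypothetical layout descriptor: one entry per slot; a NULL entry
       means the slot holds a nonpointer. For this sketch, blocks have
       at most four statically known slots. */
    typedef struct TypeMap {
        int nslots;
        const struct TypeMap *slot[4];
    } TypeMap;

    extern void visit(Word *block);  /* assumed marking/copying action */

    /* Trace a block whose layout is described by t, following only the
       slots the map identifies as (nonnull) pointers. Checking for
       already-visited blocks is omitted for brevity. */
    void trace_typed(Word *block, const TypeMap *t) {
        visit(block);
        for (int i = 0; i < t->nslots; i++)
            if (t->slot[i] != NULL && block[i] != 0)
                trace_typed((Word *)block[i], t->slot[i]);
    }

    /* E.g., (prodof int bool (listof int)): two nonpointer slots, then a
       pointer to a two-slot list node whose own map refers back to itself,
       so one small descriptor covers integer lists of any length. */
    static const TypeMap int_list_map = { 2, { NULL, &int_list_map } };
    static const TypeMap example_map  = { 3, { NULL, NULL, &int_list_map } };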
18.5.3
Conservative GC
In a tracing-based GC, it is never sound to treat a pointer as a nonpointer, since this could result in classifying a live block as dead. However, it is sound to treat some nonpointers as pointers. For example, in a system where words do not carry GC tags, if an integer in a register happens to have the same bit pattern as a pointer to a heap block, it’s OK to consider that block reachable; the only downside is that this block may now cause a memory leak. But if the probability of misinterpreting an integer as a block pointer is low, then this sort of memory leak may not be any worse than leaks due to other liveness approximations. This is the key idea behind a tag-free approach known as conservative GC [BW88], which can be used for garbage collection in implementations of languages (e.g., C/C++ and Pascal) that cannot tolerate GC tags in their word representations. There are clever techniques for efficiently determining whether an arbitrary word is a possible heap-block address and, if so, for determining the size of the block. Conservative GC must use an in-place algorithm (like mark-sweep) to collect dead blocks because there is no reliable way to distinguish integers from pointers when performing the pointer rewiring required by copying techniques. Empirically, conservative GC appears to work well in many situations, and it is the only GC technique available for many languages.
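A sketch of the conservative test in C: any word whose bit pattern could be a heap-block address is treated as a pointer. The helper plausible_block_start stands in for the clever techniques mentioned above and is assumed rather than defined.

    #include <stdint.h>

    extern uintptr_t heap_lo, heap_hi;              /* heap address range */
    extern int plausible_block_start(uintptr_t a);  /* allocator metadata */

    /* Returns nonzero if word w must be treated as a pointer. The test
       errs on the side of "pointer": an integer that happens to look
       like a block address keeps that block alive (a possible leak,
       never an unsoundness). */
    int maybe_pointer(uintptr_t w) {
        if (w < heap_lo || w >= heap_hi) return 0;  /* outside the heap */
        if (w % sizeof(uintptr_t) != 0)  return 0;  /* not word-aligned */
        return plausible_block_start(w);
    }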
18.5.4
Other Variations
There are many other variations on the garbage collection approaches that we have discussed. Based on the observation that most blocks in some systems are short-lived, so-called generational collectors partition the heap into regions based on block lifetimes; recently allocated blocks are put in regions where
collection is performed frequently, and older blocks migrate to regions where collection is performed less frequently. There are numerous incremental versions of many approaches that reduce the length of GC pauses or bound them so that GC can be used in real-time systems. There are also many concurrent GC algorithms that can run in a thread separate from the one(s) used for program execution; the challenging problem these algorithms must address is how to handle the fact that some threads are changing the graph of blocks while GC is being performed in a separate thread. Even more challenging is performing garbage collection in distributed environments, where memory is distributed over processing nodes connected by a communication network with an arbitrary topology.

In practice, choosing a garbage collector depends critically on numerous details, such as the typical lifetimes of blocks of particular sizes, the tolerance for memory leaks, the frequency of cyclic data, the acceptability of GC pauses, the necessity of keeping blocks in place, the importance of memory locality, the cost of memory traffic, and various other issues involving time and space resources. Finding a good heap manager can require implementing several solutions, comparing them empirically, and fine-tuning the one that performs best in typical situations. Sometimes it is a good idea to combine several of the strategies we have discussed. For example, a system might require certain blocks to be deallocated manually and use reference counts to automatically deallocate other blocks, relying on a periodic stop-and-copy GC to compact and collect cyclic garbage from the reference-counted storage.

Exercise 18.7 This exercise continues the scenario started in Exercise 18.5 on page 1139.

a. Assume that heap memory has just 20 slots, containing the values shown in Exercise 18.5. Show the contents of heap memory after performing the mark-sweep GC algorithm initiated when the mnew primitive is executed. Assume that one bit of each header word is reserved for the mark bit and that reclaimed blocks are stored in a free list that is used for allocation. Assume that the free list is stored in a special register Rfree initially containing 0 (denoting the empty list) and that a block of size nslots is added to the free list by first setting the first content slot of the block to contain the content of Rfree and then setting Rfree to the address of the block. What is the largest value of NT for which the program will not encounter an out-of-memory error?

b. Again assume that heap memory has just 20 slots and that memory is allocated from a free list as described in part a (where Rfree is initially the empty list). Assume that a reference-counting garbage collector is used, where 3 bits of each header word are reserved for a reference count. Show the contents of heap memory and Rfree (1) after performing the instruction that sets R0 to 0 and (2) after performing the mnew primitive. What is the largest value of NT for which the program will not encounter an out-of-memory error?
Exercise 18.8 Consider the following FLARE/V program P:

(flare (n)
  (recur loop ((p (pair 0 0)))
    (let ((s (snd p)))
      (if (= s n)
          (fst p)
          (loop (pair s (+ s 1)))))))
a. Explain what this program does.

b. Suppose that P is compiled by Tortoise and executed in a version of FRM using a simple heap manager that allocates blocks but never deallocates them. On the ith iteration of the loop, how many pair blocks are live and how many are dead?

c. Suppose that we extend FLARE/V and FIL with a manual deallocation construct (free E). If E denotes a compound value, then free indicates that the heap block representing this value may be reclaimed; otherwise free generates an error. Modify P to a program P′ that uses free in such a way that it will not abort with an out-of-memory error for any natural number input when executed using a heap of reasonable size.

d. Remark on the suitability of each of the following approaches to garbage collection for executing P: (1) stop-and-copy; (2) mark-sweep; and (3) reference counting.

e. Suppose that P is translated to C, where pair is replaced by a call to malloc, C's memory allocator, and loop is replaced by a while loop. What, if any, garbage collection techniques can prevent the program from running out of memory?
18.6
Static Approaches to Automatic Deallocation
We have studied dynamic approaches to automatic deallocation, but there are static approaches as well. In a language implementation with a run-time stack of frames that store information (e.g., arguments, local variables, return addresses) associated with procedure invocations, popping a frame on procedure exit can reclaim a large chunk of storage with low cost (resetting the stack pointer). Languages like C/C++, Pascal, and Ada permit (indeed, encourage) the programmer to allocate data blocks on the stack by declaring compound values that are local to a procedure; these are implicitly deallocated when the procedure returns. Pascal and Ada do not allow creating pointers to stack-allocated data that can outlive the stack frame for the procedure invocation in which they were allocated, so stack deallocation cannot give rise to dangling pointers in these languages. In contrast, C and C++ do allow pointers to stack-allocated data to outlive the stack frame
in which they were allocated, providing yet another way to generate dangling pointers in these languages. An alternative approach is to rely on a system that statically determines (e.g., using the lifetime analysis in Section 16.3.5) which blocks can safely be allocated on the stack. For example, in the Tortoise compiler, such a system would be able to determine automatically that all closures for continuation procedures introduced by the CPS stage can be stack-allocated — an expected result, since they correspond to traditional stack frames.11 The region-based approach to memory management sketched in Section 16.3.5 generalizes this idea by statically allocating each program value within a particular region of a stack of abstract regions associated with an automatically placed let-region construct. This allows an entire region to be deallocated when the let-region that introduced it is exited.
11 This is true only because FLARE/V does not include constructs that capture control points, such as label/jump or cwcc. Also, note that stack allocation of continuation closures can be performed without sophisticated analysis. For example, Rozas's Liar compiler for Scheme [Roz84] achieved this result by performing a pre-CPS closure-conversion pass that allocated closures on the heap and a post-CPS closure-conversion pass that allocated closures on the stack.

Notes

A wealth of information about modern garbage collection techniques can be found in the surveys [Wil92] and [Jon99]. Earlier work is surveyed in [Coh81]. Mark-sweep collection was invented by McCarthy in the first version of Lisp [McC60]. The idea of copying garbage collection originated with Minsky [Min63], who wrote live data to disk and read it back into main memory. Fenichel and Yochelson developed a two-semispace algorithm for list memory in which live list nodes were scanned recursively [FY69]. The recursion implies extra storage for a recursion stack, but this can be eliminated by the pointer-reversal technique described in [SW67]. The iterative scanning algorithm we describe, which uses constant control space for scanning, is due to Cheney [Che70]. There are incremental versions of this algorithm that can limit the duration of a GC pause (e.g., [HGB78, AEL88, NOPH92, NO93, NOG93]).

[App89] sketches how static typing can eliminate the need for almost all tag bits in a garbage-collected language. Many of the details of tag-free GC were worked out in [Gol91]. Determining the types of tagless objects in a statically typed language with polymorphism is a problem. One solution is to dynamically reconstruct the types at run time [AFH94]. Another is to modify the compiler and run-time system to explicitly pass the types onto which polymorphic functions are projected [Tol94]. Conservative GC [BW88] is a variant of tag-free GC suitable for languages whose data representations cannot contain GC tag bits.

An operational framework for reasoning formally about memory management is presented in [MFH95]. Interestingly, type reconstruction in this system can be used to identify values that, though reachable, will never actually be referenced, and so can be reclaimed as garbage. So-called linear types, which track the number of times a value is used, can be used in such a framework to eagerly reclaim values after their last use [IK00].

Many techniques have been developed to reduce dangling pointer errors in languages with manual heap deallocation. One approach is to insert additional run-time checks before memory operations to guarantee memory safety [NMW02, JMG+02]. Another approach is to prove statically that no dangling pointers can be encountered in a running program. Although this is undecidable in general, it can be done for certain kinds of programs with a sufficiently sophisticated analysis (e.g., [DKAL03, ZX05, Zhu06]).

Heap management is only one of many services provided by the run-time system for a programming language implementation. For a discussion of a more full-featured run-time system, see [App90], which provides an overview of data layout and run-time services (including garbage collection, module loading, input/output, foreign function calls, and execution profiling) for an ML implementation.
Appendix A

A Metalanguage

Man acts as though he were the shaper and master of language, while in fact language remains the master of man.
— Martin Heidegger, "Building Dwelling Thinking," Poetry, Language, Thought (1971)

This book explores many aspects of programming languages, including their form and their meaning. But we need some language in which to carry out these discussions. A language used for describing other languages is called a metalanguage. This appendix introduces the metalanguage used in the body of the text.

The most obvious choice for a metalanguage is a natural language, such as English, that we use in our everyday lives. When it comes to talking about programming languages, natural language is certainly useful for describing features, explaining concepts at a high level, expressing intuitions, and conveying the big picture. But natural language is too bulky and imprecise to adequately treat the details and subtleties that characterize programming languages. For these we require the precision and conciseness of a mathematical language.

We present our metalanguage as follows. We begin by reviewing the basic mathematics upon which the metalanguage is founded. Next, we explore two concepts at the core of the metalanguage: functions and domains. We conclude with a summary of the metalanguage notation.
A.1
The Basics
The metalanguage we will use is based on set theory. Since set theory serves as the foundation for much of popular mathematics, you are probably already familiar with many of the basics described in this section. However, since some of our notation is nonstandard, we recommend that you at least skim this section in order to familiarize yourself with our conventions.
A.1.1
Sets
A set is an unordered collection of elements. Sets with a finite number of elements are written by enclosing the written representations of the elements within braces and separating them by commas. So {2, 3, 5} denotes the set of the first three primes. Order and duplication don't matter within set notation, so {3, 5, 2} and {3, 2, 5, 5, 2, 2} also denote the set of the first three primes. A set containing one element, such as {19}, is called a singleton. The set containing no elements is called the empty set and is written {}. We will assume the existence of certain sets:

Unit = {unit} ; standard singleton set
Bool = {true, false} ; truth values
Int = {. . . , −2, −1, 0, 1, 2, . . .} ; integers
Pos = {1, 2, 3, . . .} ; positive integers
Neg = {−1, −2, −3, . . .} ; negative integers
Nat = {0, 1, 2, . . .} ; natural numbers
Rat = {0, 1, −1, 1/2, −1/2, 2, −2, 1/3, −1/3, 3/2, −3/2, 3, −3, 2/3, −2/3, . . .} ; rationals
Char = {'a', 'b', . . . , 'A', 'B', . . . , '1', '2', . . . , '.', ',', . . .} ; text characters
String = {"", "a", "b", . . . , "foo", . . . , "a string", . . .} ; all character strings
(The text in slanted font following the semicolon is just a comment and is not a part of the definition. This is one of two commenting styles used in this book. In the other commenting style, the comments are written in slanted font and are delimited by braces. However, the braces would be confusing in the presence of set notation, so we use the semicolon style in some cases.) The Unit set is the canonical singleton set; its single element is named unit. Bool is the set of the boolean truth values true and false. Int, Pos, Neg, Nat, and Rat (which contains all ratios of integers) are standard sets of numbers. String is the set of all character strings. Unit and Bool are finite sets, but the other examples are infinite. Since it is impossible to write down all elements of an infinite set, we use ellipses (“. . .”) to stand for the missing elements in standard sets where it is clear what the remaining elements are. We consider the unit value, truth values, numbers, and characters to be primitive elements that cannot be broken down into subparts. Character strings are not primitive because they can be decomposed into their component characters. Sets can contain any structure, including other sets. For example, the set {Int, Nat, {2, 3, {4, 5}, 6}} contains three elements: the set of integers, the set of natural numbers, and a set of four elements (one of which is itself a set of two numbers). Here the names Int and Nat are used as synonyms for the set structures they denote.
Set membership is specified by the symbol ∈ (pronounced "element of" or "in"). The notation e ∈ S asserts that e is an element of the set S, while e ∉ S asserts that e is not an element of S. (In general, a slash through a symbol indicates the negation of the property denoted by that symbol.) For example,

0 ∈ Nat
0 ∉ Neg
Int ∈ {Int, Nat, {2, 3, {4, 5}, 6}}
Neg ∉ {Int, Nat, {2, 3, {4, 5}, 6}}
2 ∉ {Int, Nat, {2, 3, {4, 5}, 6}}
In the last example, 2 is not an element of the given set even though it is an element of one of that set's elements.

A set A is a subset of a set B (written A ⊆ B) if every element of A is also an element of B. Every set is a subset of itself, and the empty set is trivially a subset of every set. E.g.,

{} ⊆ {1, 2, 3} ⊆ Pos ⊆ Nat ⊆ Int ⊆ Rat
Nat ⊆ Nat
Nat ⊈ Pos
Two sets A and B are equal (written A = B) if they contain the same elements, i.e., if every element of one is an element of the other. Note that A = B if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (written A ⊂ B) if A ⊆ B and A ≠ B.

Sets are often specified by describing a defining property of their elements. The set builder notation {x | Px} (pronounced "the set of all x such that Px") designates the set of all elements x such that the property Px is true of x. For example, Nat could be defined as {n | n ∈ Int and n ≥ 0}. The sets described by set builder notation are not always well defined. For example, {s | s ∉ s} (the set of all sets that are not elements of themselves) is a famous nonsensical description known as Russell's paradox.

We will use [lo..hi] (pronounced "the integers between lo and hi, inclusive") as an abbreviation for {n | n ∈ Int and lo ≤ n ≤ hi}; if lo > hi, then [lo..hi] denotes the empty set.

Some common binary operations on sets are defined below using set builder notation:

A ∪ B = {x | x ∈ A or x ∈ B} ; union
A ∩ B = {x | x ∈ A and x ∈ B} ; intersection
A − B = {x | x ∈ A and x ∉ B} ; difference
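For instance (a worked example of our own), if A = {1, 2, 3} and B = {3, 4}, then:

A ∪ B = {1, 2, 3, 4}
A ∩ B = {3}
A − B = {1, 2}
B − A = {4}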
The notions of union and intersection can be extended to (potentially infinite) collections of sets. If A is a set of sets, then ⋃A denotes the union of all of the component sets of A. That is,

⋃A = {x | there exists an a ∈ A such that x ∈ a}

If Ai is a family of sets indexed by elements i of some given index set I, then

⋃_{i∈I} Ai = ⋃{Ai | i ∈ I}

denotes the union of all the sets Ai as i ranges over I. Intersections of collections of sets are defined in a similar fashion.

Two sets B and C are said to be disjoint if and only if B ∩ C = {}. A set of sets A = {Ai | i ∈ I} is said to be pairwise disjoint if and only if Ai and Aj are disjoint for any distinct i and j in I. A is said to partition (or be a partition of) a set S if and only if S = ⋃_{i∈I} Ai and A is pairwise disjoint.

The cardinality of a set A (written |A|) is the number of elements in A. The cardinality of an infinite set is said to be infinite. Thus |Int| is infinite, but |{Int, Nat, {2, 3, {4, 5}, 6}}| = 3. Still, there are distinctions between infinities. Informally, two sets are said to be in a one-to-one correspondence if it is possible to pair every element of one set with a unique and distinct element in the other set without having any elements left over. Any set that is either finite or in a one-to-one correspondence with Nat is said to be countable. For instance, the set Int is countable because every nonnegative element n in Int can be paired with 2n in Nat and every negative element n in Int can be paired with 1 − 2 · (n + 1). Clearly Unit, Bool, Pos, Neg, and Char are also countable. It can be shown that Rat and String are countable as well. Informally, all countably infinite sets "have the same size." On the other hand, any infinite set that is not in a one-to-one correspondence with Int is said to be uncountable. Cantor's celebrated diagonalization proof shows that the real numbers are uncountable.1 Informally, the size of the reals is a much "bigger" infinity than the size of the integers.

The powerset of a set A (written P(A)) is the set of all subsets of A. For example,

P({1, 2, 3}) = {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
The cardinality of the powerset of a finite set is given by |P(A)| = 2^|A|.
A description of Cantor’s method can be found in many books on mathematical analysis and computability. We particularly recommend [Hof80].
In the above example, the powerset has size 2^3 = 8. The set of all subsets of the integers, P(Int), is an uncountable set.
A.1.2
Boolean Operators and Predicates
In our metalanguage, we will often employ standard operators to manipulate expressions that denote the boolean truth values, true and false. Suppose that p, q, and r are any expressions that stand for boolean truth values. Then:

• ¬p, the logical negation of p, is false if p is true and is true if p is false. The notation ¬p is pronounced "not p." Note that ¬(¬p) = p.

• p ∧ q, the logical conjunction of p and q, is true only if both p and q are true; otherwise it is false. The notation p ∧ q is pronounced "p and q." It is commutative (p ∧ q = q ∧ p) and associative ((p ∧ q) ∧ r = p ∧ (q ∧ r)).

• p ∨ q, the logical disjunction of p and q, is false only if both p and q are false; otherwise it is true. The notation p ∨ q is pronounced "p or q." It is commutative and associative.

• The logical implication statements "p implies q,"2 "if p then q," and "p only if q" are synonymous, and are true only when p is false or q is true; otherwise, they are false. So these statements are equivalent to (¬p) ∨ q. When p is false, these statements are said to be vacuously true.

• The contrapositive of "p implies q" is "not q implies not p." This is logically equivalent to "p implies q," which we can see because (¬(¬q)) ∨ (¬p) can be simplified to (¬p) ∨ q.

• The statement "p if q" is equivalent to "if q then p" and thus to p ∨ (¬q).

• The statement "p if and only if q," usually abbreviated "p iff q," is true only if both "p implies q" and its converse, "q implies p," are true; otherwise it is false. It is equivalent to ((¬p) ∨ q) ∧ (p ∨ (¬q)).

For our purposes, a predicate is a metalanguage expression, usually containing variables, that may denote either true or false when the variables are instantiated with values. Some examples are n ∈ Pos, A ⊆ B, and x > y. The first of these examples is a unary predicate, a predicate that mentions one variable (in this case, n).

2 "p implies q" is traditionally written as p → q or p ⇒ q. However, the arrows → and ⇒ are used for other purposes in this book. To avoid confusion, we will always express logical implication in English.
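The equivalence of "p implies q" with (¬p) ∨ q can be checked exhaustively. The following truth table (our addition, not the text's) covers all four cases:

p      q      (¬p) ∨ q, i.e., "p implies q"
true   true   true
true   false  false
false  true   true  (vacuously)
false  false  true  (vacuously)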
We have already seen predicates in set builder notation; the expression to the right of the | symbol is a predicate over the variables mentioned in the expression to the left. For example, the notation

{x | x ∈ Int and (x ≥ 0 and x ≤ 5)}

denotes the set of integers between 0 and 5, inclusive. In this case, the predicate after the | symbol is built out of three smaller predicates, and we could rewrite it using boolean operators as

(x ∈ Int) ∧ (x ≥ 0 ∧ x ≤ 5)
A.1.3
Tuples
A tuple is an ordered collection of elements. A tuple of length n, called an n-tuple, can be envisioned as a structure with n slots arranged in a row, each of which is filled by an element. Tuples with a finite length are written by writing the slot values down in order, separated by commas, and enclosing the result in angle brackets. Thus 2, 3, 5 is a tuple of the first three primes. The number and order of elements in a tuple matter, so 2, 3, 5, 3, 2, 5, and 3, 2, 5, 5, 2, 2 denote three distinct tuples. Tuples of size 2 through 5 are called, respectively, pairs, triples, quadruples, and quintuples. The 0-tuple, , and 1-tuples also exist. The element of the ith slot of a tuple t can be obtained by projection, written t ↓ i. For example, if s is the triple 2, 3, 5, then s ↓ 1 = 2, s ↓ 2 = 3, and s ↓ 3 = 5. The notation t ↓ i is well formed only when t is an n-tuple and
A.1.4 Relations
1153
1 ≤ i ≤ n. Two tuples s and t are equal if they have the same length n and s ↓ i = t ↓ i for all 1 ≤ i ≤ n. As with sets, tuples may contain other tuples; e.g., 2, 3, 5, 7, 11, 13, 17 is a tuple of three elements: a quadruple, an integer, and a pair. Moreover, tuples may contain sets and sets may contain tuples. For instance, 2, 3, 5, Int, {{2, 3, 5}, 7, 11} is a well-formed tuple. If A and B are sets, then their Cartesian product (written A × B) is the set of all pairs whose first slot holds an element from A and whose second slot holds an element from B. This can be expressed using set builder notation as: A × B = {a, b | a ∈ A and b ∈ B}
For example: {2, 3, 5} × {7, 11} = {2, 7, 2, 11, 3, 7, 3, 11, 5, 7, 5, 11} Nat × Bool = {0, false, 1, false, 2, false, . . . , 0, true, 1, true, 2, true, . . .}
If A and B are finite, then |A × B| = |A| · |B|. The product notion extends to families of sets. If A 1 , . . . , An is a family of sets, then their product (written A1 × A2 × . . . × An or ni=1 Ai ) is the set of all n n-tuples a1 , a2 , . . ., an such that ai ∈ Ai . The notation A (= ni=1 A) stands for the n-fold product of the set A.
A.1.4
Relations
A binary relation on A is a subset of A × A.3 For example, the less-than relation, ), 1163, see also >=; Relational operator Greater-than relation (>), 1163, see also >; gt; Relational operator Greatest lower bound (glb), 176
1266 gt (PostFix command), 8, 40, see also Relational operator informal semantics, 10 PostFix DS, 137 GW (globalization wrapping transform), 1016 Gymnastics, mental, 446 halt (program termination) in FILreg , 1101 in FLICK extension, 501 Halting problem, 618, see also Undecidability halting function, 198, 1158 halting theorem, 49 Hamming numbers, 560 handle (FLIC exception-handling construct), 515–532 computation-based DS, 524–527 desugaring-based implementation, 527–529 metaCPS conversion, 1072 standard DS, 519–524 termination semantics, 515, 520 in type/effect system, 986–988 handle (SML exception construct), 514 [handle] (unsound type/effect rule), 986 Handle an exception, 514 HandlerEnv (exception handler domain), 519, 521, 525 Handler environment for exceptions, 519 handlerof (exception handler type), 986–988 Hankin, Chris, 1117 Hard link, 1132 Hash (Perl record), 353 Haskell array, as updatable sequence, 545 array indexing, 548 as block-structured language, 337 as implicitly typed language, 626 as purely functional language, 384 as purely functional lazy language, 209 as statically typed language, 623
Index currying of constructor arguments, 588–589 extensible records, 767 garbage collection, 1120 HDM type reconstruction, 836 heterogeneous tuple, 548 immutable string, 548 immutable updatable sequence, 548 implicit parameter, 379 lack of abstraction in pattern matching, 607–610 list-manipulation libraries, 239 monadic style and do notation, 396 nonstrict application, 215 nonstrict product, 551 pattern matching, 590, 605–610, 768, 829 sum (Either), 568 sum-of-products data, 579, 829 type reconstruction, 812 universal polymorphism, 627 unsafe features, 622 user-defined data-type declaration, 588–589 Hasse diagram for partial order, 174 Haugen, Marty (quoted), 889 Hawes, Bess (quoted), 1042 HDM type reconstruction, 812, 836 head (Scheme stream head), 561 head (metalanguage sequence function), 1182–1184 Header word in FRM block, 1127 Heap allocation in active semispace of stop-and-copy GC, 1133–1134 of FRM block, 1127 in PostHeap, 110 Heap block, 1119, 1122, see also FRM live or dead, 1120, 1130–1133 stack allocation of, 1144 Heap deallocation, see Garbage collection (GC) Heap memory, 438, 991, 1119, 1123 in PostHeap, 110
Index Heidegger, Martin (quoted), 1147 Heterogeneous list, 686 Heterogeneous product, 548, 677 Hewitt, Carl, 941 Hewlett Packard stack calculator, 8 Hiding names, 333, 901 Hierarchical scope, 334–347, 352 Higher-order function, 1160–1161, see also First-class procedure; Higher-order procedure binding construct defined with, 124 lambda notation and, 1166 to simulate multiple arguments (currying), 1162 Higher-order polymorphic lambda calculus, 765 Higher-order procedure, 214–215, see also First-class procedure; Higher-order function closure conversion to simulate with first-order procedure, 1075 list procedure, 239–240 perspectives on, 305 Higher-order type, 750–767 description, 750–758 kind, 758–767 Hindley, J. Roger, 835 Hindley-Damas-Milner (HDM) type reconstruction, 812, 836 Hoare, C. A. R., 941 Hole in context, 71 in scope of a variable, 245, 337 Holmes, Sherlock (quoted), 769 Homogeneous list, 686 Homogeneous product, 548, 677 Homomorphism, 115 HOOK (Humble Object-Oriented Kernel), 362–368, see also HOOPLA class, simulating with prototype object, 366 prototype-based language, 380 semantics, 370–373
1267 static scope, 366 syntax, 363 translation to FL, 370–373 hook (HOOK program keyword), 363 HOOPLA (Humble Object-Oriented Programming Language), 362, 368–370, see also HOOK extending with state, 403 namespaces, 376 sample classes, 371 hoopla (HOOPLA program keyword), 369 Horace (quoted), 163 Horizontal style for type derivation, 648–650 HTML, sum-of-products data via markups, 580 Hudak, Paul, 305 Hughes, R. J. M., 305 Hygienic macros, 331, 379 I ∈ Ident, see Ident I ∈ Inputs, see Inputs i ∈ Int, see Int IBM 360, register spilling, 1115 Ice cream, inspiration for mixin, 380 Icon generators, 507 [id-≈tc ] (type-constructor equivalence axiom), 913 idA (identity function), 1159, 1167 Idempotence, 1084 Ident (identifier domain) in FILreg (Identreg ), 1099 in FL dialects, 211 in FLEX/M, 893 in HOOK, 363 Identification (unique object handle), 941 Identifier, 244, 334, see also Variable; Variable reference binding occurrence of, 245 bound, 246 bound occurrence of, 245 in FLK, 210 free, 246, 389
1268 Identifier (continued ) free occurrence of, 245 fresh, 221, 225, 255, 331 primitive name, 307 structured, 886 variable vs., 244 Identifier pattern, 590 desugaring in match, 598 Identity element of an operator, 40 Identity function, 1159, 1167 Identity of an object, 383 state and, 383–384 Identity substitution, 782 id→mc (metaCPS function), 1060 IdSet (identifier set domain), 1019 IE (input expression meaning function), 283 IF (Church conditional), 300 IF (SOS input function), see Input function of SOS if (metalanguage conditional), 1180, 1190, 1191 if (EL/FL conditional) desugaring in HOOK, 369 elseless if sugar in FLICK, 399, 401 in EL dialects, 25 in EL DS, 129 in FILcps , 1046 in FILlift , 1095 in FILreg , 1101 in FL dialects, 211, 214 in FLICK standard DS, 477 in FLK DS, 282, 283 in FLK SOS, 259, 261 in metaCPS conversion, 1061 in simple CPS conversion, 1051 as primitive operator, 267 type rule in μFLEX, 644 ifS (conditional function), 1164 [if ] μFLARE type rule, 775 FLARE/E type/effect rule, 953, 955 μFLEX type rule, 644, 646 type/exception rule, 988 [if-F] (FLK SOS reduction), 259, 261
Index iff (if and only if), 1151 [if-pure] (FLARE syntactic purity rule), 816 [ifR ] (μFLARE type reconstruction rule), 793 [if-T] (FLK SOS reduction), 259, 261 [ifZ ] (FLARE/E type/effect reconstruction rule), 963, 964 Ill-typed expression, 641, 645, 770 match expression, 593 metalanguage expression, 1172 Image of function (img), 1160 Immediate component expression, 1013 Immediate descriptor, 1124 Immediate subexpression, 1013 Immutable data, 397 named product, 353–359, 549–550, 677–678, 821–826 positional product, 541–549, 676–677 sequence, 544–545, 677 string, 548 tuple, 541–542, 676–677 updatable sequence, 545–547, 548 impeach (election construct), 490–492 Imperative programming, 384, 397, see also Stateful language essence, 397 examples, 400–404 Imperative type variable, 837 Implementation language, 7 [implicit-let] (FIL simplification), 1033 Implicit parameter, 343, 344 in Haskell, 379 [implicit-projection] (FLEX/M type rule), 921 Implicit subtyping, 713–717 Implicit type, 625–627, 774, see also Type reconstruction Implicit type projection, see Polymorphic value projection, implicit Import of module bindings, 333, 352, 889 Import restriction, 916 in [effect-masking] rule, 973 in existential types, 853, 858 in [letregion] rule, 992
Index in nonce types, lack of, 862 in universal polymorphism, 731, 732 Impredicative type system, 735 of FLEX/SP, 804 in (communication effect), 998 in (input function of DS), 151 incInt (metalanguage incrementing function), 1156 [inclusion] (FLEX/S type rule), 702, 703, 719 Inclusion function ( → ), 1159 Inclusion on types, see Subtyping Incomparable elements in partial order, 175 Incomplete row type, 823 In-degree of graph vertex, 1107 Indexing, see Product indexing Induction, 1168, see also Structural induction Inequality relation ( = ), 1163, see also !=; Relational operator +inf, -inf (extended integer), 568 Inference of types, see Type reconstruction Inference rule for type judgment, 645 Infinite data, see also Lazy (CBL) product; Nonstrictness; Stream in CBN FL, 217, 324 coroutine producer value sequence, 459 thunk implementation in CBV FL, 325 Infinite loop, see Divergence; Nontermination Infinite set, 1150 Information, of exception, 515 Information hiding, 901 Ingalls, Daniel H. H. (quoted), 362 Inheritance, 362, 366–367, 379 hierarchy, 362 multiple, 380 subtyping vs., 723–725, 767 init (initialization effect in FLARE/E), 946, 949 Initial configuration of SOS, 47 Initialization effect, 946
1269 inj (positional sum injection), 576 Injection, into sum value, 570 Injection function, 1176, 1189 Injective function, 1160 Inj k (metalanguage oneof injection), 1176, 1189 inleft (either injection), 574 Inlining, 1017, 1053, 1056, 1088, 1116 inlining strategy for globalization, 1017–1019 Inner class in Java, 338 closure conversion and, 441, 1082 [input] (ELM BOS rule), 77 InputExp (input value domain) in μFLARE, 778 in μFLEX, 664 in FLK, 258, 259–260 Input function of SOS (IF ), 47, 50 in FLICK, 407 in language L, see L, SOS Inputs (program inputs domain), 49, see also InputExp in EL, 152 in PostFix, 53, 152 inright (either injection), 574 install-cont (control-point invoking function), 498, 500 Instance method, 362 representing in HOOK, 366 Instance of class, 362 representing in HOOK, 366 Instance variable, 362 representing in HOOK, 366 Instantiation of type by substitution, 783 Int (integer set & semantic domain), 1148 in EL DS, 118 in PostFix DS, 132 int (PostFix2 command), 41 int (extended integer conversion), 568 int (integer type), 628, 629 [int] μFLARE type axiom, 775 FLARE/E type/effect axiom, 953 μFLEX type axiom, 644 type/exception axiom, 988
int? (FL integer type predicate), 213
  in FLK DS, 285
intAt (function in PostFix DS)
  definition, 143
  specification, 133
Integer, see also Int; IntLit; Nat; NatLit; Neg; Pos; PosLit
  message-passing HOOK object, 364
  message-passing Smalltalk object, 365
  numeral vs., 57
Integer arithmetic module, 358
Integer range notation ([lo..hi]), 1149
Interface, 235, 333, 839, 889, 1171, see also Abstraction barrier; API
  in dependent type system, 870
  in existential type system, 856
  in Java, 724
  of module, 352
Interference between expressions, 427
  code optimization and, 1027–1029
interleave (list interleaving view), 607
Interleaving of observable actions, 997
Internal variable capture, 252
Interpreter, 7
  ASTs processed by, 116
  denotational semantics vs. program to specify, 128
  for ELM, 241–242, 243
  interpreted language, 623
  metacircular, for FL, 242
Interprocedural register allocation, 1111
Intersection
  of records, 550
  of sets (∩), 1149
[int?-F] (FLK SOS reduction), 259
intlist (integer list data type)
  def-datatype declaration in FLARE, 833, 834
  def-datatype declaration in FLEX/SP, 738, 739
IntLit (integer syntactic domain)
  in EL dialects, 25
  in FL dialects, 211
[intR] (μFLARE type reconstruction axiom), 793
[int?-T] (FLK SOS reduction), 259
int-tree (integer tree data type)
  declared by def-data, 609
  example sum-of-products type, 750
  example types, 694
  type specified with trec, 688
Invariant
  loop-invariant expressions and code hoisting, 1029
  of representation, 402, 842
Invariant subtyping, 704–706
Inverse limit construction, 202
Invocation, procedure, see app
Invocation frame, see Procedure call frame
Irreducible (SOS configurations), 50
Irreducible SOS configuration (⇏), 50
IS ∈ IdSet, 1019
Isomorphism, 1160
  isomorphic sets, 1160
Isorecursive type equivalence, 689, 699, 761
ISWIM (Landin’s If You See What I Mean language), 305
  CBN variant, 378
  CBV lambda calculus and, 378
Iter (iterator in Sather), 507
Iterating procedure, 507
Iteration, 390, see also Iterator; Loop
  Church numeral expressing, 298
  continuations and, 447–449
  factorial as example of, 400–401
  in FL, 226
  in FLIC via for, 489
  in FLIC via loop, 488–490
  in FLIC via repeat, 420, 489
  in PostLoop via for/repeat, 103, 141
  via label and jump, 496
  looping constructs, 449
  simulating state with, 390–391
  tail recursion and, 1064
  using recur, 227
  while sugar, 399, 401
Iterative fixed point technique, 168–173
Iterative procedure, 449
Iterator, see also Iteration
  in C++, 507, 512
  in CLU, 506
  in FLIC, 507–513
  in Java, 507, 512
  stream vs., 556, 612
  with success and failure continuations, 513
iterator (FLIC iterator construct), 507
J (Landin’s control operator), 537
Java
  abstract class, 724
  answer domain, 474
  array, as fixed-length mutable sequence, 561
  array, as homogeneous product, 548
  array subscripting notation, 561
  as call-by-value language, 309
  as explicitly typed language, 625
  as language without block structure, 337
  as monomorphic language, 658
  as object-oriented language, 362
  as stateful language, 384
  autoboxing/autounboxing, 750
  call-by-value-sharing of mutable products, 563
  constructor method, 366
  covariant array subtyping, 709
  downcast, 723
  dynamic and static type checking, 624
  dynamic loading, 890, 892, 928, 941
  dynamic loading and side effects, 926
  effect system for tracking exceptions, 1001
  exception handling with throw/try...catch, 514, 985
  explicit typing example, 626
  file-value coherence, 926
  garbage collection, 1120, 1121
  generic types, 749, 768
  immutable string, 548
  implicit projection for generic methods, 918
  inner class and closure conversion, 441, 1082
  inner class as limited form of block structure, 338
  integers and booleans as nonobjects, 365, 379
  interface, 724
  iterator, 507, 512
  lack of universal polymorphism, 749–750
  libraries, 239
  object, 550
  overloading, 748
  program arguments as string array, 217
  resignaling of exceptions, 514
  return/break/continue, 445, 490
  simulating higher-order procedures, 1076
  standard library API, 235
  strict application, 215
  sum-of-products data via objects, 579
  this as receiver parameter, 363
  throws to specify exceptions, 514, 985
  type coercion, 718
  type conversion, 716
  type information in compiled files, 926
  type vs. class, 723
  universal polymorphism, lack of, 627
  vector, as dynamically sized mutable sequence, 561, 562
  vector, as heterogeneous product, 562
  void expression as command, 472
  void type, 209
  void vs. value-returning method, 385
JavaScript
  as dynamically typed language, 623
  as object-oriented language, 362
  prototype-based objects, 380
Jim, Trevor, 1117
Johnsson, Thomas, 1117
join (list joining view), 607
join (of control threads), 997
Judgment
  kind, 760
  syntactic purity, see Syntactic purity judgment
  type, 645
  type/cost, 989
  type/effect, 951
  type/exception, 987
Jump, see also break; continue; goto; jump
  in assembly language, 1056
  to represent procedure call, 1043, 1046, 1064
jump, 494–506, 717
  computation-based DS for, 500
  control effects and, 978–983
  in μFLARE, 800
  in μFLEX, 658–659
  in FLICK SOS, 503–504
  in metaCPS conversion, 1070–1075
  in standard DS, 497
  as sugar for cwcc, 506
  as valueless expression, 498
[jump] (SOS transition rule), 503
K (lambda calculus combinator), 296
K ∈ Kind, 759
Kahn, Gilles, 112
Kelsey, Richard, 1116
Kernel of programming language, 207, 1186
Key/lock for data abstraction, 843–847
Keyword (keyword domain), 211
  in language L, see L, syntax
Keyword, reserved, 210
Kind, 758–767
  well-kinded description, 760
Kind (kind domain)
  in FLARE/E, 957
  in FLEX/SPDK, 759
Kind assignment, 761
Kind checking, 760–763
  decidability of, 761
  interactions with type checking, 762
Kind environment, 760
  base kind environment, 761
Kind judgment, 760
knull (alternative to null), 611
kons (alternative to cons), 611
L ∈ Lit, see Lit
L ∈ LogicalOperator, 25
L (literal meaning function)
  in EL, 129
  in FLK, 283
l ∈ Location, 412
label (control point), 494–506
  computation-based DS for, 500
  control effects and, 978–983
  duplication of continuation, 498
  in μFLARE, 800
  in μFLEX, 658–659
  in FLICK SOS, 503–504
  in metaCPS conversion, 1070–1075
  in standard DS, 497
  as sugar for cwcc, 506
[label] (SOS transition rule), 503
lam (FLK abstraction), 211, 214
  free and bound identifiers, 247
  in CBN vs. CBV SOS, 310
  in FLICK standard DS, 477
  in FLICK standard DS for exceptions, 520, 521
  in FLK DS, 283, 286
  in FLK ValueExp domain, 258
  in lambda calculus, 291
  in static vs. dynamic scope, 335
  scope, 245
  substitution, in FLK, 254
Lambda abstraction, 1165
Lambda calculus (LC), 290–304
  Church boolean, 300–301
  Church conditional, 300–301
  Church numeral, 298–300
  Church pair, 302
  Church tuple, 225
  combinator as program, 291
  denotational semantics, 296–297
  FLK vs., 291–294
  higher-order polymorphic, 765
  history, 305
  normalization, 292–295
  operational semantics, 291–296
  polymorphic, 764, 768
  recursion (Y operator), 303–304
  second-order, 764
  simplification, 291
  simply typed, 674, 699, 764
  syntactic sugar, 291
  syntax, 291
  untyped, 622
Lambda cube, 887
Lambda lifting, 1094–1096, 1117
Lambda notation, 1165–1170
  for curried function, 1168
  Lisp vs., 1169–1170
  recursion, 1168–1169
Landin, Peter, 42, 112, 162, 305, 378, 537
lastStack (PostFix function), 93
  alternative definitions of, 98
Latent effect (of a procedure), 944, 947
  in control flow analysis, 996
  inflating with [does], 955
  latent control effect example, 981
  latent cost effect, 988, 989
  latent exception effect, 986, 987
  latent store effect examples, 952
  in security analysis, 999
  unifying in reconstruction, 965
Latently typed language, see Dynamically typed language
Laziness, see also Nonstrictness
  graph reduction and, 314, 440
  lazy evaluation, 434, see also Call-by-need; Lazy (CBL) product
  lazy language, 209
  lazy list, see Stream
  lazy parameter passing, 434, see Call-by-need
  modularity of, 440, 559, 612
Lazy (CBL) product, 552–555
  denotational semantics of, 555
  SOS, 553–555
LC ∈ Location, 406
LC, see Lambda calculus
leaf (tree constructor), 608, 609
  generic type of, 832
leaf? (tree predicate), 451, 608
leaf~ (tree deconstructor)
  generic type of, 832
least (extended FL construct), 267
Least fixed point, 173, 190
  in FLK DS of rec, 286
  solution to effect constraint set, 962
Least Fixed Point Theorem, 190
Least upper bound (lub), 176
leaves (iterator example), 509
left (tree selector), 451, 608
Left-hand side (LHS) of transition, 50
length (procedure), 236, 238
length (metalanguage sequence function), 1183
Length of transition path, 50
Less-than-or-equal-to relation (≤), 1163, see also