Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1656
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Siddhartha Chatterjee Jan F. Prins Larry Carter Jeanne Ferrante Zhiyuan Li David Sehr Pen-Chung Yew (Eds.)
Languages and Compilers for Parallel Computing 11th International Workshop, LCPC’98 Chapel Hill, NC, USA, August 7-9, 1998 Proceedings
Volume Editors

Siddhartha Chatterjee, Jan F. Prins
Department of Computer Science, The University of North Carolina
Chapel Hill, NC 27599-3175, USA
E-mail: {sc/prins}@cs.unc.edu

Larry Carter, Jeanne Ferrante
Department of Computer Science and Engineering
University of California at San Diego
9500 Gilman Drive, La Jolla, CA 92093-0114, USA
E-mail: {carter/ferrante}@cs.ucsd.edu

Zhiyuan Li
Department of Computer Science, Purdue University
1398 Computer Science Building, West Lafayette, IN 47907, USA
E-mail: [email protected]

David Sehr
Intel Corporation
2200 Mission College Boulevard, RN6-18, Santa Clara, CA 95052, USA
E-mail: [email protected]

Pen-Chung Yew
Department of Computer Science and Engineering, University of Minnesota
Minneapolis, MN 55455, USA
E-mail: [email protected]

Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Languages and compilers for parallel computing : 11th international workshop ; proceedings / LCPC ’98, Chapel Hill, NC, USA, August 7 - 9, 1998. S. Chatterjee . . . (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999
(Lecture notes in computer science ; Vol. 1656)
ISBN 3-540-66426-2
CR Subject Classification (1998): D.1.3, D.3.4, F.1.2, B.2.1, C.2 ISSN 0302-9743 ISBN 3-540-66426-2 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. © Springer-Verlag Berlin Heidelberg 1999 Printed in Germany Typesetting: Camera-ready by author SPIN: 10704088 06/3142 – 5 4 3 2 1 0
Printed on acid-free paper
Steering Committee

Utpal Banerjee          Intel Corporation
David Gelernter         Yale University
Alex Nicolau            University of California at Irvine
David Padua             University of Illinois at Urbana-Champaign
Program Committee

Larry Carter            University of California at San Diego
Siddhartha Chatterjee   University of North Carolina at Chapel Hill
Jeanne Ferrante         University of California at San Diego
Zhiyuan Li              Purdue University
Jan Prins               University of North Carolina at Chapel Hill
David Sehr              Intel Corporation
Pen-Chung Yew           University of Minnesota
Organizing Committee

Linda Houseman          University of North Carolina at Chapel Hill
External Reviewers

George Almasi
Ana Azevedo
Brian Blount
Calin Cascaval
Walfredo Cirne
Paolo D’Alberto
Vijay Ganesh
Xiaomei Ji
Asheesh Khare
Jaejin Lee
Yuan Lin
Yunheung Paek
Nick Savoiu
Martin Simons
Weiyu Tang
Preface

The year 1998 marked the eleventh anniversary of the annual Workshop on Languages and Compilers for Parallel Computing (LCPC), an international forum for leading research groups to present their current research activities and latest results. The LCPC community is interested in a broad range of technologies, with a common goal of developing software systems that enable real applications. Among the topics of interest to the workshop are language features, communication code generation and optimization, communication libraries, distributed shared memory libraries, distributed object systems, resource management systems, integration of compiler and runtime systems, irregular and dynamic applications, performance evaluation, and debuggers.

LCPC’98 was hosted by the University of North Carolina at Chapel Hill (UNC-CH) on 7-9 August 1998, at the William and Ida Friday Center on the UNC-CH campus. Fifty people from the United States, Europe, and Asia attended the workshop.

The program committee of LCPC’98, with the help of external reviewers, evaluated the submitted papers. Twenty-four papers were selected for formal presentation at the workshop. Each session was followed by an open panel discussion centered on the main topic of the particular session. Many attendees have come to regard the open panels as a very effective format for exchanging views and clarifying research issues. Using feedback provided both during and after the presentations, all of the authors were given an opportunity to improve their papers before submitting the final manuscript contained in this volume.

This collection documents important research activities from the past year in the design and implementation of programming languages and environments for parallel computing. The major themes of the workshop included both classical issues (Fortran, instruction scheduling, dependence analysis) and emerging areas (Java, memory hierarchy issues, network computing, irregular applications). These themes reflect several recent trends in computer architecture: aggressive hardware speculation, deeper memory hierarchies, multilevel parallelism, and “the network is the computer.” In this final editing of the workshop papers, we have grouped the papers into these categories.

In addition to the regular paper sessions, LCPC’98 featured an invited talk by Charles Leiserson, Professor of Computer Science at the MIT Laboratory for Computer Science, entitled “Algorithmic Multithreaded Programming in Cilk”. This talk was the first exposure to the Cilk system for many of the participants and resulted in many interesting discussions. We thank Prof. Leiserson for his special contribution to LCPC’98.

We are grateful to the Department of Computer Science at UNC-CH for its generous support of this workshop. We benefited especially from the efforts of Linda Houseman, who ably coordinated the logistical matters before, during, and after the workshop. Thanks also go out to our local team of volunteers: Brian Blount, Vibhor Jain, and Martin Simons. Special thanks are due to the LCPC’98 Steering and Program Committees for their time and energy in reviewing the submitted papers. Finally, and most importantly, we thank all the authors and participants of the workshop. It is their significant research work and their enthusiastic discussions throughout the workshop that made LCPC’98 a success.

May 1999
Siddhartha Chatterjee
Program Chair
Table of Contents
Java

From Flop to MegaFlops: Java for Technical Computing
    J. E. Moreira, S. P. Midkiff and M. Gupta (IBM T.J. Watson Research Center)

Considerations in HPJava Language Design and Implementation
    Guansong Zhang, Bryan Carpenter, Geoffrey Fox, Xinying Li and Yuhong Wen (Syracuse University)

Locality

A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality
    M. Kandemir (Northwestern University), J. Ramanujam (Louisiana State University), A. Choudhary (Northwestern University) and P. Banerjee (Northwestern University)

An Integrated Framework for Compiler-Directed Cache Coherence and Data Prefetching
    Hock-Beng Lim (University of Illinois) and Pen-Chung Yew (University of Minnesota)

I/O Granularity Transformations
    Gagan Agrawal (University of Delaware)

Network Computing

Stampede: A Programming System for Emerging Scalable Interactive Multimedia Applications
    Rishiyur S. Nikhil (Compaq), Umakishore Ramachandran (Georgia Tech), James M. Rehg (Compaq), Robert H. Halstead, Jr. (Curl Corporation), Christopher F. Joerg (Compaq) and Leonidas Kontothanassis (Compaq)

Network-Aware Parallel Computing with Remos
    Bruce Lowekamp, Nancy Miller, Dean Sutherland, Thomas Gross, Peter Steenkiste and Jaspal Subhlok (Carnegie Mellon University)

Object-Oriented Implementation of Data-Parallelism on Global Networks
    Jan Borowiec (GMD FIRST)

Fortran

Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors
    Vivek Sarkar (IBM T.J. Watson Research Center)

Fortran RED - A Retargetable Environment for Automatic Data Layout
    Ulrich Kremer (Rutgers University)

Automatic Parallelization of C by Means of Language Transcription
    Richard L. Kennell and Rudolf Eigenmann (Purdue University)

Irregular Applications

Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
    Hwansoo Han and Chau-Wen Tseng (University of Maryland)

Beyond Arrays - A Container-Centric Approach for Parallelization of Real-World Symbolic Applications
    Peng Wu and David Padua (University of Illinois)

SIPR: A New Framework for Generating Efficient Code for Sparse Matrix Computations
    William Pugh and Tatiana Shpeisman (University of Maryland)

HPF-2 Support for Dynamic Sparse Computations
    R. Asenjo (University of Málaga), O. Plata (University of Málaga), J. Touriño (University of La Coruña), R. Doallo (University of La Coruña) and E.L. Zapata (University of Málaga)

Instruction Scheduling

Integrated Instruction Scheduling and Register Allocation Techniques
    David A. Berson (Intel Corporation), Rajiv Gupta (University of Pittsburgh) and Mary Lou Soffa (University of Pittsburgh)

A Spill Code Placement Framework for Code Scheduling
    Dingchao Li, Yuji Iwahori, Tatsuya Hayashi and Naohiro Ishii (Nagoya Institute of Technology)

Copy Elimination for Parallelizing Compilers
    David J. Kolson, Alexandru Nicolau and Nikil Dutt (University of California, Irvine)

Potpourri

Compiling for SIMD Within a Register
    Randall J. Fisher and Henry G. Dietz (Purdue University)

Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems
    Narasimhan Ramasubramanian, Ram Subramanian and Santosh Pande (University of Cincinnati)

Principles of Speculative Run-Time Parallelization
    Devang Patel and Lawrence Rauchwerger (Texas A&M University)

Dependence Analysis

The Advantages of Instance-Wise Reaching Definition Analyses in Array (S)SA
    Jean-François Collard (University of Versailles)

Dependency Analysis of Recursive Data Structures Using Automatic Groups
    D. K. Arvind and T. A. Lewis (The University of Edinburgh)

The I+ Test
    Weng-Long Chang and Chih-Ping Chu (National Cheng Kung University)

Author Index
From Flop to MegaFlops: Java for Technical Computing

J. E. Moreira, S. P. Midkiff, and M. Gupta
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, New York 10598, USA
{jmoreira,smidkiff,mgupta}@us.ibm.com
Abstract. Although there has been some experimentation with Java as a language for numerically intensive computing, there is a perception by many that the language is not suited for such work. In this paper we show how optimizing array bounds checks and null pointer checks creates loop nests on which aggressive optimizations can be used. Applying these optimizations by hand to a simple matrix-multiply test case leads to Java-compliant programs whose performance is in excess of 500 Mflops on an RS/6000 SP 332 MHz SMP node. We also report in this paper the effect that each optimization has on performance. Since all of these optimizations can be automated, we conclude that Java will soon be a serious contender for numerically intensive computing.
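The central idea of the abstract, that handling array bounds and null pointer checks up front exposes loop nests to aggressive optimization, can be illustrated with a minimal sketch. This is an assumption-laden illustration and not the authors' implementation: the class name SafeRegionSketch and the helpers hasShape, isSafe, safeRegion and checkedRegion are hypothetical names introduced here. One test before the loop nest decides between a version in which no access can fail and a fallback that keeps the original loop nest. At the Java source level the virtual machine still performs its own checks; the sketch only shows the shape that a compiler able to prove the up-front test could exploit.

```java
// Hypothetical sketch: all names and the structure are assumptions made
// for illustration; this is not code taken from the paper.
class SafeRegionSketch {

    // True when M is non-null and its first `rows` rows are non-null and
    // each hold at least `cols` elements.
    static boolean hasShape(double[][] M, int rows, int cols) {
        if (M == null || M.length < rows) {
            return false;
        }
        for (int r = 0; r < rows; r++) {
            if (M[r] == null || M[r].length < cols) {
                return false;
            }
        }
        return true;
    }

    // One up-front test covering every array access made by the loop nests below.
    static boolean isSafe(double[][] A, double[][] B, double[][] C,
                          int m, int n, int p) {
        return hasShape(A, m, n) && hasShape(B, n, p) && hasShape(C, m, p);
    }

    // Computes C = C + A x B, selecting one of the two loop versions.
    static void matmul(double[][] A, double[][] B, double[][] C,
                       int m, int n, int p) {
        if (isSafe(A, B, C, m, n, p)) {
            safeRegion(A, B, C, m, n, p);
        } else {
            checkedRegion(A, B, C, m, n, p);
        }
    }

    // Safe version: isSafe guarantees that no access is null or out of
    // bounds, so row references can be hoisted out of the inner loops.
    static void safeRegion(double[][] A, double[][] B, double[][] C,
                           int m, int n, int p) {
        for (int i = 0; i < m; i++) {
            double[] Ai = A[i];
            double[] Ci = C[i];
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    Ci[j] += Ai[k] * B[k][j];
                }
            }
        }
    }

    // Fallback: the untransformed loop nest, so any exception is raised
    // at the same iteration as in the original program.
    static void checkedRegion(double[][] A, double[][] B, double[][] C,
                              int m, int n, int p) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    C[i][j] += A[i][k] * B[k][j];
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] A = {{1, 2}, {3, 4}};
        double[][] B = {{5, 6}, {7, 8}};
        double[][] C = {{1, 0}, {0, 1}};
        matmul(A, B, C, 2, 2, 2);
        System.out.println(C[0][0] + " " + C[0][1] + " " + C[1][0] + " " + C[1][1]);
    }
}
```

Only the row references are hoisted in the safe version, because the outer arrays are never written inside the loop nest; this keeps the result identical to the fallback even if the matrices share rows.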
1 Introduction
The scientific programming community has recently demonstrated a great deal of interest in the use of Java for technical computing. There are many compelling reasons for such use of Java: there is a large supply of programmers, the language is object-oriented without excessive complications (in contrast to C++), and it has support for networking and graphics. Technical computing is moving more and more towards a network-centric model of computation. In this context, it can be expected that Java will first be used where it is most natural: for visualization and networking components. Eventually, Java will spread into the core computational components of technical applications. Nevertheless, a major obstacle remains to the pervasive use of Java in technical computing: performance.

Let us start by looking into the performance of a simple matrix-multiply routine in Java, as shown in Fig. 1. This routine computes C = C + A × B, where C is an m × p matrix, A is an m × n matrix, and B is an n × p matrix. We use that routine to multiply two 500 × 500 matrices (m = n = p = 500) on an RS/6000 SP 332 MHz SMP node. This machine contains four 332 MHz PowerPC 604e processors, each with a peak performance of 664 Mflops. We refer to this simple benchmark as MATMUL. The Java code is compiled into a native executable by the IBM High Performance Compiler for Java (HPCJ) [10], and achieves a performance of 5 Mflops on a 332 MHz PowerPC 604e processor. The equivalent Fortran code, compiled by the IBM XLF compiler, achieves 265 Mflops! A 50-fold performance

S. Chatterjee (Ed.): LCPC’98, LNCS 1656, pp. 1-17, 1999. © Springer-Verlag Berlin Heidelberg 1999
static void matmul(double[][] A, double[][] B, double[][] C, int m, int n, int p) { int i, j, k; for (i=0; i