Euro-Par 2008 Parallel Processing: 14th International Euro-Par Conference, Las Palmas de Gran Canaria, Spain, August 26-29, 2008, Proceedings (Lecture Notes in Computer Science, 5168)
3540854509, 9783540854500
This book constitutes the refereed proceedings of the 14th International Conference on Parallel Computing, Euro-Par 2008
Table of contents : Title Page Preface Organization Table of Contents Topic 1: Support Tools and Environments Clock Synchronization in Cell BE Traces Introduction Related Work Clock Synchronization Algorithm Implementation Aspects Experimental Results Conclusions References DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation Introduction Requirements for Simulating Grid Resource Management Architectures DGSim: A Framework for Simulating Grid ResourceManagement Architectures Design Overview A Model for Inter-operated Cluster-Based Grids Grid Dynamics and Grid Evolution Grid Workload Generator Simulator Validation Experiments Using DGSim Performance Evaluation Using Real Workload Traces Performance Evaluation Using Realistic Workload Traces Related Work Conclusion and Future Work References Supporting Parameter Sweep Applications with Synthesized Grid Services Introduction Parametric Modeling Value Sets Web Service Interface Syntax Building the Parameter Space Reduction Operations Implementation Experiments Conclusion References A P2P Approach to Resource Discovery in On-Line Monitoring of Grid Workflows Introduction Related Work The Automatic Resource Discovery Scenario Performance Evaluation Case Study: Monitoring of Coordinated Traffic Management Workflow Conclusion References Transparent Mobile Middleware Integration forJava and .NET Development Environments Introduction Middleware Integration Architecture Example: Incremental Object Replication Generic Code Generation Architecture Implementation .NET Java Evaluation Related Work Conclusions and Future Work References Providing Non-stop Service for Message-Passing Based Parallel Applications with RADIC Introduction The RADIC Architecture Protectors and Observers RADIC Protection Levels The Basic Protection Level The Resilient Protection Level Experiments Related Work Conclusions and Future Work References On-Line Performance Modeling for MPI Applications Introduction On-Line Performance Modeling Modeling Individual Tasks Modeling MPI Communications Parallel Application Modeling Causal Paths Model-Based Analysis Techniques Prototype Implementation Start-Up Dynamic TAG Construction Tracking MPI Communications Experimental Evaluation Related Work Conclusions and Future Work References MPC: A Unified Parallel Runtime for Clusters of NUMA Machines Introduction Common Approaches for Programming Clusters ofNUMA Nodes MPC: MultiProcessor Communications Execution Model Specialized MxN Thread Scheduling Scheduler-Integrated Collective Communications Optimized NUMA-Aware and Thread-Aware Allocator Experimental Results Scalability Results with Domain Overloading Memory Allocation and Data Placement Results Conclusion and Future Works References Topic 2: Performance Prediction and Evaluation Directory-Based Metadata Optimizations for Small Files in PVFS Introduction Current Design of PVFS Metadata Optimizations File System Operations Evaluation Environment File Creation File Listing File Removal Summary Conclusion and Future Work References Caspian: A Tunable Performance Model for Multi-core Systems Introduction Performance Analysis Assumptions and Notations Analytical Model Model Validation Conclusions and Future Work References Performance Model for Parallel Mathematical Libraries Based on Historical Knowledgebase* Introduction Performance Analysis for PETSc Applications Performance Model The Pattern Recognition Engine The Knowledgebase The Data Mining Engine Model Assessment Related Efforts Conclusions References A Performance Model of Dense Matrix Operations on Many-Core Architectures Introduction The Abstract Many-Core Architecture Problem Formulation The Performance Model MatrixMultiplication LU and Cholesky Decompositon Discussion of the Model Conclusion References Empirical Analysis of a Large-Scale Hierarchical Storage System Introduction Hierarchical Organization of Jaguar’s Storage System Organization of Storage Devices and Lustre File Systems on Jaguar Bandwidth Characterization of Different Hierarchies Single-OST Single Rack Cross-Rack System Scalability Parallel File Open Optimizing File Distribution Case Study: Turbulent Combustion Simulation, S3D Conclusions References To Snoop or Not to Snoop: Evaluation of Fine-Grainand Coarse-Grain Snoop Filtering Techniques Introduction Overview of Evaluated Techniques Methodology Results and Analysis Region Scout Results RCA Results Directory Cache Results Comparison Conclusion References Performance Implications of Cache Affinityon Multicore Processors Introduction Multicore Uniprocessors Methodology Results Multicore Multiprocessors Related Work Conclusions References Observing Performance Dynamics Using Parallel Profile Snapshots Introduction Design Profile Snapshots in TAU Trace File Conversion Application of Profile Snapshots Related Work Conclusions and Future Work References Event Tracing and Visualization for CellBroadband Engine Systems Introduction Background and Related Work Cell Broadband Engine Vampir Tool Suite Cell Enabled Software Tracing Tools Design of the Tracing Infrastructure Overview Prototype Implementation Event Model Visualization Hybrid MPI/Cell Analysis Results Tracing Examples Trace Overhead and Performance Impact Conclusions and Future Work References Evaluating Heterogeneous Memory Model by RealisticTrace-Driven Hardware/Software Co-simulation Introduction Heterogeneous Memory Architectures Hierarchical Model Flat Model Research Methodology Physical Environment of the Simulation System Workload Machine Workload Experimental Result Target Configurations Workload Characteristics Write Reference to $\rm M_2$ Sensitivity to $\rm_M1$ Sensitivity to Associativity Conclusion and Future Work References Mapping Heterogeneous Distributed Applications on Clusters Introduction Performance of FlowVR Applications Modeling the Problem Using Constraint Programming Constraint Programming Problem Modeling Experiments Validating Mappings Generating Mappings Testing Application and Hardware Limits Optimization of the Cluster Use Conclusion and Future Work References Neural Network-Based Load Prediction for Highly Dynamic Distributed Online Games Introduction Method Neural Network-Based Load Prediction Distributed FPS Game Simulator Tuning Experiments Network Type Network Structure Transfer Function and SignalExpanding Results Related Work Conclusions References Bottleneck Detection in Parallel File Systems with Trace-Based Performance Monitoring Introduction State-of-the-Art and Related Work Useful Performance Statistics and Metrics Extended Performance Monitoring with PVFS Results Conclusions and Future Work References Topic 3: Scheduling and Load Balancing Dynamic Grid Scheduling Using Job RuntimeRequirements and Variable Resource Availability Introduction Scheduling Technique Collecting Job Runtime Information Collecting Resource Availability Information Scheduling Mechanism Implementation CoBRA Extending the Scheduler Testing and Results Testing Technique Test Configuration Results Future Work Conclusion References Enhancing Prediction on Non-dedicated Clusters Introduction Prediction Model for Non-dedicated Clusters Obtaining SP Obtaining SC Prediction Engine Experimentation Experimental Results References Conclusions and Future Work Co-allocation with Communication Considerations in Multi-cluster Systems Introduction Related Work Communication and Its Effect on Co-allocation Workloads Performance Evaluation Research Model Job Stream Communication Scheduling Algorithm and Placement Policy Experimental Set Up Viability of Co-allocation The Effect of System and Job Parameters The Effect of Thres on Performance The Effect of ψ on Performance Sensitivity to Thres The Effect of Load Overall Effect of System/Job Parameters Communication Intensity Distribution The Effect of ψ Distribution Classification by Communication Intensity Conclusion and Future Work Reference Fine-Grained Task Scheduling UsingAdaptive Data Structures Introduction Adaptive Data Structure Adaptive Task Pool Experimental Results Synthetic Task Application Quicksort Ray Tracing and Hierarchical Radiosity Related Work Conclusions References Exploration of the Influence of Program Inputs on CMP Co-scheduling Introduction Influence of Program Inputs on Corun Performance Handling Program Inputs for Co-scheduling Overview of CAPS Constructing Predictive Input-Behavior Models Influence of Prediction Errors on Co-scheduling Related Work Conclusion References Integrating Dynamic Memory Placement with Adaptive Load-Balancing for Parallel Codes on NUMA Multiprocessors Introduction PageMigration Feedback-Guided Dynamic Loop Scheduling Integrating FGDLS with Page Migration Implementation Exposing Physical Topology to Userlevel Code Unifying Load Balancing and Page Migration Migrating Address Ranges between Nodes Experimental Evaluation Method Matrix-Vector Multiplication Conjugate Gradient Results and Analysis Issues and Extensions Page Level False-Sharing Related Work References Guest-Aware Priority-Based Virtual Machine Scheduling for Highly Consolidated Server Introduction Related Work Xen Virtual Machine Monitor Purpose-Specific Virtual Machine Scheduling Guest-Aware Priority-Based Scheduling Motivation Design Assumption Implementation Evaluation Evaluation Environment Scheduling Latency I/O Response Time Fairness Guarantee Conclusion and Future Work References Dynamic Pipeline Mapping (DPM) Introduction Related Work Modelling the Stages of a Pipeline Application Replicated Stage Performance Model Grouped Stages Performance Model Dynamic Pipeline Mapping Algorithm Assessment Conclusions References Formal Model and Scheduling Heuristics for the Replica Migration Problem Introduction Problem Description Integer Programming Formulation Scheduling Heuristics Experiments References Topic 4: High Performance Architectures and Compilers Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot Introduction Related Work Proposed Mechanism Reducing the Number of Tag Bits Reducing the Number of Target Address Bits Experimental Results Reducing Tag Bits Reducing Target Address Bits Combining Tag and Target Address Reduction Conclusions References Low-Cost Adaptive Data Prefetching Introduction Background and Motivation Degree-Distance Policies Experimental Environment Results Conclusions References Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies Introduction Sequoia Programs GSOP Graph Stream Scheduling Hierarchical Operation Ordering Software Pipelining Memory Management Tunable Search Evaluation Raw Performance and Resource Utilization Software Pipelining Conclusions and Related Work References Interprocedural Speculative Optimization of Memory Accesses to Global Variables Introduction Background Globals in the SPEC2006 Suite Speculation on Data Dependencies Analysis of the Usage of Globals Basic Analysis on Globals Extended Analysis Analysis Results Optimization Overview Placement of Compensation Code Speculative Compensation Code Case Study: Speculation on the Intel Itanium Implementation Results Related Work Conclusion References Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers Introduction Algorithm for the Construction of the GSA Form Implementation Using the GIMPLE-SSA Infrastructure Experimental Results Conclusions References Inter-block Scoreboard Schedulingin a JIT Compiler for VLIW Processors Introduction Local Instruction Scheduling Acyclic Instruction Scheduling Scoreboard Scheduling Principles Scoreboard Scheduling Implementation Global Instruction Scheduling Postpass Inter-region Scheduling Inter-block Scoreboard Scheduling Characterization of Fixed-Points Experimental Results Conclusions References Global Tiling for Communication MinimalParallelization on Distributed Memory Systems Introduction and Related Work Nomenclature and Definitions Basic Definitions and Assumptions Iteration Space and Data Space Tiling Transformation Tile Shape Selection and Tile-to-Processor Mapping Semi-oblique Shaped Tile Tile-to-Processor Mapping Global Tiling Problem Local Tiling Candidates Tile Layout Graph 0-1 Integer Programming Problem Experimental Results Conclusion References Topic 5: Parallel and Distributed Databases Reducing Transaction Abort Rateswith Prioritized Atomic Multicast Protocols Introduction SystemModel Reviewing Atomic Protocols Priority Management Integration in Database Replication Systems Experimental Work Environment Test Application Methodology Parameters Results Discussion Conclusions References Fault-Tolerant Partial Replication in Large-Scale Database Systems Introduction System Model and Assumptions Operations and Locks Transactions The Algorithm Overview Initial Execution Phase Submission Phase Certification Phase Precedence Graph Deciding Closure Phase Commitment Phase Initial Execution on More Than One Site Performance Analysis Concluding Remarks RelatedWork Conclusion References Exploiting Hybrid Parallelism inWeb Search Engines Introduction Speeding Up Round-Robin Query Processing Distributed Inverted File Query Processing Iterative Ranking and Round-Robin Query Processing Hybrid Parallelization Experiments Experimental Setting Performance Results Conclusions References Complex Queries for Moving Object Databases in DHT-Based Systems Introduction Meta-index Distribution P2P Crawling (Polling) Query Processing Algorithms Experimental Evaluation Conclusions References Scheduling Intersection Queries in Term Partitioned Inverted Files Introduction Scheduling Framework Description and Evaluation of Scheduling Algorithms Document Versus Best-Term Partitioned Strategies Conclusions References Topic 6: Grid and Cluster Computing Integration of GRID Superscalar and GridWay Metascheduler with the DRMAA OGF Standard Introduction Grid Applications Development Solution (GridAD) Enabling Grid Technologies GRID Superscalar GridWay Metascheduler Integration DRMAA Implementation in GridWay Metascheduler DRMAA Usage in GRIDSs Portal Development Experiences Conclusions and Future Work References Building Hierarchical Grid Storage Using the GFARM Global File System and the JUXMEM Grid Data-Sharing Service Introduction Related Work Combining RAM and Disk Storage to Achieve Scalability, Persistence and Efficiency The JUXMEM Data Sharing Service The GFARM Distributed File System Our Proposal: A Hybrid Grid Memory Hierarchy Implementing the JUXMEM-GFARM Interaction Flushing Data from JUXMEM to GFARM Restoring Data from GFARM to JUXMEM Feasibility Study: Evaluation Conclusion and Future Work References Enhancing Grids for Massively Multiplayer Online Computer Games Motivation RTF: A Grid-Based Game Middleware Middleware Interface Game State Distribution Modeling and Handling the Game State Grid Management Architecture Scheduler Services Hoster Services Experimental Results Conclusion and Related Work References Spectral Clustering Scheduling Techniques for Tasks with Strict QoS Requirements Introduction Joint Optimization of Resource Performance and QoS Requirements The Proposed Task Scheduling Policy Matrix Representation Optimization in the Continuous Domain Discrete Approximation Lower Bound - Scheduling Efficiency Experimental Results Conclusions References QoS-Oriented Reputation-Aware Query Scheduling in Data Grids Introduction Related Work QoS-Oriented Reputation-Based Scheduling System Model Two-Phase Reputation-Aware Scheduling Model Performance Metrics Experimental Results Conclusions and Future Work References Flying Low: Simple Leases with Workspace Pilot Introduction Approach Overview of the Workspace Services Two-Level Provisioning Leasing Resources with Workspace Pilot: Client’s Viewpoint Implementation The LRM Adapter The Workspace Pilot The Nuts and Bolts of VM Deployment: Workspace Control Implementation Experimental Evaluation Related Work Conclusions References Self-configuring Resource Discovery on a Hypercube Grid Overlay Introduction The Hypercube Overlay Architecture Resource Search Using HGRID The Search Procedure in an Hn Using Algorithm-H A Complete Example Using Algorithm-H Related Work Performance Evaluation Conclusions and Future Work References Auction Protocols for Resource Allocations inAd-Hoc Grids Introduction Related Work Economic Price-Based Mechanisms Auction Mechanisms System Implementation System Architecture Consumer/Producer Pricing Algorithm Performance Evaluation Experimental Setup Experimental Results Throughput Consumer Surplus Producer Surplus Uncertainty Measure Results Discussion Conclusion References GrAMoS: A Flexible Service for WS-AgreementMonitoring in Grid Environments Introduction QoS in Grid Computing Design of GrAMoS Architecture of GrAMos Experimental Results Conclusions and Future Work References Scalability of Grid Simulators: An Evaluation Introduction Simulator Overview SimGrid GridSim GES Evaluation Test I: General Scaling Test II: Job Scaling Threading and Virtual Memory Conclusion References Performance Evaluation of Data Management Layer byData Sharing Patterns for Grid RPC Applications Introduction Data Sharing Pattern for Data Management Layer Data Sharing Pattern in Grid RPC Applications Data Sharing Pattern OmniStorage: A Data Management Layer for Grid RPC Applications Design of OmniStorage Implementations Performance Evaluation Discussion for an Optimal Data Transfer Method Related Work Conclusion and Future Work References The Impact of Clustering on Token-BasedMutual Exclusion Algorithms Introduction Naimi-Tr´ehel’s Algorithm Composition Approach to Mutual Exclusion Algorithms Performance Evaluation Flat Algorithm Hierarchical Algorithm Related Work Conclusion References Reducing Kernel Development Complexity in Distributed Environments Introduction Kernel Distributed Data Manager kDDM Sets IO Linkers Interface Linkers Manipulation Functions Replication and Coherence Kernel Distributed File System Disk Layout File System Architecture kDFS Inode Management kDFS Content Management Related Work Conclusion and Future Work References A Twofold Distributed Game-Tree Search Approach Using Interconnected Clusters Introduction Heuristic Game-Tree Search Computer Chess and Computing Power Combining Optimistic Pondering with the Young Brothers Wait Concept Using YBWC as State of the Art Parallelization at the Intra-cluster Level Using Optimistic Pondering at the Inter-cluster Level Results Test Suite for Calibration of the Intra-cluster Part of GRIDCHESS Tournament Participation Results Conclusions and Future Work References Topic 7: Peer-to-Peer Computing Scalable Byzantine Fault Tolerant Public Key Authentication for Peer-to-Peer Networks Introduction Related Work Notation Definition and System Model Byzantine Fault Tolerant Public Key Authentication Underlying P2P Overlay Network Public Key Authentication Trusted Group Maintenance Evaluation Simulation Methodology Experiments Conclusion and Future Work References Secure Forwarding in DHTs - Is Redundancy theKey to Robustness? Introduction Related Work Problem Statement, Model and Assumptions The Forwarding Problem Intertwined Multi-path Routing (IMR) Higher-Reputated Neighbor Selection (HNS) Simulation Results Conclusions References P2P Evolutionary Algorithms: A Suitable Approach for Tackling Large Instances in Hard Optimization Problems Introduction Related Work Overall Model Description Evolvable Agent Methodology and Experimental Setup The Benchmark Experimental Setup A Method for Estimating the Population Size Results Conclusions References Efficient Processing of Continuous Join Queries Using Distributed Hash Tables Introduction System Model and Problem Definition DHT Model Stream Processing Model Gossip Dissemination System Problem Definition DHTJoin Method Indexing Tuples Disseminating Queries Performance Evaluation Network Traffic Gossip Dissemination Related Work Conclusion References Topic 8: Distributed Systems and Algorithms Automatic Prefetching with Binary Code Rewriting in Object-Based DSMs Introduction Implementation of Object-Based DSMs Memory Access Profiling and Dynamic Code Rewriting Design Considerations Profiling State Machine Dynamic Adaption of the Prefetching Distance Performance Related Work Conclusions References A PGAS-Based Algorithm for the Longest Common Subsequence Problem Introduction A PGAS-Based Algorithm The Building Phase The Backtracing Phase Experimental Results Conclusion and Future Work References Data Mining Algorithms on the Cell Broadband Engine Introduction The Cell Broadband Engine Architecture Data Mining Algorithms K-Means Algorithm for Clustering RBF Neural Network for Classification Apriori for Association Mining Optimization on the Cell K-Means on the Cell RBF on the Cell Apriori on the Cell Experimental Results Related Work Conclusion References Efficient Management of Complex Striped Filesin Active Storage Introduction Related Work Active Storage Overview Management of Complex Striped Files Striped Files with Chunk-Aligned Records Striped Files with Unaligned Records Mapper Component for Processing of Complex Data Formats Evaluation Experimental Results for DSCAL Experimental Results for ClimStat Conclusions References Topic 9: Parallel and Distributed Programming Improving the Performance of Multiple Conjugate Gradient Solvers by Exploiting Overlap Introduction MILC Conjugate Gradient Solvers Combining Multiple Conjugate Gradient Solvers Overlapping Gather Operations Overlapping Collective Operations Evaluation Experimental Set-Up Results Related Work Conclusions References A Software Component Model with Spatial and Temporal Compositions for Grid Infrastructures Introduction Composition in Space and Time: Properties and Discussion Composition in Space Composition in Time Discussion Toward a Spatio-temporal Composition Model Targeted Properties Analysis of Design Models for a Spatio-temporal Composition Model STCM: A Spatio-temporal Model Based on GCM and AGWL Extending GCM Components with Tasks and Temporal Ports Life Cycle Management of Task-Components A Composition Language Based on a Modified AGWL Proof-of-Concept Implementation Example of an Application Description Conclusion and Future Works References A Design Pattern for Component Oriented Development of Agent Based Multithreaded Applications Introduction Motivating Example Pattern Description Implementation Discussion Performance Conclusions References Advanced Concurrency Control for Transactional Memory Using Transaction Commit Rate Introduction P-only Concurrency Control Experimental Platform Concurrency Control Parameters Software and Hardware Platform Benchmarks Performance Evaluation Execution Time Resource Usage Transaction Execution Metrics Controller Responsiveness Conclusion References Meta-programming Applied to Automatic SMP Parallelization of Linear Algebra Code Introduction NT2: A High Performance Linear Algebra Library A Simple NT2 Use Case NT2 Implementation An SMP-Aware Implementation of NT2 A Performance Model for SMP Architectures Meta-programming the Parallelization Heuristic Experimental Results Discussion Conclusion References Solving Dense Linear Systems on Graphics Processors Introduction Overview of the Cholesky and LU Factorization Methods Computing the Cholesky and LU Factorizations on GPUs Padding Hybrid Algorithm Recursive Implementation Iterative Refinement Experimental Results Experimental Setup Basic Blocked Implementations on CPU and GPU Blocked Implementation with Padding Hybrid and Recursive Implementations Iterative Refinement Conclusions References Radioastronomy Image Synthesis on theCell/B.E. Introduction The Gridding and Degridding Kernels Radioastronomy - A Primer Building the Sky Image Application Analysis Parallelization on the Cell/B.E. Cell/B.E. Overview Parallelization on the Cell/B.E. Experiments and Results Overall Application Performance SPE Utilization Scalability Analysis Multi-baseline Parallelization The Scale of the Real Application Related Work Conclusions and Future Work References Parallel Lattice Boltzmann Flow Simulation on Emerging Multi-core Platforms Introduction Parallel Lattice Boltzmann Flow Simulation Algorithm Lattice Boltzmann Method Parallel LBM Algorithm Experiments Experimental Test Bed Performance Test Results Conclusions References Topic 10: Parallel Numerical Algorithms Parallel Algorithms for Triangular Periodic Sylvester-Type Matrix Equations Introduction Parallel Algorithms for Periodic Triangular Matrix Equations Implementation Issues Experimental Results Summary and Future Work References A Parallel Sparse Linear Solver for Nearest-Neighbor Tight-Binding Problems Introduction and Motivation Renormalization Algorithm Results Test 1 Test 2 Test 3 Test 4 Conclusion References Exploiting the Locality Properties of Peano Curves for Parallel Matrix Multiplication Introduction Matrix Multiplication Using Peano Curves Exploiting the Peano Algorithm’s Locality Properties Parallelisation and Implementation PerformanceResults Conclusion References Systematic Parallelization of Medical Image Reconstruction for Graphics Hardware Introduction PET and the List-Mode OSEM Algorithm Distributed-Memory Parallelization Parallelization: From Distributed-Memory to GPU GPU Architecture and Language Support Identification of Data-Parallel Functions Using Petri Nets Experimental Results (Distributed-Memory vs. GPU) Conclusion References Load-Balancing for a Block-Based Parallel Adaptive 4D Vlasov Solver Introduction The Parallel Adaptive Solver Numerical Scheme Parallel Algorithm Load-Balancing Mechanism Imbalance Detection Partitioning Algorithm Partitions Re-mapping on Processors Performance Measurements Conclusion References A Parallel Sensor Scheduling Technique for FaultDetection in Distributed Parameter Systems Introduction Sensor Selection for Fault Detection Solution Via Branch-and-Bound Parallel Realization of Branch and Bound Computational Results Conclusions References Topic 11: Distributed and High-PerformanceMultimedia On a Novel Dynamic Parallel Hardware Architecture for Lifting-Based DWT Introduction The Lifting-Based DWT DWT Lifting-Based Algorithm Hardware Architecture for DWT Lifting-Based Algorithm The Prediction Unit The Update Unit Inverse Prediction and Update Implementation Unified Unit for DWT Lifting-Based Prediction and Update Processing Dynamic Parallel Hardware Architecture for Lifting-Based DWT Algorithm Experimental Results Hard Resources Utilization Performance Evaluation Conclusion References Analytical Evaluation of Clients’ Failures in a LVoD Architecture Based on P2P and Multicast Paradigms Introduction P2P Multicast Delivery Scheme Overview The Failure Management Process: Description and Models Failure Detection Failure Recovery Maintenance of System Information Coherence Performance Evaluation Conclusions References A Search Engine Index for Multimedia Content Introduction Metric Spaces and Indexing Strategies List of Clusters (LC) Sparse Spatial Selection (SSS) LC-SSS Combination and Refinements Parallelism Experimental Results Conclusions References Topic 12: Theory and Algorithms for Parallel Computation Bi-objective Approximation Scheme for Makespan and Reliability Optimization on Uniform Parallel Machines Introduction Problem Related Works On the Approximability Solving the Bi-objective Problem $\langle \bar{\rho}_1, \rho_2\rangle$-Approximation Algorithm A Dual Approximation Algorithm Pareto Set Approximation Algorithm Conclusion References Deque-Free Work-Optimal Parallel STLAlgorithms Introduction Related Work: Parallel STL and Work Stealing The Deque-Free Work Stealing Algorithm Theoretical Bounds for Online Granularity Application to the STL Experimentations Conclusions References Topic 13: High-Performance Networks Reducing Packet Dropping in a Bufferless NoC Introduction Motivation Related Work Blind Packet Switching Reducing Packet Dropping by Misrouting and Loopback Channels Evaluation Simulation Environment Evaluation Results Conclusions and Future Work References A Communication-Aware Topological MappingTechnique for NoCs Introduction A New Topological Mapping Technique Correlation of the Model of Network Resources Performance Evaluation Conclusions and Future Work References Approximating the Traffic Grooming Problem with Respect to ADMs and OADMs Introduction Background Previous Works Our Contribution Problem Definition NP-Completeness Approximation Algorithms The MERGE(EA) Algorithm GROOM − OADM: An Edge-Algorithm for the Minimization of OADMs GROOM: An Edge-Algorithm for the Combined Traffic Grooming Problem Summary References On the Influence of the Packet Marking and Injection Control Schemes in Congestion Management for MINs Introduction Current CMMs Approaches Renato’s Proposal Pfister’s Implementation MVCM Proposal Performance Evaluation Network Configurations Evaluation Results Conclusions References Deadlock-Free Dynamic Network Reconfiguration Based on Close Up*/Down* Graphs Introduction Network Reconfiguration in Up*/Down*-Based Interconnects A Dynamic Reconfiguration Scheme Based on Close Graphs Performance Evaluation Experiment Setup Impact on the Management Time Impact on the Network Service Conclusions References HITP: A Transmission Protocol for Scalable High-Performance Distributed Storage Introduction Active Network Caching and the INCA System State-of-the-Art Transmission Protocols HITP: High-Volume INCA Transport Protocol A Simplified Header HITP Control Performance Evaluation Emulation Methodology Execution Scenarios Throughput Measurements Conclusions References Author Index