Euro-Par 2008 Parallel Processing: 14th International Euro-Par Conference, Las Palmas de Gran Canaria, Spain, August 26-29, 2008, Proceedings (Lecture Notes in Computer Science, 5168) 3540854509, 9783540854500

This book constitutes the refereed proceedings of the 14th International Conference on Parallel Computing, Euro-Par 2008, held in Las Palmas de Gran Canaria, Spain, in August 2008.

English Pages 992 [991] Year 2008

Table of contents:
Title Page
Preface
Organization
Table of Contents
Topic 1: Support Tools and Environments
Clock Synchronization in Cell BE Traces
Introduction
Related Work
Clock Synchronization Algorithm
Implementation Aspects
Experimental Results
Conclusions
References
DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation
Introduction
Requirements for Simulating Grid Resource Management Architectures
DGSim: A Framework for Simulating Grid Resource Management Architectures
Design Overview
A Model for Inter-operated Cluster-Based Grids
Grid Dynamics and Grid Evolution
Grid Workload Generator
Simulator Validation
Experiments Using DGSim
Performance Evaluation Using Real Workload Traces
Performance Evaluation Using Realistic Workload Traces
Related Work
Conclusion and Future Work
References
Supporting Parameter Sweep Applications with Synthesized Grid Services
Introduction
Parametric Modeling
Value Sets
Web Service Interface Syntax
Building the Parameter Space
Reduction Operations
Implementation
Experiments
Conclusion
References
A P2P Approach to Resource Discovery in On-Line Monitoring of Grid Workflows
Introduction
Related Work
The Automatic Resource Discovery Scenario
Performance Evaluation
Case Study: Monitoring of Coordinated Traffic Management Workflow
Conclusion
References
Transparent Mobile Middleware Integration for Java and .NET Development Environments
Introduction
Middleware Integration Architecture
Example: Incremental Object Replication
Generic Code Generation Architecture
Implementation
.NET
Java
Evaluation
Related Work
Conclusions and Future Work
References
Providing Non-stop Service for Message-Passing Based Parallel Applications with RADIC
Introduction
The RADIC Architecture
Protectors and Observers
RADIC Protection Levels
The Basic Protection Level
The Resilient Protection Level
Experiments
Related Work
Conclusions and Future Work
References
On-Line Performance Modeling for MPI Applications
Introduction
On-Line Performance Modeling
Modeling Individual Tasks
Modeling MPI Communications
Parallel Application Modeling
Causal Paths
Model-Based Analysis Techniques
Prototype Implementation
Start-Up
Dynamic TAG Construction
Tracking MPI Communications
Experimental Evaluation
Related Work
Conclusions and Future Work
References
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Introduction
Common Approaches for Programming Clusters of NUMA Nodes
MPC: MultiProcessor Communications
Execution Model
Specialized MxN Thread Scheduling
Scheduler-Integrated Collective Communications
Optimized NUMA-Aware and Thread-Aware Allocator
Experimental Results
Scalability Results with Domain Overloading
Memory Allocation and Data Placement Results
Conclusion and Future Works
References
Topic 2: Performance Prediction and Evaluation
Directory-Based Metadata Optimizations for Small Files in PVFS
Introduction
Current Design of PVFS
Metadata Optimizations
File System Operations
Evaluation
Environment
File Creation
File Listing
File Removal
Summary
Conclusion and Future Work
References
Caspian: A Tunable Performance Model for Multi-core Systems
Introduction
Performance Analysis
Assumptions and Notations
Analytical Model
Model Validation
Conclusions and Future Work
References
Performance Model for Parallel Mathematical Libraries Based on Historical Knowledgebase
Introduction
Performance Analysis for PETSc Applications
Performance Model
The Pattern Recognition Engine
The Knowledgebase
The Data Mining Engine
Model Assessment
Related Efforts
Conclusions
References
A Performance Model of Dense Matrix Operations on Many-Core Architectures
Introduction
The Abstract Many-Core Architecture
Problem Formulation
The Performance Model
Matrix Multiplication
LU and Cholesky Decomposition
Discussion of the Model
Conclusion
References
Empirical Analysis of a Large-Scale Hierarchical Storage System
Introduction
Hierarchical Organization of Jaguar’s Storage System
Organization of Storage Devices and Lustre File Systems on Jaguar
Bandwidth Characterization of Different Hierarchies
Single-OST
Single Rack
Cross-Rack
System Scalability
Parallel File Open
Optimizing File Distribution
Case Study: Turbulent Combustion Simulation, S3D
Conclusions
References
To Snoop or Not to Snoop: Evaluation of Fine-Grain and Coarse-Grain Snoop Filtering Techniques
Introduction
Overview of Evaluated Techniques
Methodology
Results and Analysis
Region Scout Results
RCA Results
Directory Cache Results
Comparison
Conclusion
References
Performance Implications of Cache Affinity on Multicore Processors
Introduction
Multicore Uniprocessors
Methodology
Results
Multicore Multiprocessors
Related Work
Conclusions
References
Observing Performance Dynamics Using Parallel Profile Snapshots
Introduction
Design
Profile Snapshots in TAU
Trace File Conversion
Application of Profile Snapshots
Related Work
Conclusions and Future Work
References
Event Tracing and Visualization for Cell Broadband Engine Systems
Introduction
Background and Related Work
Cell Broadband Engine
Vampir Tool Suite
Cell Enabled Software Tracing Tools
Design of the Tracing Infrastructure
Overview
Prototype Implementation
Event Model
Visualization
Hybrid MPI/Cell Analysis
Results
Tracing Examples
Trace Overhead and Performance Impact
Conclusions and Future Work
References
Evaluating Heterogeneous Memory Model by Realistic Trace-Driven Hardware/Software Co-simulation
Introduction
Heterogeneous Memory Architectures
Hierarchical Model
Flat Model
Research Methodology
Physical Environment of the Simulation System
Workload Machine
Workload
Experimental Result
Target Configurations
Workload Characteristics
Write Reference to $\mathrm{M}_2$
Sensitivity to $\mathrm{M}_1$
Sensitivity to Associativity
Conclusion and Future Work
References
Mapping Heterogeneous Distributed Applications on Clusters
Introduction
Performance of FlowVR Applications
Modeling the Problem Using Constraint Programming
Constraint Programming
Problem Modeling
Experiments
Validating Mappings
Generating Mappings
Testing Application and Hardware Limits
Optimization of the Cluster Use
Conclusion and Future Work
References
Neural Network-Based Load Prediction for Highly Dynamic Distributed Online Games
Introduction
Method
Neural Network-Based Load Prediction
Distributed FPS Game Simulator
Tuning Experiments
Network Type
Network Structure
Transfer Function and Signal Expanding
Results
Related Work
Conclusions
References
Bottleneck Detection in Parallel File Systems with Trace-Based Performance Monitoring
Introduction
State-of-the-Art and Related Work
Useful Performance Statistics and Metrics
Extended Performance Monitoring with PVFS
Results
Conclusions and Future Work
References
Topic 3: Scheduling and Load Balancing
Dynamic Grid Scheduling Using Job Runtime Requirements and Variable Resource Availability
Introduction
Scheduling Technique
Collecting Job Runtime Information
Collecting Resource Availability Information
Scheduling Mechanism
Implementation
CoBRA
Extending the Scheduler
Testing and Results
Testing Technique
Test Configuration
Results
Future Work
Conclusion
References
Enhancing Prediction on Non-dedicated Clusters
Introduction
Prediction Model for Non-dedicated Clusters
Obtaining SP
Obtaining SC
Prediction Engine
Experimentation
Experimental Results
Conclusions and Future Work
References
Co-allocation with Communication Considerations in Multi-cluster Systems
Introduction
Related Work
Communication and Its Effect on Co-allocation
Workloads
Performance Evaluation
Research Model
Job Stream
Communication
Scheduling Algorithm and Placement Policy
Experimental Set Up
Viability of Co-allocation
The Effect of System and Job Parameters
The Effect of Thres on Performance
The Effect of ψ on Performance Sensitivity to Thres
The Effect of Load
Overall Effect of System/Job Parameters
Communication Intensity Distribution
The Effect of ψ Distribution
Classification by Communication Intensity
Conclusion and Future Work
References
Fine-Grained Task Scheduling Using Adaptive Data Structures
Introduction
Adaptive Data Structure
Adaptive Task Pool
Experimental Results
Synthetic Task Application
Quicksort
Ray Tracing and Hierarchical Radiosity
Related Work
Conclusions
References
Exploration of the Influence of Program Inputs on CMP Co-scheduling
Introduction
Influence of Program Inputs on Corun Performance
Handling Program Inputs for Co-scheduling
Overview of CAPS
Constructing Predictive Input-Behavior Models
Influence of Prediction Errors on Co-scheduling
Related Work
Conclusion
References
Integrating Dynamic Memory Placement with Adaptive Load-Balancing for Parallel Codes on NUMA Multiprocessors
Introduction
Page Migration
Feedback-Guided Dynamic Loop Scheduling
Integrating FGDLS with Page Migration
Implementation
Exposing Physical Topology to User-Level Code
Unifying Load Balancing and Page Migration
Migrating Address Ranges between Nodes
Experimental Evaluation
Method
Matrix-Vector Multiplication
Conjugate Gradient
Results and Analysis
Issues and Extensions
Page Level False-Sharing
Related Work
References
Guest-Aware Priority-Based Virtual Machine Scheduling for Highly Consolidated Server
Introduction
Related Work
Xen Virtual Machine Monitor
Purpose-Specific Virtual Machine Scheduling
Guest-Aware Priority-Based Scheduling
Motivation
Design
Assumption
Implementation
Evaluation
Evaluation Environment
Scheduling Latency
I/O Response Time
Fairness Guarantee
Conclusion and Future Work
References
Dynamic Pipeline Mapping (DPM)
Introduction
Related Work
Modelling the Stages of a Pipeline Application
Replicated Stage Performance Model
Grouped Stages Performance Model
Dynamic Pipeline Mapping
Algorithm Assessment
Conclusions
References
Formal Model and Scheduling Heuristics for the Replica Migration Problem
Introduction
Problem Description
Integer Programming Formulation
Scheduling Heuristics
Experiments
References
Topic 4: High Performance Architectures and Compilers
Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot
Introduction
Related Work
Proposed Mechanism
Reducing the Number of Tag Bits
Reducing the Number of Target Address Bits
Experimental Results
Reducing Tag Bits
Reducing Target Address Bits
Combining Tag and Target Address Reduction
Conclusions
References
Low-Cost Adaptive Data Prefetching
Introduction
Background and Motivation
Degree-Distance Policies
Experimental Environment
Results
Conclusions
References
Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies
Introduction
Sequoia Programs
GSOP Graph
Stream Scheduling
Hierarchical Operation Ordering
Software Pipelining
Memory Management
Tunable Search
Evaluation
Raw Performance and Resource Utilization
Software Pipelining
Conclusions and Related Work
References
Interprocedural Speculative Optimization of Memory Accesses to Global Variables
Introduction
Background
Globals in the SPEC2006 Suite
Speculation on Data Dependencies
Analysis of the Usage of Globals
Basic Analysis on Globals
Extended Analysis
Analysis Results
Optimization
Overview
Placement of Compensation Code
Speculative Compensation Code
Case Study: Speculation on the Intel Itanium
Implementation
Results
Related Work
Conclusion
References
Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers
Introduction
Algorithm for the Construction of the GSA Form
Implementation Using the GIMPLE-SSA Infrastructure
Experimental Results
Conclusions
References
Inter-block Scoreboard Scheduling in a JIT Compiler for VLIW Processors
Introduction
Local Instruction Scheduling
Acyclic Instruction Scheduling
Scoreboard Scheduling Principles
Scoreboard Scheduling Implementation
Global Instruction Scheduling
Postpass Inter-region Scheduling
Inter-block Scoreboard Scheduling
Characterization of Fixed-Points
Experimental Results
Conclusions
References
Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems
Introduction and Related Work
Nomenclature and Definitions
Basic Definitions and Assumptions
Iteration Space and Data Space Tiling Transformation
Tile Shape Selection and Tile-to-Processor Mapping
Semi-oblique Shaped Tile
Tile-to-Processor Mapping
Global Tiling Problem
Local Tiling Candidates
Tile Layout Graph
0-1 Integer Programming Problem
Experimental Results
Conclusion
References
Topic 5: Parallel and Distributed Databases
Reducing Transaction Abort Rates with Prioritized Atomic Multicast Protocols
Introduction
System Model
Reviewing Atomic Protocols
Priority Management
Integration in Database Replication Systems
Experimental Work
Environment
Test Application
Methodology
Parameters
Results
Discussion
Conclusions
References
Fault-Tolerant Partial Replication in Large-Scale Database Systems
Introduction
System Model and Assumptions
Operations and Locks
Transactions
The Algorithm
Overview
Initial Execution Phase
Submission Phase
Certification Phase
Precedence Graph
Deciding
Closure Phase
Commitment Phase
Initial Execution on More Than One Site
Performance Analysis
Concluding Remarks
Related Work
Conclusion
References
Exploiting Hybrid Parallelism in Web Search Engines
Introduction
Speeding Up Round-Robin Query Processing
Distributed Inverted File
Query Processing
Iterative Ranking and Round-Robin Query Processing
Hybrid Parallelization
Experiments
Experimental Setting
Performance Results
Conclusions
References
Complex Queries for Moving Object Databases in DHT-Based Systems
Introduction
Meta-index Distribution
P2P Crawling (Polling)
Query Processing Algorithms
Experimental Evaluation
Conclusions
References
Scheduling Intersection Queries in Term Partitioned Inverted Files
Introduction
Scheduling Framework
Description and Evaluation of Scheduling Algorithms
Document Versus Best-Term Partitioned Strategies
Conclusions
References
Topic 6: Grid and Cluster Computing
Integration of GRID Superscalar and GridWay Metascheduler with the DRMAA OGF Standard
Introduction
Grid Applications Development Solution (GridAD)
Enabling Grid Technologies
GRID Superscalar
GridWay Metascheduler
Integration
DRMAA Implementation in GridWay Metascheduler
DRMAA Usage in GRIDSs
Portal Development
Experiences
Conclusions and Future Work
References
Building Hierarchical Grid Storage Using the GFARM Global File System and the JUXMEM Grid Data-Sharing Service
Introduction
Related Work
Combining RAM and Disk Storage to Achieve Scalability, Persistence and Efficiency
The JUXMEM Data Sharing Service
The GFARM Distributed File System
Our Proposal: A Hybrid Grid Memory Hierarchy
Implementing the JUXMEM-GFARM Interaction
Flushing Data from JUXMEM to GFARM
Restoring Data from GFARM to JUXMEM
Feasibility Study: Evaluation
Conclusion and Future Work
References
Enhancing Grids for Massively Multiplayer Online Computer Games
Motivation
RTF: A Grid-Based Game Middleware
Middleware Interface
Game State Distribution
Modeling and Handling the Game State
Grid Management Architecture
Scheduler Services
Hoster Services
Experimental Results
Conclusion and Related Work
References
Spectral Clustering Scheduling Techniques for Tasks with Strict QoS Requirements
Introduction
Joint Optimization of Resource Performance and QoS Requirements
The Proposed Task Scheduling Policy
Matrix Representation
Optimization in the Continuous Domain
Discrete Approximation
Lower Bound - Scheduling Efficiency
Experimental Results
Conclusions
References
QoS-Oriented Reputation-Aware Query Scheduling in Data Grids
Introduction
Related Work
QoS-Oriented Reputation-Based Scheduling
System Model
Two-Phase Reputation-Aware Scheduling Model
Performance Metrics
Experimental Results
Conclusions and Future Work
References
Flying Low: Simple Leases with Workspace Pilot
Introduction
Approach
Overview of the Workspace Services
Two-Level Provisioning
Leasing Resources with Workspace Pilot: Client’s Viewpoint
Implementation
The LRM Adapter
The Workspace Pilot
The Nuts and Bolts of VM Deployment: Workspace Control Implementation
Experimental Evaluation
Related Work
Conclusions
References
Self-configuring Resource Discovery on a Hypercube Grid Overlay
Introduction
The Hypercube Overlay Architecture
Resource Search Using HGRID
The Search Procedure in an Hn Using Algorithm-H
A Complete Example Using Algorithm-H
Related Work
Performance Evaluation
Conclusions and Future Work
References
Auction Protocols for Resource Allocations in Ad-Hoc Grids
Introduction
Related Work
Economic Price-Based Mechanisms
Auction Mechanisms
System Implementation
System Architecture
Consumer/Producer Pricing Algorithm
Performance Evaluation
Experimental Setup
Experimental Results
Throughput
Consumer Surplus
Producer Surplus
Uncertainty Measure
Results Discussion
Conclusion
References
GrAMoS: A Flexible Service for WS-Agreement Monitoring in Grid Environments
Introduction
QoS in Grid Computing
Design of GrAMoS
Architecture of GrAMoS
Experimental Results
Conclusions and Future Work
References
Scalability of Grid Simulators: An Evaluation
Introduction
Simulator Overview
SimGrid
GridSim
GES
Evaluation
Test I: General Scaling
Test II: Job Scaling
Threading and Virtual Memory
Conclusion
References
Performance Evaluation of Data Management Layer by Data Sharing Patterns for Grid RPC Applications
Introduction
Data Sharing Pattern for Data Management Layer
Data Sharing Pattern in Grid RPC Applications
Data Sharing Pattern
OmniStorage: A Data Management Layer for Grid RPC Applications
Design of OmniStorage
Implementations
Performance Evaluation
Discussion for an Optimal Data Transfer Method
Related Work
Conclusion and Future Work
References
The Impact of Clustering on Token-Based Mutual Exclusion Algorithms
Introduction
Naimi-Tréhel’s Algorithm
Composition Approach to Mutual Exclusion Algorithms
Performance Evaluation
Flat Algorithm
Hierarchical Algorithm
Related Work
Conclusion
References
Reducing Kernel Development Complexity in Distributed Environments
Introduction
Kernel Distributed Data Manager
kDDM Sets
IO Linkers
Interface Linkers
Manipulation Functions
Replication and Coherence
Kernel Distributed File System
Disk Layout
File System Architecture
kDFS Inode Management
kDFS Content Management
Related Work
Conclusion and Future Work
References
A Twofold Distributed Game-Tree Search Approach Using Interconnected Clusters
Introduction
Heuristic Game-Tree Search
Computer Chess and Computing Power
Combining Optimistic Pondering with the Young Brothers Wait Concept
Using YBWC as State of the Art Parallelization at the Intra-cluster Level
Using Optimistic Pondering at the Inter-cluster Level
Results
Test Suite for Calibration of the Intra-cluster Part of GRIDCHESS
Tournament Participation Results
Conclusions and Future Work
References
Topic 7: Peer-to-Peer Computing
Scalable Byzantine Fault Tolerant Public Key Authentication for Peer-to-Peer Networks
Introduction
Related Work
Notation Definition and System Model
Byzantine Fault Tolerant Public Key Authentication
Underlying P2P Overlay Network
Public Key Authentication
Trusted Group Maintenance
Evaluation
Simulation Methodology
Experiments
Conclusion and Future Work
References
Secure Forwarding in DHTs - Is Redundancy the Key to Robustness?
Introduction
Related Work
Problem Statement, Model and Assumptions
The Forwarding Problem
Intertwined Multi-path Routing (IMR)
Higher-Reputated Neighbor Selection (HNS)
Simulation Results
Conclusions
References
P2P Evolutionary Algorithms: A Suitable Approach for Tackling Large Instances in Hard Optimization Problems
Introduction
Related Work
Overall Model Description
Evolvable Agent
Methodology and Experimental Setup
The Benchmark
Experimental Setup
A Method for Estimating the Population Size
Results
Conclusions
References
Efficient Processing of Continuous Join Queries Using Distributed Hash Tables
Introduction
System Model and Problem Definition
DHT Model
Stream Processing Model
Gossip Dissemination System
Problem Definition
DHTJoin Method
Indexing Tuples
Disseminating Queries
Performance Evaluation
Network Traffic
Gossip Dissemination
Related Work
Conclusion
References
Topic 8: Distributed Systems and Algorithms
Automatic Prefetching with Binary Code Rewriting in Object-Based DSMs
Introduction
Implementation of Object-Based DSMs
Memory Access Profiling and Dynamic Code Rewriting
Design Considerations
Profiling State Machine
Dynamic Adaption of the Prefetching Distance
Performance
Related Work
Conclusions
References
A PGAS-Based Algorithm for the Longest Common Subsequence Problem
Introduction
A PGAS-Based Algorithm
The Building Phase
The Backtracing Phase
Experimental Results
Conclusion and Future Work
References
Data Mining Algorithms on the Cell Broadband Engine
Introduction
The Cell Broadband Engine Architecture
Data Mining Algorithms
K-Means Algorithm for Clustering
RBF Neural Network for Classification
Apriori for Association Mining
Optimization on the Cell
K-Means on the Cell
RBF on the Cell
Apriori on the Cell
Experimental Results
Related Work
Conclusion
References
Efficient Management of Complex Striped Files in Active Storage
Introduction
Related Work
Active Storage Overview
Management of Complex Striped Files
Striped Files with Chunk-Aligned Records
Striped Files with Unaligned Records
Mapper Component for Processing of Complex Data Formats
Evaluation
Experimental Results for DSCAL
Experimental Results for ClimStat
Conclusions
References
Topic 9: Parallel and Distributed Programming
Improving the Performance of Multiple Conjugate Gradient Solvers by Exploiting Overlap
Introduction
MILC
Conjugate Gradient Solvers
Combining Multiple Conjugate Gradient Solvers
Overlapping Gather Operations
Overlapping Collective Operations
Evaluation
Experimental Set-Up
Results
Related Work
Conclusions
References
A Software Component Model with Spatial and Temporal Compositions for Grid Infrastructures
Introduction
Composition in Space and Time: Properties and Discussion
Composition in Space
Composition in Time
Discussion
Toward a Spatio-temporal Composition Model
Targeted Properties
Analysis of Design Models for a Spatio-temporal Composition Model
STCM: A Spatio-temporal Model Based on GCM and AGWL
Extending GCM Components with Tasks and Temporal Ports
Life Cycle Management of Task-Components
A Composition Language Based on a Modified AGWL
Proof-of-Concept Implementation
Example of an Application Description
Conclusion and Future Works
References
A Design Pattern for Component Oriented Development of Agent Based Multithreaded Applications
Introduction
Motivating Example
Pattern Description
Implementation
Discussion
Performance
Conclusions
References
Advanced Concurrency Control for Transactional Memory Using Transaction Commit Rate
Introduction
P-only Concurrency Control
Experimental Platform
Concurrency Control Parameters
Software and Hardware Platform
Benchmarks
Performance Evaluation
Execution Time
Resource Usage
Transaction Execution Metrics
Controller Responsiveness
Conclusion
References
Meta-programming Applied to Automatic SMP Parallelization of Linear Algebra Code
Introduction
NT2: A High Performance Linear Algebra Library
A Simple NT2 Use Case
NT2 Implementation
An SMP-Aware Implementation of NT2
A Performance Model for SMP Architectures
Meta-programming the Parallelization Heuristic
Experimental Results
Discussion
Conclusion
References
Solving Dense Linear Systems on Graphics Processors
Introduction
Overview of the Cholesky and LU Factorization Methods
Computing the Cholesky and LU Factorizations on GPUs
Padding
Hybrid Algorithm
Recursive Implementation
Iterative Refinement
Experimental Results
Experimental Setup
Basic Blocked Implementations on CPU and GPU
Blocked Implementation with Padding
Hybrid and Recursive Implementations
Iterative Refinement
Conclusions
References
Radioastronomy Image Synthesis on the Cell/B.E.
Introduction
The Gridding and Degridding Kernels
Radioastronomy - A Primer
Building the Sky Image
Application Analysis
Parallelization on the Cell/B.E.
Cell/B.E. Overview
Parallelization on the Cell/B.E.
Experiments and Results
Overall Application Performance
SPE Utilization
Scalability Analysis
Multi-baseline Parallelization
The Scale of the Real Application
Related Work
Conclusions and Future Work
References
Parallel Lattice Boltzmann Flow Simulation on Emerging Multi-core Platforms
Introduction
Parallel Lattice Boltzmann Flow Simulation Algorithm
Lattice Boltzmann Method
Parallel LBM Algorithm
Experiments
Experimental Test Bed
Performance Test Results
Conclusions
References
Topic 10: Parallel Numerical Algorithms
Parallel Algorithms for Triangular Periodic Sylvester-Type Matrix Equations
Introduction
Parallel Algorithms for Periodic Triangular Matrix Equations
Implementation Issues
Experimental Results
Summary and Future Work
References
A Parallel Sparse Linear Solver for Nearest-Neighbor Tight-Binding Problems
Introduction and Motivation
Renormalization Algorithm
Results
Test 1
Test 2
Test 3
Test 4
Conclusion
References
Exploiting the Locality Properties of Peano Curves for Parallel Matrix Multiplication
Introduction
Matrix Multiplication Using Peano Curves
Exploiting the Peano Algorithm’s Locality Properties
Parallelisation and Implementation
Performance Results
Conclusion
References
Systematic Parallelization of Medical Image Reconstruction for Graphics Hardware
Introduction
PET and the List-Mode OSEM Algorithm
Distributed-Memory Parallelization
Parallelization: From Distributed-Memory to GPU
GPU Architecture and Language Support
Identification of Data-Parallel Functions Using Petri Nets
Experimental Results (Distributed-Memory vs. GPU)
Conclusion
References
Load-Balancing for a Block-Based Parallel Adaptive 4D Vlasov Solver
Introduction
The Parallel Adaptive Solver
Numerical Scheme
Parallel Algorithm
Load-Balancing Mechanism
Imbalance Detection
Partitioning Algorithm
Partitions Re-mapping on Processors
Performance Measurements
Conclusion
References
A Parallel Sensor Scheduling Technique for Fault Detection in Distributed Parameter Systems
Introduction
Sensor Selection for Fault Detection
Solution Via Branch-and-Bound
Parallel Realization of Branch and Bound
Computational Results
Conclusions
References
Topic 11: Distributed and High-Performance Multimedia
On a Novel Dynamic Parallel Hardware Architecture for Lifting-Based DWT
Introduction
The Lifting-Based DWT
DWT Lifting-Based Algorithm
Hardware Architecture for DWT Lifting-Based Algorithm
The Prediction Unit
The Update Unit
Inverse Prediction and Update Implementation
Unified Unit for DWT Lifting-Based Prediction and Update Processing
Dynamic Parallel Hardware Architecture for Lifting-Based DWT Algorithm
Experimental Results
Hard Resources Utilization
Performance Evaluation
Conclusion
References
Analytical Evaluation of Clients’ Failures in a LVoD Architecture Based on P2P and Multicast Paradigms
Introduction
P2P Multicast Delivery Scheme Overview
The Failure Management Process: Description and Models
Failure Detection
Failure Recovery
Maintenance of System Information Coherence
Performance Evaluation
Conclusions
References
A Search Engine Index for Multimedia Content
Introduction
Metric Spaces and Indexing Strategies
List of Clusters (LC)
Sparse Spatial Selection (SSS)
LC-SSS Combination and Refinements
Parallelism
Experimental Results
Conclusions
References
Topic 12: Theory and Algorithms for Parallel Computation
Bi-objective Approximation Scheme for Makespan and Reliability Optimization on Uniform Parallel Machines
Introduction
Problem
Related Works
On the Approximability
Solving the Bi-objective Problem
$\langle \bar{\rho}_1, \rho_2\rangle$-Approximation Algorithm
A Dual Approximation Algorithm
Pareto Set Approximation Algorithm
Conclusion
References
Deque-Free Work-Optimal Parallel STL Algorithms
Introduction
Related Work: Parallel STL and Work Stealing
The Deque-Free Work Stealing Algorithm
Theoretical Bounds for Online Granularity
Application to the STL
Experimentations
Conclusions
References
Topic 13: High-Performance Networks
Reducing Packet Dropping in a Bufferless NoC
Introduction
Motivation
Related Work
Blind Packet Switching
Reducing Packet Dropping by Misrouting and Loopback Channels
Evaluation
Simulation Environment
Evaluation Results
Conclusions and Future Work
References
A Communication-Aware Topological Mapping Technique for NoCs
Introduction
A New Topological Mapping Technique
Correlation of the Model of Network Resources
Performance Evaluation
Conclusions and Future Work
References
Approximating the Traffic Grooming Problem with Respect to ADMs and OADMs
Introduction
Background
Previous Works
Our Contribution
Problem Definition
NP-Completeness
Approximation Algorithms
The MERGE(EA) Algorithm
GROOM-OADM: An Edge-Algorithm for the Minimization of OADMs
GROOM: An Edge-Algorithm for the Combined Traffic Grooming Problem
Summary
References
On the Influence of the Packet Marking and Injection Control Schemes in Congestion Management for MINs
Introduction
Current CMMs Approaches
Renato’s Proposal
Pfister’s Implementation
MVCM Proposal
Performance Evaluation
Network Configurations
Evaluation Results
Conclusions
References
Deadlock-Free Dynamic Network Reconfiguration Based on Close Up*/Down* Graphs
Introduction
Network Reconfiguration in Up*/Down*-Based Interconnects
A Dynamic Reconfiguration Scheme Based on Close Graphs
Performance Evaluation
Experiment Setup
Impact on the Management Time
Impact on the Network Service
Conclusions
References
HITP: A Transmission Protocol for Scalable High-Performance Distributed Storage
Introduction
Active Network Caching and the INCA System
State-of-the-Art Transmission Protocols
HITP: High-Volume INCA Transport Protocol
A Simplified Header
HITP Control
Performance Evaluation
Emulation Methodology
Execution Scenarios
Throughput Measurements
Conclusions
References
Author Index