The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance (1st ed.), ISBN 0974364924

The growing popularity of multimedia extensions to general-purpose microprocessors has renewed the interest in vectorizi

Language: English. Pages: 236 [251]. Year: 2006.

Table of contents:
Preface
1. Introduction
Architectural Acceleration Mechanisms
Pipelining and Replication
Speedup
Quick Tour of Parallel Architectures
Data Parallel Architectures
Instruction-Level Parallel Architectures
Process-Level Parallel Architectures
Multimedia Extensions
MMX™ Technology
Streaming-SIMD Extensions
Intra-Register Vectorization
2. Instruction Set Preliminaries
Instruction Set Summary
Instruction Format
Packed Data Elements
Data Movement Instructions
Arithmetic Instructions
Logical Instructions
Comparison Instructions
Conversion Instructions
Shift Instructions
Shuffle Instructions
Unpack Instructions
Cacheability Control and Prefetch Instructions
State Management Instructions
The Intel NetBurst® Microarchitecture
Execution Logic
Memory Hierarchy
3. Language Preliminaries
The C Programming Language
Data Types
Expressions
Statements
Loop and Idiom Recognition
Well-Behaved Loops
Idiom Recognition
4. Data Dependence Theory
Data Dependences
Data Dependence Definitions
Data Dependence Terminology
Data Dependence Graphs
Data Dependence Analysis
Data Dependence Problems
Data Dependence Solvers
Hierarchical Data Dependence Analysis
Improving Data Dependence Analysis
Compiler Hints for Data Dependences
Aliasing Analysis
Dynamic Data Dependence Analysis
5. Vectorization Essentials
Validity of Vectorization
Preserving Data Dependences
Preserving Integer Precision
Preserving Floating-Point Precision
Vector Code Generation
General Framework
Vector Data Type Selection
Unit-Stride Memory References
Rotating Read-Only Memory References
Non-Unit-Stride Memory References
Scalar Memory References
Operators
MIN, MAX, and ABS Operators
Type Conversions
Mathematical Functions
Conditional Statements
6. Alignment Optimizations
Intraprocedural Alignment Optimizations
Memory Allocation and Data Layout
Intraprocedural Alignment Analysis
Cache Line Split Optimizations
Interprocedural Alignment Optimizations
Interprocedural Alignment Analysis
Exploiting Interprocedural Alignment Information
Improving Alignment Optimizations
Compiler Hints for Alignment
Multi-Version Code
Dynamic Loop Peeling
7. Supplemental Optimizations
Idiom Recognition
Conversion Idioms
Arithmetic Idioms
Reduction Idioms
Saturation Idioms
Search Loops
Complex Data
Complex Numbers
Single-Precision Complex Data Types
Double-Precision Complex Data Types
Memory Hierarchy Optimizations
High-Level Optimizations
Vector Register Reuse
Low-Level Optimizations
8. Vectorization Beyond Loops
Loop Materialization
Rollable Statements and Expressions
Loop Materialization and Collapsing
Inexpensive Loop Materialization
Improved Loop Materialization
Performance Considerations
Low Trip-Count Loops
High Trip-Count Loops
9. Vectorization with the Intel Compilers
Vectorization Overview
Compiler Switches
Profile-Guided Optimization
Compiler Hints
Vectorization Guidelines
Design and Implementation Considerations
Focus of Optimization
Diagnostics-Guided Optimization
Final Remarks
Some More Experiments
Future Trends in Multimedia Extensions
References
Index