INTRODUCING CHAPEL:
A PROGRAMMING LANGUAGE FOR PRODUCTIVE PARALLEL COMPUTING FROM LAPTOPS TO SUPERCOMPUTERS
Brad Chamberlain, Distinguished Technologist
LinuxCon, May 11, 2023
PARALLEL COMPUTING IN A NUTSHELL
Parallel Computing: using the processors and memories of multiple compute resources
• in order to run a program…
  – faster than we could otherwise
  – and/or using larger problem sizes
(a short Chapel sketch of this idea appears after the diagram below)
[Diagram: four compute nodes (Compute Node 0 through Compute Node 3), each containing processor cores and memory]
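To make the definition concrete, here is a minimal Chapel sketch (not part of the original slides) that puts every core of every compute node, or "locale" in Chapel terms, to work; the Locales array, here, and maxTaskPar are built-in Chapel features:

// hello-parallel.chpl: greet from every core of every compute node (locale)
coforall loc in Locales do                 // one task per compute node
  on loc do                                // run that task on the given node
    coforall tid in 0..<here.maxTaskPar do // one task per local core
      writeln("Hello from task ", tid, " of ", here.maxTaskPar,
              " on ", here.name);

Compiling with `chpl hello-parallel.chpl` and launching with `./hello-parallel -nl 4` would run the program across four compute nodes, with each node greeting from all of its cores.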
PARALLEL COMPUTING HAS BECOME UBIQUITOUS
Traditional parallel computing:
• supercomputers
• commodity clusters
Today:
• multicore processors
• GPUs
• cloud computing
[Diagram: four compute nodes (Compute Node 0 through Compute Node 3), each containing processor cores and memory]
OAK RIDGE NATIONAL LABORATORY'S FRONTIER SUPERCOMPUTER
• 74 HPE Cray EX cabinets
• 9,408 AMD CPUs, 37,632 AMD GPUs
• 700 petabytes of storage capacity, peak write speeds of 5 terabytes per second using the Cray ClusterStor storage system
• HPE Slingshot networking cables providing 100 GB/s network bandwidth
TOP500 (#1): Built by HPE, ORNL’s Frontier supercomputer is #1 on the TOP500, delivering 1.1 exaflops of performance.
GREEN500 (#2 & #6): Built by HPE, ORNL’s TDS and full system are ranked #2 and #6 on the Green500, with 62.68 gigaflops/watt power efficiency for the TDS system and 52.23 gigaflops/watt for the full system.
HPL-MxP (#1): Built by HPE, ORNL’s Frontier supercomputer is #1 on the HPL-MxP list, achieving 7.9 exaflops on the HPL-MxP mixed-precision benchmark (formerly HPL-AI).
Source: May 30, 2022 TOP500 release.
HPC BENCHMARKS USING CONVENTIONAL PROGRAMMING APPROACHES
STREAM TRIAD: C + MPI + OPENMP

#include <hpcc.h>
#ifdef _OPENMP
#include <omp.h>
#endif

static int VectorSize;
static double *a, *b, *c;

int HPCC_StarStream(HPCC_Params *params) {
  int myRank, commSize;
  int rv, errCount;
  MPI_Comm comm = MPI_COMM_WORLD;

  MPI_Comm_size( comm, &commSize );
  MPI_Comm_rank( comm, &myRank );

  rv = HPCC_Stream( params, 0 == myRank);
  MPI_Reduce( &rv, &errCount, 1, MPI_INT, MPI_SUM, 0, comm );

  return errCount;
}

int HPCC_Stream(HPCC_Params *params, int doIO) {
  register int j;
  double scalar;

  VectorSize = HPCC_LocalVectorSize( params, 3, sizeof(double), 0 );

  a = HPCC_XMALLOC( double, VectorSize );
  b = HPCC_XMALLOC( double, VectorSize );
  c = HPCC_XMALLOC( double, VectorSize );

  if (!a || !b || !c) {
    if (c) HPCC_free(c);
    if (b) HPCC_free(b);
    if (a) HPCC_free(a);
    if (doIO) {
      fprintf( outFile, "Failed to allocate memory (%d).\n", VectorSize );
      fclose( outFile );
    }
    return 1;
  }

  /* initialize the input vectors in parallel across the node's cores */
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (j=0; j<VectorSize; j++) {
    b[j] = 2.0;
    c[j] = 1.0;
  }

  scalar = 3.0;

  /* the Triad computation itself */
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (j=0; j<VectorSize; j++)
    a[j] = b[j] + scalar*c[j];

  HPCC_free(c);
  HPCC_free(b);
  HPCC_free(a);

  return 0;
}
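For contrast, and as a preview of where the talk is headed, the same STREAM Triad can be expressed in Chapel in roughly a dozen lines. The sketch below is not reproduced from these slides; it follows the well-known Chapel STREAM example and assumes a circa-2023 (1.30-era) Chapel release, where the Block distribution from the standard BlockDist module spreads arrays across compute nodes:

use BlockDist;                        // standard module providing the Block distribution

config const m = 1000000,             // vector length, overridable on the command line
             alpha = 3.0;             // the Triad scalar

// Distribute the index set {1..m} in contiguous blocks across all locales (nodes).
const ProblemSpace = {1..m} dmapped Block(boundingBox={1..m});

// Three distributed vectors; each locale owns its block of elements.
var A, B, C: [ProblemSpace] real;

B = 2.0;
C = 1.0;

// The Triad itself: a whole-array statement that executes in parallel
// across all cores of all locales owning parts of the arrays.
A = B + alpha * C;

Dropping the `dmapped` clause yields a shared-memory version that runs on a laptop; the whole-array statement still executes in parallel across the local cores.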