HiPar20 Workshop

Hierarchical Parallelism for Exascale Computing

NOVEMBER 11, 2020 (10am-6.30pm EST)

DIGITAL PLATFORM (CadmiumCD)

In cooperation with: IEEE TCHPC

Held in conjunction with:

The International Conference for High Performance Computing,
Networking, Storage and Analysis


HiPar20 welcomes HPC practitioners, from hardware and compiler experts to algorithms and software developers, to present and discuss the state of the art in emerging approaches to exploiting multi-level parallelism for extreme-scale computing.

SUMMARY

High-performance computing (HPC) platforms are evolving towards having fewer but more powerful nodes, driven by the increasing number of physical cores in multiple sockets and accelerators. The boundary between nodes and networks is starting to blur, with some nodes now containing tens of compute elements and memory sub-systems connected via a memory fabric.

The immediate consequence is increased complexity, driven by ever more intricate memory hierarchies, novel accelerator designs, and energy constraints.

Spurred largely by this trend, hierarchical parallelism is increasingly gaining momentum. This approach embraces the intrinsic complexity of current and future HPC systems, rather than avoiding it, by exploiting parallelism at all levels: compute, memory and network.

This workshop focuses on hierarchical parallelism. It aims to bring together application, hardware, and software practitioners proposing new strategies to fully exploit computational hierarchies, along with examples illustrating the benefits of these strategies for achieving extreme-scale parallelism.

WORKSHOP PROGRAM

(Workshop date: Wednesday, Nov. 11th, 10am-6.30pm EST)

To submit the evaluation of this workshop, please go to: https://submissions.supercomputing.org/?page=Submit&id=WorkshopEvaluation&site=sc20

A log of some of the questions asked during the workshop, with replies from the speakers, is available here.

10.00am – 10.05am (EST)

Welcome and Overview

10.05am – 11.00am / KEYNOTE (slides)

Exploiting Hierarchical Algorithms on Ever More Hierarchical Architectures

Speaker: Kate Clark (NVIDIA) 

11.00am – 11.15am

Break

11.15am – 11.40am / PAPER 1 (slides)

A Case Study and Characterization of a Many-socket, Multi-tier NUMA HPC Platform

Authors: Connor Imes (USC), Steven Hofmeyr (LBL), Dong In D. Kang (USC), John Paul Walters (USC)

11.40am – 12.05pm / PAPER 2

Introducing multi-level parallelism, at coarse, fine and instruction level to enhance the performance of iterative solvers for large sparse linear systems on Multi- and Many-core architecture

Author: Jean-Marc Gratien (IFPEN)

12.05pm – 12.30pm / PAPER 3 (slides)

Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations

Authors: Jacob Merson (RPI), Mark S. Shephard (RPI)

12.30pm – 12.55pm / PAPER 4 (slides)

Flexible Runtime Reconfigurable Computing Overlay Architecture and Optimization for Dataflow Applications

Authors: Mihir Shah (UT Dallas), Benjamin Carrion Schafer (UT Dallas)

12.55pm – 1.45pm / KEYNOTE

Single-level Programming on Hierarchical Hardware via Adaptive Runtime? Maybe

Speaker: Laxmikant Kale (UIUC)

1.45pm – 2.30pm

Break

2.30pm – 3.45pm

Panel Session

Panelists: Irina Demeshko (LANL), Sadaf R. Alam (CSCS), Sunita Chandrasekaran (Univ Delaware), Peter Hofstee (IBM), Stephen Jones (NVIDIA)
Moderator: Christian Trott (Sandia National Labs)

3.45pm – 4.00pm

Break

4.00pm – 5.00pm / INVITED TALK (slides)

A Portable Asynchronous Tasking Approach to Hierarchical Parallelism - Successes, Challenges and Future Prospects

Speaker: Martin Berzins (Univ. of Utah)

5.00pm – 6.00pm / INVITED TALK

Glow: A Machine Learning Compiler and Execution Engine

Speaker: Jordan Fix (Facebook)

6.00pm

Concluding Remarks


WORKSHOP DETAILS

HiPar20 is designed to showcase new studies, approaches, and cutting-edge ideas on hierarchical parallelism for extreme-scale computing. Our goal is to highlight not just success stories but also discuss drawbacks and challenges.

We welcome papers and talks from the HPC community addressing the use of emerging architectures, focusing particularly on those characterized by fewer but more powerful nodes, as well as systems with hierarchical networks, where the hierarchy is characterized not just by performance metrics but by tiered communication semantics. Specifically, the emphasis is on the design, implementation, and application of programming models for multi-level parallelism, including abstractions for hierarchical memory access, heterogeneity, multi-threading, vectorization, and energy efficiency, as well as scalability and performance studies thereof.

Of particular interest are models addressing these concerns portably: providing ease of programming and maintaining performance in the presence of varied accelerators, hardware configurations, and execution models. Studies that explore the merits of specific approaches to addressing these concerns, such as generic programming or domain specific languages, are also in scope.

The workshop is not limited to the traditional HPC software community. For example, another key topic is the use of hierarchical parallelism to address challenges arising in machine learning, given the growing importance of that field, the large scale of the systems it tackles, and the increasing interest in it at SC.

Submissions are encouraged in, but not limited to, the following areas:

  • Hierarchical work scheduling and execution;
  • Hardware, software and algorithmic advances for efficient use of memory hierarchies, multi-threading and vectorization;
  • Efficient use of nested parallelism, for example CUDA dynamic parallelism, for large scale simulations;
  • Programming heterogeneous nodes;
  • Leading edge programming models, for example fully distributed task-based models and hybrid MPI+X, with X representing shared memory parallelism via threads, vectorization, tasking or parallel loop constructs;
  • Implementations of algorithms that are natural fits for nested work (for example approaches that use recursion);
  • Challenges and successes in managing computing hierarchies;
  • Examples demonstrating effective use of the combination of inter-node and intra-node parallelism;
  • Novel approaches leveraging asynchronous execution to maximize efficiency;
  • Challenges and successes of porting existing applications to many-core and heterogeneous platforms;
  • Recent developments in compiler optimizations for emerging architectures;
  • Applications from emerging AI fields, for example deep learning and extreme-scale data analytics.
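To make the hybrid MPI+X pattern from the list above concrete, here is a minimal, purely illustrative sketch (not drawn from any workshop submission): in plain Python, operating-system processes stand in for distributed-memory MPI ranks, and a thread pool within each process plays the role of the shared-memory "X" level.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def rank_work(args):
    # Coarse level: each "rank" (a separate process) owns a slice of the data,
    # analogous to an MPI rank owning its local partition.
    rank, data = args
    # Fine level: threads share the rank's memory, like OpenMP threads under MPI+X.
    chunks = [data[i::4] for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as threads:
        partials = list(threads.map(lambda c: sum(x * x for x in c), chunks))
    return sum(partials)

def hierarchical_sum_of_squares(data, n_ranks=2):
    # Split the data across ranks (the coarse, distributed-memory-like level),
    # then reduce the per-rank results, mimicking an MPI reduction.
    slices = [(r, data[r::n_ranks]) for r in range(n_ranks)]
    with Pool(n_ranks) as ranks:
        return sum(ranks.map(rank_work, slices))

if __name__ == "__main__":
    print(hierarchical_sum_of_squares(list(range(10))))  # 285
```

The point of the sketch is only the two-level structure: a real MPI+X code would replace the process pool with MPI ranks and the thread pool with OpenMP threads, vector lanes, or tasks, but the split between a coarse distributed level and a fine shared-memory level is the same.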

Submissions are solicited in the following categories:

(a) Regular research papers:

Intended for submissions describing original work and ideas that have not appeared in another conference or journal, and are not currently under review for any other conference or journal.
Regular papers must be at least six (6) and at most ten (10) pages in U.S. letter format (8.5″ × 11″).
Accepted regular papers will be published in the workshop proceedings in cooperation with IEEE TCHPC (pending acceptance).


(b) Short papers:

Intended for submissions presenting novel, interesting ideas or preliminary results that will be formally submitted elsewhere.
Short papers must not exceed four (4) pages.
Short papers will NOT be included in the proceedings.
 

Please note that:

The page limits above apply only to the core text, content-related appendices, and figures.
References and the reproducibility appendix do not count against the page limit.

When deciding between submissions with comparable evaluations, priority will be given to those with higher quality of presentation and whose focus relates more directly to the workshop themes.

Papers must be submitted at https://submissions.supercomputing.org and must follow the IEEE format: https://www.ieee.org/conferences/publishing/templates.html

HiPar20 follows the SC20 reproducibility and transparency initiative:
https://sc20.supercomputing.org/submit/transparency-reproducibility-initiative

HiPar20 requires all submissions to include an Artifact Description (AD) Appendix.
The Artifact Evaluation (AE) remains optional.

We also encourage authors to follow the transparency initiative for two reasons:
(a) it helps the authors themselves structure and write the paper so as to clearly express the research process;
(b) it helps readers understand the thinking process the authors used to plan, obtain, and explain their results.

IMPORTANT DATES

  • Submission Deadline:
    September 7th, 2020 (AoE)
  • Author Notification:
    September 24, 2020
  • Camera Ready:
    October 5, 2020
  • Final Program:
    October 9, 2020
  • Workshop Date:
    Wed Nov 11, 10am-6:30pm (EST)

ORGANIZATION

WORKSHOP CHAIR

Francesco Rizzi
NexGen Analytics

ORGANIZING COMMITTEE

Daisy S. Hollman

Sandia National Labs 

Xiaoye Sherry Li

Lawrence Berkeley National Lab

Lee Howes

Facebook

PROGRAM COMMITTEE CHAIRS

Christian Trott

Sandia National Labs

Filippo Spiga

NVIDIA

PROGRAM COMMITTEE

Mark Bull

EPCC

Carlo Cavazzoni

Leonardo

Benjamin Cumming

CSCS

Chris Forster

NVIDIA

Marta Garcia Gasulla

BSC

Anja Gerbes

Goethe University Frankfurt

Mark Hoemmen

Stellar Science

Toshiyuki Imamura

RIKEN

Guido Juckeland

Helmholtz Center

Hartmut Kaiser

LSU

Vivek Kale

Brookhaven National Lab

Jonathan Lifflander

Sandia National Labs

James Lin

Shanghai Jiao Tong Univ.

Aram Markosyan

Xilinx

Rui Oliveira

INESC TEC

Philippe Pébay

NexGen Analytics

Zhiqi Tao

Intel

Flavio Vella

Univ. of Bozen

Michèle Weiland

EPCC

Jeremiah Wilke

Sandia National Labs