2nd Workshop on

Hierarchical Parallelism
for Exascale Computing

Sunday, November 14th, 2021 – 9.00am – 4.45pm CST

In cooperation with: IEEE TCHPC

Held in conjunction with:

The International Conference for High Performance Computing, Networking, Storage and Analysis

HiPar21 welcomes HPC practitioners, from hardware and compiler experts to algorithm and software developers, to present and discuss new studies, approaches, and cutting-edge ideas for utilizing multi-level parallelism in extreme-scale computing.

SUMMARY

 

High-performance computing (HPC) platforms are evolving towards having fewer but more powerful nodes, driven by the increasing number of physical cores in multiple sockets and accelerators. The boundary between nodes and networks is starting to blur, with some nodes now containing tens of compute elements and memory sub-systems connected via a memory fabric.

The immediate consequence is increased complexity, due to ever more complex architectures (e.g., deep memory hierarchies), novel accelerator designs, and energy constraints.

Spurred largely by this trend, hierarchical parallelism is increasingly gaining momentum. This approach embraces the intrinsic complexity of current and future HPC systems, rather than avoiding it, by exploiting parallelism at all levels: compute, memory and network.

This workshop focuses on hierarchical parallelism. It aims to bring together application, hardware, and software practitioners proposing new strategies to fully exploit computational hierarchies, along with examples illustrating their benefits for achieving extreme-scale parallelism.

WORKSHOP PROGRAM

(Workshop date: Sunday, November 14th, 2021 – 9.00am – 4.45pm CST)

9.00am – 9.05am (CST)

Welcome and Overview

9.05am – 10.00am / Invited Speaker

Reasoning About Software Correctness

Speaker: Sean Parent (Sr. Principal Scientist, Adobe)

10.00am – 10.30am

Break

10.30am – 11.00am / Paper 1 – Slides

Distributing Higher-Dimensional Simulations Across Compute Systems: A Widely Distributed Combination Technique

Authors: Theresa Pollinger (Univ. Stuttgart), Marcel Hurler (Univ. Stuttgart),
Michael Obersteiner (TUM), Dirk Pflüger (Univ. Stuttgart)

11.00am – 11.30am / Paper 2 – Slides

Benchmarking and Extending SYCL Hierarchical Parallelism

Authors: Tom Deakin (Univ. Bristol), Aksel Alpay (Heidelberg Univ.),
Simon McIntosh-Smith (Univ. Bristol), Vincent Heuveline (Heidelberg Univ.)

11.30am – 12.00pm / Paper 3

Accelerate MMEwald on a New Generation of Sunway Supercomputer

Authors: Mingchuan Wu, Yangjun Wu, Honghui Shang, Ying Liu, Huimin Cui, Xiaobing Feng 
(Institute of Computing Technology, Chinese Academy of Sciences)

 

12.00pm – 12.30pm / Paper 4 – Slides

Did the GPU obfuscate the load imbalance in my MPI simulation?

Authors: David Eberius (ORNL), David Boehme (LLNL), Olga Pearce (LLNL)

12.30pm – 2.00pm

Lunch Break

2.00pm – 3.00pm

Panel Session

Panelists: Elliott Slaughter (SLAC), Simon McIntosh-Smith (Univ. Bristol),
Paul Kelly (Imperial College London), Michela Taufer (Univ. of Tennessee)
Moderator:
 Lee Howes (Facebook)

3.00pm – 3.30pm

Break

3.30pm – 4.00pm / Paper 5 – Slides

Uintah+Hedgehog: Combining Parallelism Models for End-to-End Large-Scale Simulation Performance

Authors: J. Holmen, D. Sahasrabudhe, M. Berzins (Univ. Utah),
A. Bardakoff, T. Blattner, W. Keyrouz (NIST)

 

4.00pm – 4.30pm / Paper 6 – Slides

PPIR: Parallel Pattern Intermediate Representation

Authors: A. Schmitz, J. Miller, L. Trümper, M. Müller (RWTH Aachen Univ.)

4.30pm – 4.45pm

Concluding Remarks

WORKSHOP DETAILS

HiPar21 is designed to showcase new studies, approaches, and cutting-edge ideas on hierarchical parallelism for extreme-scale computing. Our goal is to highlight not just success stories but also discuss drawbacks and challenges.

We welcome contributions from the HPC community addressing the use of emerging architectures, focusing particularly on those characterized by fewer but more powerful nodes, as well as systems with hierarchical networks, where the hierarchy is characterized not just by performance metrics but also by tiered communication semantics. Specifically, the emphasis is on the design, implementation, and application of programming models for multi-level parallelism, including abstractions for hierarchical memory access, heterogeneity, multi-threading, vectorization, and energy efficiency, as well as scalability and performance studies thereof.

Of particular interest are models addressing these concerns portably: providing ease of programming and maintaining performance in the presence of varied accelerators, hardware configurations, and execution models. Studies that explore the merits of specific approaches to addressing these concerns, such as generic programming or domain specific languages, are also in scope.

The workshop is not limited to the traditional HPC software community. As one example, another key topic is the use of hierarchical parallelism in dealing with the challenges arising in machine learning, due to the growing importance of this field, the large scale of systems tackled in that area, and the increasing interest at SC.

Submissions are encouraged in, but not limited to the following areas:

  • Hardware, software and algorithmic advances for efficient use of memory hierarchies, multi-threading and vectorization;
  • Efficient use of nested parallelism, for example CUDA dynamic parallelism, for large-scale simulations;
  • Hierarchical work scheduling and execution;
  • Programming heterogeneous nodes;
  • Leading edge programming models, for example fully distributed task-based models and hybrid MPI+X, with X representing shared memory parallelism via threads, vectorization, tasking or parallel loop constructs;
  • Implementations of algorithms that are natural fits for nested work (for example approaches that use recursion);
  • Challenges and successes in managing computing hierarchies;
  • Examples demonstrating effective use of the combination of inter-node and intra-node parallelism;
  • Novel approaches leveraging asynchronous execution to maximize efficiency;
  • Challenges and successes in porting existing applications to many-core and heterogeneous platforms;
  • Recent developments in compiler optimizations for emerging architectures;
  • Applications from emerging AI fields, for example deep learning and extreme-scale data analytics.
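To make the notion of multi-level parallelism in the topics above concrete, here is a minimal, purely illustrative sketch in Python (all function names, such as `node_task`, are our own and not taken from any submission). An outer pool of worker processes stands in for node-level (e.g., MPI-rank) parallelism, while each worker further splits its block into sub-blocks, standing in for intra-node threading or vectorization in a real HPC code.

```python
# Illustrative two-level (hierarchical) decomposition of a reduction.
from concurrent.futures import ProcessPoolExecutor


def inner_sum(sub_block):
    # Innermost level: in a real code this would be a threaded or
    # vectorized kernel (e.g., OpenMP threads or SIMD lanes).
    return sum(x * x for x in sub_block)


def node_task(block, inner_width=4):
    # Second level: a "node" splits its block into sub-blocks and
    # reduces the partial results of its inner kernels.
    subs = [block[i:i + inner_width] for i in range(0, len(block), inner_width)]
    return sum(inner_sum(s) for s in subs)


def hierarchical_sum_of_squares(data, nodes=2):
    # Outer level: distribute contiguous blocks across processes,
    # mimicking the distributed-memory (inter-node) layer.
    width = -(-len(data) // nodes)  # ceiling division
    blocks = [data[i:i + width] for i in range(0, len(data), width)]
    with ProcessPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(node_task, blocks))


if __name__ == "__main__":
    print(hierarchical_sum_of_squares(list(range(16))))  # 0² + 1² + ... + 15² = 1240
```

The point of the sketch is structural: each level of the hierarchy owns a decomposition decision (block size, sub-block width), which is exactly the kind of design choice the workshop topics above address at scale.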

Paper submissions are solicited in the following categories:

(a) Regular research papers:

Intended to describe original work and ideas that have not appeared in another conference or journal and are not currently under review for any other conference or journal.
Regular papers must be at least six (6) and must not exceed ten (10) letter-size pages (U.S. letter – 8.5″x11″).
Accepted regular papers will be published in the workshop proceedings in cooperation with IEEE TCHPC (pending acceptance).


(b) Short papers:

Intended to present novel, interesting ideas or preliminary results that will be formally submitted elsewhere.
Short papers must not exceed four (4) pages.
Short papers will NOT be included in the proceedings.
 

Please note:

Papers must follow the IEEE format: https://www.ieee.org/conferences/publishing/templates.html.
The page limits above only apply to the core text, content-related appendices, and figures.
References and reproducibility appendix do not count against the page limit.

When deciding between submissions with comparable evaluations, priority will be given to those with higher quality of presentation and whose focus relates more directly to the workshop themes.

Papers must be submitted at https://submissions.supercomputing.org

Are you working on a code or algorithm and would like to see if and how it could benefit from a hierarchical approach?
Or are you exploring state-of-the-art hierarchical approaches and programming models, and would like feedback on how to tackle such problems in practice?
The new idea we are proposing this year might be of interest to you!

We invite practitioners at all levels (junior practitioners are especially encouraged) to submit a one-page summary describing an algorithm of interest to them that does NOT already exploit hierarchical parallelism and that they would like to improve.
We will select a subset of these submissions and, on the workshop day, host parallel breakout sessions moderated by experts in the field to guide brainstorming on if and how hierarchical parallelism can be exploited to improve them.

The submission must be one page and should address, at a high level, the following sections:
     motivation/application, core algorithm, desired scale of execution, and current bottlenecks (if any).
The single page limit only applies to the core text. Figures, references and appendices do not count against the page limit.

Please note that:

  • We will prioritize submissions by junior people. We believe this is most beneficial as a way of allowing more experienced engineers and researchers to share their experience and approaches to solving this sort of problem.
  • We expect each submission to present code/algorithms that are relevant to the person/group submitting it.
    For example, if you are a researcher working on CFD, we expect your submission to relate to your own code and computational issues, not to a different group’s or commercial code.
  • We envision hosting two or three parallel sessions.
    Each session will be approximately 45–60 minutes: 10–15 minutes for the author to present the algorithm and motivation, followed by discussion until the time ends.
    (We reserve the right to expand the number of sessions if we receive a substantial number of submissions.)

HiPar21 follows the SC21 reproducibility and transparency initiative:
https://sc21.supercomputing.org/submit/reproducibility-initiative

HiPar21 requires all submissions to include an Artifact Description (AD) Appendix.
The Artifact Evaluation (AE) remains optional.

We also encourage authors to follow the transparency initiative for two reasons:
(a) it helps the authors themselves with the writing and structuring of the paper to express the research process;
(b) it helps readers understand the thinking process the authors used to plan, obtain, and explain their results.

IMPORTANT DATES

  • Submission Deadline (deadline extended!):
    September 8th, 2021 (AoE)
  • Author Notification:
    September 22, 2021
  • Camera Ready:
    October 4, 2021
  • Final Program:
    October 9, 2021
  • Workshop Date:
    Sunday, November 14th 2021

ORGANIZATION

WORKSHOP CHAIR
Francesco Rizzi
NexGen Analytics
ORGANIZING COMMITTEE
Daisy S. Hollman

Google

Xiaoye Sherry Li

Lawrence Berkeley National Lab

Lee Howes

Facebook

PROGRAM COMMITTEE CHAIRS
Christian Trott

Sandia National Labs

Filippo Spiga

NVIDIA

PROGRAM COMMITTEE
Mark Bull

EPCC

Irina Demeshko

LANL

Marta Garcia Gasulla

BSC

Anja Gerbes

TU Dresden

Mark Hoemmen

Stellar Science

Toshiyuki Imamura

RIKEN

Guido Juckeland

Helmholtz Center

Hartmut Kaiser

LSU

Vivek Kale

Brookhaven Labs

Jonathan Lifflander

Sandia National Labs

James Lin

Shanghai J.Tong Univ.

Nicholas Malaya

AMD

Aram Markosyan

Facebook

Rui Oliveira

INESC TEC

Philippe Pébay

NexGen Analytics

Zhiqi Tao

Intel

Flavio Vella

Univ. of Bozen

Michèle Weiland

EPCC

Jeremiah Wilke

Google