PSSG at ACM/IEEE Supercomputing 2025


PSSG will have a strong presence at SC ’25, the premier international conference for high-performance computing, networking, storage, and analysis. We are excited to be contributing a technical paper, research posters, a tutorial, and talks at the UMD booth.

Technical Paper Presentations

Aditya Ranjan (MS graduate) and team will be presenting their work on Plexus, a highly scalable GNN training framework. Plexus achieves speedups of 2.3-12.5X over the prior state of the art and reduces time-to-solution by 5.2-8.7X on NERSC’s Perlmutter (scaling up to 2,048 GPUs) and by 7.0-54.2X on Frontier (scaling up to 1,024 GPUs). [Talk Link]

Posters

Prajwal Singhania (third-year PhD student) and Lannie Dalton Hough (second-year PhD student) will be presenting their poster titled “Understanding Communication Bottlenecks in Multi-Node LLM Inference”, which investigates scaling bottlenecks in multi-node LLM inference using YALIS, our prototype inference engine.

Onur Cankur (final-year PhD student) will be presenting his poster titled “Understanding GPU Utilization Using LDMS Data on Perlmutter”, which analyzes spatial and temporal trends of workloads on Perlmutter to reveal inefficiencies and imbalances that can guide workload optimization.

Cunyang Wei (second-year PhD student) will be presenting his poster titled “Unmasking Performance Variability in GPU Codes on Supercomputers”. The poster evaluates and predicts performance variability on GPU supercomputing systems (Perlmutter and Frontier), and offers several recommendations to help system administrators and users mitigate it.

Keshav Pradeep (senior undergraduate student) will be presenting a poster titled “Optimizing Collectives with Large Payloads on GPU-based Supercomputers”. The poster evaluates the performance of existing collective libraries (RCCL and Cray-MPICH) on Frontier, identifying key bottlenecks that limit scalability in large model training. It introduces PCCL, which scales efficiently to 2,048 GPUs, achieving up to 33× faster all-gather operations and nearly 5× end-to-end training speedups.

Tutorials

The AxoNN and YALIS team (Prajwal Singhania, Lannie Dalton Hough, Cunyang Wei) will present a tutorial on AxoNN, a highly scalable and easy-to-use parallel framework for AI training. Attendees will learn how to use the state-of-the-art parallel algorithms in AxoNN to parallelize their GPU training workloads with minimal code changes. We will demonstrate this with a popular use case: finetuning open-source LLMs such as Llama-3 on instruction-tuning data (a minimal sketch of this workload appears below). We will also dive into the basics of LLM inference with YALIS and vLLM.
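
For a sense of the workload the tutorial targets, here is a minimal single-device sketch of one instruction-finetuning step using Hugging Face Transformers. The model name and the toy batch are illustrative placeholders, and the AxoNN-specific changes that parallelize this loop across many GPUs are exactly what the tutorial will cover.

    # Minimal instruction-finetuning step; model name and prompt are placeholders.
    # The tutorial shows how AxoNN parallelizes a loop like this across GPUs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for an open-source LLM such as Llama-3
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # A single toy "instruction" example; a real run would iterate over an
    # instruction-tuning dataset instead.
    batch = tokenizer(
        "### Instruction: Summarize MPI in one sentence.\n### Response:",
        return_tensors="pt",
    )

    # Causal-LM loss: the model shifts labels internally, so labels = input_ids.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"finetuning step loss: {outputs.loss.item():.3f}")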

UMD Booth Talks

PhD students Joy Kitson, Josh Davis, Prajwal Singhania, Onur Cankur, Cunyang Wei, Lannie Dalton Hough, Emir Gencer, and Klaudiusz Rydzy will be presenting their research at the UMD Exhibitor Booth at SC ’25. A schedule of the talks can be found below.

  • Tuesday, November 18
    • 12:10 p.m. “Understanding and Mitigating Communication Bottlenecks in Multi-Node LLM Inference” – Prajwal Singhania
    • 1:10 p.m. “Unmasking Performance Variability in GPU Codes on Production Supercomputers” – Cunyang Wei
    • 3:10 p.m. “Layer-Aware Asymmetric Tensor Parallelism for Heterogeneous LLM Inference” – Lannie Dalton Hough
  • Wednesday, November 19
    • 10:10 a.m. “Understanding GPU Utilization Using LDMS Data on Perlmutter” – Onur Cankur
    • 12:10 p.m. “Modeling Interdependent Social and Biological Contagions on Massive Multi-layer Networks” – Joy Kitson
    • 1:10 p.m. “Can LLMs Explain GPU Kernel Performance?” – Joshua H. Davis
  • Thursday, November 20
    • 10:10 a.m. “ucTrace: Monitoring MPI Communication with UCX Tracing” – Emir Gencer
    • 12:10 p.m. “Can Agentic Frameworks Identify and Fix Performance Issues?” – Klaudiusz Rydzy

We look forward to connecting with colleagues and collaborators at SC ’25. Stay tuned for more updates as the conference approaches!