| 1 |
Thu 9/25 |
— |
Course Introduction |
Introduction by Yang: slides |
| 2 |
Tue 9/30 |
Datacenter networking |
The Tail at Scale
optional Attack of the Killer Microseconds
|
Presentation by Yang: slides
Paper Presentation Selection Due
|
| Thu 10/02 |
A Scalable, Commodity Data Center Network Architecture
optional VL2: A Scalable and Flexible Data Center Network
|
1st pre by John Drabek: slides
2nd pre by Yanfeng Ma: slides
|
| 3 |
Tue 10/07 |
Data Center TCP (DCTCP)
optional Swift: Delay is Simple and Effective for Congestion Control in the Datacenter
|
Project Membership and Topic Due
1st pre by Shreyas Shah: slides
2nd pre by Kaiyue Li: slides
|
| Thu 10/09 |
Design Guidelines for High Performance RDMA Systems
optional Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!
|
1st pre by Harish Krishnakumar: slides
2nd pre by Dalton Davis: slides
|
| 4 |
Tue 10/14 |
Host networking |
RDMA over Ethernet for Distributed AI Training at Meta Scale
optional An Extensible Software Transport Layer for GPU Networking
|
1st pre by Yang Zhou: slides
2nd pre by Lekhit Borole: slides
|
| Thu 10/16 |
IX: A Protected Dataplane Operating System for High Throughput and Low Latency
optional Arrakis: The Operating System is the Control Plane
|
Project Proposal Due
1st pre by Yang Zhou: slides
2nd pre by Yang Zhou: slides
|
| 5 |
Tue 10/21 |
Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads
optional Snap: a Microkernel Approach to Host Networking
|
1st pre by Neha Pradeep: slides
2nd pre by Qianqian Tan: slides
|
| Thu 10/23 |
Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
optional MSCCL++: Rethinking GPU Communication Abstractions for Cutting-Edge AI Applications
|
1st pre by Aman Dwivedi: slides
2nd pre by Yang Zhou: slides
|
| 6 |
Tue 10/28 |
LLM Inference |
Efficient Memory Management for Large Language Model Serving with PagedAttention
optional vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
|
1st pre by Yuankai Li: slides
2nd pre by Alexander Dsouza: slides
|
| Thu 10/30 |
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
optional Optimizing SLO-oriented LLM Serving with PD-Multiplexing
|
1st pre by Yang Zhou
2nd pre by Yang Zhou
|
| 7 |
Tue 11/04 |
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
optional XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
|
1st pre by Zhuoli Huang
2nd pre by Yang Zhou
|
| Thu 11/06 |
NanoFlow: Towards Optimal Large Language Model Serving Throughput
optional Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
|
Mid-Quarter Milestone Due
1st pre by Yiqiao Lin
2nd pre by Sakthi Karimanal
|
| 8 |
Tue 11/11 |
LLM Training |
— |
Skipped for Veterans Day Holiday |
| Thu 11/13 |
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
optional FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
optional FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
|
1st pre by Ansha Prashanth
2nd pre by Shuang Ma
|
| 9 |
Tue 11/18 |
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
optional PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
optional Everything about Distributed Training and Efficient Finetuning
|
1st pre by Hemang Singh
2nd pre by Vrushali Harane
|
| Thu 11/20 |
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
optional DeepSeek Open Infra
|
1st pre by Nathan Kotni
2nd pre by Sudarsan Srivathsun
|
| 10 |
Tue 11/25 |
Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
optional Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
|
1st pre by Yang Zhou
2nd pre by Yihan Zhang
|
| Thu 11/27 |
— |
Skipped for Thanksgiving Holiday |
| 11 |
Tue 12/02 |
— |
Wrap Up |
Final Q&A
In-class presentations
|
| Thu 12/04 |
— |
Project Presentations |
In-class presentations |
| — |
Tue 12/09 |
— |
Project Reports Due |
Final reports due |