ECS 289D (Fall 2025) — Datacenter Systems for LLMs

A quarter-long seminar on datacenter systems for LLM training and inference, built around research papers and a team project. The course covers four topics: datacenter networking, host networking, LLM inference, and LLM training. The first two examine the datacenter systems and components that LLM training and inference depend on; the last two survey classic work and recent progress in LLM training and inference systems.

General Information

Schedule

Week 1
  Thu 9/25: Course Introduction
    Introduction by Yang; slides

Week 2 (Datacenter networking)
  Tue 9/30: The Tail at Scale
    Optional: Attack of the Killer Microseconds
    Presentation by Yang; Paper Presentation Selection due
  Thu 10/02: A Scalable, Commodity Data Center Network Architecture
    Optional: VL2: A Scalable and Flexible Data Center Network

Week 3
  Tue 10/07: Data Center TCP (DCTCP)
    Optional: Swift: Delay is Simple and Effective for Congestion Control in the Datacenter
    Project Membership and Topic due
  Thu 10/09: Design Guidelines for High Performance RDMA Systems
    Optional: Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!

Week 4 (Host networking)
  Tue 10/14: RDMA over Ethernet for Distributed AI Training at Meta Scale
    Optional: An Extensible Software Transport Layer for GPU Networking
  Thu 10/16: IX: A Protected Dataplane Operating System for High Throughput and Low Latency
    Optional: Arrakis: The Operating System is the Control Plane
    Project Proposal due

Week 5
  Tue 10/21: Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads
    Optional: Snap: a Microkernel Approach to Host Networking
  Thu 10/23: Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms
    Optional: MSCCL++: Rethinking GPU Communication Abstractions for Cutting-Edge AI Applications

Week 6 (LLM Inference)
  Tue 10/28: Efficient Memory Management for Large Language Model Serving with PagedAttention
    Optional: vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
  Thu 10/30: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
    Optional: Optimizing SLO-oriented LLM Serving with PD-Multiplexing

Week 7
  Tue 11/04: FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
    Optional: XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
  Thu 11/06: NanoFlow: Towards Optimal Large Language Model Serving Throughput
    Optional: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
    Mid-Quarter Milestone due

Week 8 (LLM Training)
  Tue 11/11: No class (Veterans Day holiday)
  Thu 11/13: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Optional: FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Optional: FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Week 9
  Tue 11/18: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
    Optional: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
    Optional: Everything about Distributed Training and Efficient Finetuning
  Thu 11/20: Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
    Optional: DeepSeek Open Infra

Week 10
  Tue 11/25: Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
    Optional: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
  Thu 11/27: No class (Thanksgiving holiday)

Week 11 (Wrap Up)
  Tue 12/02: Final Q&A; in-class presentations
  Thu 12/04: Project presentations (in-class)
  Tue 12/09: Final project reports due

Coursework and Grading

Candidate Project Ideas
