Srikar Pisupati
My Blog Posts
Paper Summaries
DeepSeek V3 Technical Report
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Fast Inference from Transformers via Speculative Decoding
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
SGLang: Efficient Execution of Structured Language Model Programs
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot
NIRVANA: Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models
Projects
Read about FairShare!
Opinions
How should AI be used in the future?