Homepage: https://www.usenix.org/conference/nsdi23
Paper list: https://www.usenix.org/conference/nsdi23/technical-sessions
- Spring: https://www.usenix.org/conference/nsdi23/spring-accepted-papers
- Fall: https://www.usenix.org/conference/nsdi23/fall-accepted-papers
- Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs [Paper] [Code]
- UCLA & CMU & MSR & Princeton
- Resilient distributed training
- Shepherd: Serving DNNs in the Wild [Paper] [Personal Notes]
- UWaterloo & Yale & UC Berkeley
- Handle short-term workload unpredictability.
- Aggregate request streams into moderately sized groups; leverage preemption and model-specific batching.
- Understanding RDMA Microarchitecture Resources for Performance Isolation [Paper] [Personal Notes] [Benchmark Suite]
- Duke & Microsoft & SJTU
- Develop a test suite to evaluate RDMA performance isolation solutions.