- "Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance". Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu. Proceedings of the 5th MLSys Conference, August 2022.
- "Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance". Hiroyuki Ootomo, Rio Yokota. International Journal of High Performance Computing Applications, March 2022.
- "Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs". Jack Kosaian, K. V. Rashmi. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2021.
- "Real-time Neural Radiance Caching for Path Tracing". Thomas Müller, Fabrice Rousselle, Jan Novák, Alex Keller. ACM Transactions on Graphics, August 2021.
- "Scalable Knowledge Graph Analytics at 136 Petaflop/s". Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert Patton, Richard Vuduc, Thomas Potok. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020.
- "Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity". Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020.
- "Strassen's Algorithm Reloaded on GPUs". Jianyu Huang, Chenhan D. Yu, Robert A. van de Geijn. ACM Transactions on Mathematical Software, March 2020.