I'm one of the core contributors to the CUTLASS project and currently work at NVIDIA as a Sr. Architect.
CUTLASS is a collection of CUDA C++ templates for implementing high-performance matrix multiplication (GEMM) and other computations common in deep learning and high-performance computing. As a core contributor, I develop and optimize these kernels so that they make full use of the capabilities of NVIDIA GPUs.
Areas of interest: NVIDIA GPUs, CUDA, parallel computing architecture, GPGPU, HPC, and parallel programming.
Feel free to reach out to me if you have any questions or want to collaborate on a project.
The profile name: IonThruster
Significance: Even low thrust or acceleration, sustained for months or years, can achieve great things.