Senior Software Engineer, CUDA Deep Learning Systems
Company: NVIDIA
Location: Austin, TX 78717
Salary: $184,000 - $356,500 a year
Type: Full-time
Posted: 2026-05-14
About this role
We are looking for an experienced and highly motivated software professional to work on pioneering projects at the intersection of CUDA and Deep Learning Systems. As the complexity and scale of artificial intelligence continue to grow, combining advanced deep learning architectures, massive-scale distributed computing, and low-level hardware optimization has never been more critical. Our team is dedicated to exploring and prototyping next-generation ideas that bridge the gap between deep learning algorithms and CUDA, pushing the boundaries of what is possible on modern accelerator architectures.
Join our dynamic, research-oriented team to help unlock maximum hardware performance for emerging AI workloads. You will be a crucial member of a highly technical group exploring uncharted territory in model optimization, custom kernel development, and cluster-scale AI systems design. If you are passionate about the fundamentals of deep learning and thrive on squeezing every ounce of performance out of advanced computing systems, from a single GPU to supercomputer clusters, we want you on our team!
What you will be doing:
- Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping.
- Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments.
- Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads.
- Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.
- Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts to co-design systems and algorithms that improve accelerator compute util...