Senior Network Engineer - AI Infrastructure Operations

Company: Nscale

Location: Not specified (Remote)

Type: Full-time

Level: Lead

Remote: Yes

Posted: 2026-02-26

About this role

About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.


We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.


About The Role
Within Nscale, the Network Operations team is responsible for the performance and reliability of the high-speed interconnect fabrics that underpin our AI and HPC platforms. These networks are critical to distributed training and inference workloads and demand a deep operational focus.


We’re looking for a Senior Network Engineer – AI Infrastructure to join our Network Operations team.


In this role, you will be responsible for the day-to-day health, stability, and performance of Nscale’s large-scale InfiniBand and RDMA over Converged Ethernet (RoCE) fabrics. You’ll bring deep operational expertise from high-performance or hyperscale environments and play a key role in incident response, performance tuning, and continuous improvement of latency-sensitive AI networking systems.


What You'll Be Doing

  • Owning the operational health, configuration consistency, and performance tuning of large-scale InfiniBand and RoCE fabrics supporting AI and HPC workloads
  • Leading the diagnosis and resolution of complex network incidents (P0/P1), spanning firmware, kernel drivers, switch hardware, and app...
