Senior Network Reliability Engineer - DGX Cloud
Company: NVIDIA
Location: US, CA, Santa Clara (Remote)
Salary: $136k - $224.2k per year
Type: Full-time
Remote: Yes
Posted: 2026-06-30
About this role
NVIDIA is looking for a Senior Network Reliability Engineer to support and maintain our cloud and datacenter network infrastructure. This network serves the entire NVIDIA software stack, from Graphics Drivers to Autonomous Vehicles and Artificial Intelligence.
In this role, the Senior Network Reliability Engineer will remediate critical alerts within defined SLAs, triage production-impacting network incidents, and work with internal customers on network-related issues. They will also engage with external vendors to remediate hardware and software issues, and participate in project-related work such as network device upgrades and capacity augmentations. An ideal candidate will possess a wide range of skills, including alert monitoring and resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, an understanding of L3 underlay networks, and network protocol knowledge in large multi-vendor infrastructures.
What you will be doing:
- Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
- Drive operational improvements in change management and daily operations by following procedures.
- Manage and operate large-scale IP network technologies and infrastructures.
- Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.
- Monitor and support the network health of on-premises and cloud infrastructures.
- Collaborate and develop workflow enhancements while documenting best practices.
What we need to see:
- Deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.
- 5+ years of experience in network operations.
- Skilled in network troubleshooting techniques, with demonstrated creative problem-solving abilities.
- Strong track record of a...