Senior DevOps Engineer/Site Reliability Engineer
Company: Stellar Cyber
Location: Remote (no specific location given)
Type: Full-time
Remote: Yes
Posted: 2026-06-02
About this role
Join a fast-growing global leader in cybersecurity, trusted by some of the biggest names in the industry. In addition to some of the world’s largest enterprises and government agencies, more than 30% of the world’s top MSSPs rely on our platform. We’re at the forefront of protecting organizations against sophisticated cyber threats using cutting-edge AI and automation technologies. Our culture is built on diversity, openness, and collaboration, fostering creativity and innovation that drive real impact in the market.
We are seeking a highly skilled Senior DevOps / Site Reliability Engineer (SRE) to join our globally distributed engineering organization. This is a hands-on, senior-level role focused on building, operating, and scaling reliable cloud-native infrastructure and distributed data platforms.
The ideal candidate will have strong expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, incident management, and infrastructure reliability. This role combines DevOps engineering practices with SRE principles to improve scalability, resiliency, operational efficiency, and platform performance across production environments.
The engineer will work closely with platform, development, and operations teams to drive automation, operational excellence, and reliability best practices for mission-critical systems.
Key Responsibilities
- Administer and maintain Kubernetes clusters and containerized workloads.
- Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
- Develop and maintain CI/CD pipelines for reliable application deployments.
- Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
- Build automation tooling and operational workflows using Python, Go, or Bash.
- Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
- Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
- Support and optimize distr...