DevOps / Infrastructure Engineer
Company: mLabs
Location: United States (Remote)
Salary: $100,000 - $130,000 a year
Type: Full-time
Remote: Yes
Posted: 2026-06-01
About this role
Location: Remote - EST timezone
Remote | Full-time
Compensation: $100K - $130K
We are hiring on behalf of our client who is seeking an exceptional, production-proven Infrastructure & DevOps Engineer to take absolute ownership of the deployment, secure networking, architectural lifecycle, and overall reliability of this distributed agent fleet from day one. The client is engineering a sophisticated infrastructure designed to launch a highly distributed fleet of managed, single-tenant personal artificial intelligence (AI) trading agents. Operating non-stop, these isolated processes execute high-frequency, complex financial workflows natively on blockchain infrastructure, dedicated exclusively to individual user portfolios.
Key Responsibilities
- **Fleet Orchestration & Scaling:** Architect, provision, and scale the core user agent fleet across a hybrid Railway and AWS ecosystem, ensuring each user retains an isolated, secure, and predictable containerized process with optimized cost tracking and precise lifecycle hooks.
- **Secure Network Engineering:** Establish, manage, and continuously harden private overlay networks using Tailscale in production, linking disparate user agents securely with core Model Context Protocol (MCP) servers and the underlying live trading runtimes.
- **Automated User Provisioning:** Design and construct an end-to-end, zero-touch deployment pipeline utilizing advanced infrastructure-as-code and CI/CD best practices, enabling seamless, single-click automated provisioning of containers, secrets management, and environmental configurations for new users.
- **Operational Resilience & SRE:** Define, build, and maintain comprehensive monitoring, telemetry, alerting, and automated incident response frameworks to guarantee graceful state retention, preserving live in-flight transaction states across sudden host restarts, scheduled key rotations, or regional cloud outages.
- **Incident Management:** Oversee system health and part...