Manager, AI Operations & Evaluation
Company: Chime
Location: San Francisco, CA (Remote)
Salary: $150k - $208k per year
Type: Full-time
Level: Senior
Remote: Yes
Posted: 2026-02-20
About The Role
AI Operations (AIOPS) defines how AI is governed, evaluated, and continuously improved across OMX. We ensure every model in Operations is accurate, fair, and aligned with Chime’s standards for operational excellence and member trust.
As Manager, AI Evaluation & Insights, you’ll lead the team responsible for operationalizing and executing AI evaluation standards across OMX. You’ll run human and automated evaluation systems, manage model health monitoring, and apply testing and simulation frameworks that detect hallucinations, bias, or drift before they impact members or agents.
You’ll manage a team of TPMs and evaluation specialists who measure AI performance across risk, compliance, agent experience, and bot experience domains. You’ll ensure AI deployments meet the standards set by the AI Governance pillar and deliver measurable value to Operations.
The base salary offered for this role and level of experience will begin at $150,000.00 and go up to $208,000.00. Full-time employees are also eligible for a bonus, a competitive equity package, and benefits. The actual base salary offered may be higher, depending on your location, skills, qualifications, and experience.
In This Role, You Will
- Lead the AI Evaluation team, owning staffing, coaching, performance management, and delivery of evaluation and testing frameworks.
- Manage the AI evaluation lifecycle — including pre-launch testing, simulation, and post-deployment health monitoring — ensuring alignment with governance standards and expectations.
- Create domain-specific evaluation tracks (e.g., Compliance & Risk, Bot Experience, Agent Experience) to assess AI quality from multiple perspectives.
- Operationalize human-in-the-loop testing, integrating reviewer feedback into continuous improvement loops.
- Oversee simulation environments (third-party tools) for stress-testing LLMs and identifying hallucinations or performance regressions.
- Partner closely with AI Platform ...