Machine Learning Data Engineer
Company: Kalibri Labs
Location: Washington, DC (Remote)
Salary: $120,000 - $160,000 a year
Type: Full-time
Remote: Yes
Posted: 2026-04-07
*At Kalibri, we are helping to redefine and rebuild the hotel industry. We are looking for passionate, energetic, and hardworking people with an entrepreneurial spirit, who dream big and challenge the status quo. We are working on cutting-edge solutions for the industry: we combine cloud-native data pipelines with advanced AI/ML models to drive asset performance. Kalibri is growing, so if you’re ready to make a difference and apply your talents across a groundbreaking organization, please keep reading!*
### About the Role
We're looking for a Machine Learning Data Engineer who can build, maintain, and improve the production pipelines that power Kalibri's core algorithmic products — including Census, Prediction, Estimation, and OBM. This role is ideal for a mid-level engineer who thrives on turning complex models into reliable, scalable production systems.
You'll collaborate with Data Science, ML Engineering, Data Operations, and Product to transform raw hospitality data into production-grade ML features and outputs.
This is a great opportunity to engineer the data backbone of AI-powered hospitality, working with big data and machine learning to help increase asset values.
### Responsibilities
- Design, build, and maintain production data pipelines for multi-phase algorithmic workflows using Python with Prefect, Airflow, Jenkins, or another orchestration framework.
- Build and optimize advanced SQL transformations in Snowflake, including window functions, CTEs, stored procedures, UDFs, and semi-structured data processing.
- Build and maintain dbt models for data transformation, identity resolution, and slowly changing dimension (SCD Type 2) tracking across 80+ models and multiple pipeline stages.
- Build and maintain feature engineering pipelines that feed ML models including CatBoost gradient boosting, Prophet time-series decomposition, LightGBM regression, and PuLP linear programming solvers...
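To give candidates a feel for the kind of logic involved, the slowly changing dimension (SCD Type 2) tracking mentioned above can be sketched in plain Python. In production this would live in dbt/Snowflake SQL rather than Python; all names here (`DimensionRow`, `apply_scd2`, the `hotel_1` key) are illustrative, not Kalibri's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DimensionRow:
    key: str                        # natural key of the entity (e.g. a hotel id)
    value: str                      # tracked attribute (e.g. market segment)
    valid_from: int                 # batch id when this version became active
    valid_to: Optional[int] = None  # None means the row is still current
    is_current: bool = True

def apply_scd2(history: list, updates: dict, batch_id: int) -> list:
    """Apply one batch of source updates with SCD Type 2 semantics:
    when a value changes, close out the current row and append a new
    version, so the full change history is preserved."""
    current = {r.key: r for r in history if r.is_current}
    for key, value in updates.items():
        row = current.get(key)
        if row is not None and row.value == value:
            continue                  # unchanged: keep the current version
        if row is not None:           # changed: close the old version
            row.valid_to = batch_id
            row.is_current = False
        history.append(DimensionRow(key, value, valid_from=batch_id))
    return history
```

A dbt snapshot expresses the same pattern declaratively (`strategy='check'` or `'timestamp'`); the point-in-time query "what was this attribute at batch N" then becomes a filter on `valid_from`/`valid_to`.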