Data Engineer
Company: FalsiFind
Location: Not specified (Remote)
Type: Full-time
Remote: Yes
Posted: 2026-04-22
About this role
Deepfakes are coming for the financial system.
At FalsiFind, we secure financial institutions against deepfake impersonation by authenticating voice, video, images, text, and identities at scale. We help banks, credit unions, and fintechs protect assets, ensure compliance, and preserve trust in the digital economy.
We're building the data infrastructure that powers our detection platform, and we're looking for a Data Engineer who can design, build, and maintain the pipelines that feed our models, from raw ingestion to structured, production-ready datasets.
What we need you to do:
→ Design and maintain scalable data pipelines for audio, video, image, and text modalities across ingestion, processing, and storage layers
→ Build and manage synthetic data pipelines to support model training across FalsiFind's four detection modalities
→ Own data quality, lineage, and versioning across training, evaluation, and production datasets
→ Collaborate with ML engineers to define data schemas, feature stores, and labeling workflows
→ Instrument pipelines for monitoring, alerting, and performance observability
→ Ensure data handling practices meet financial-institution-grade compliance and security requirements (SOC 2, encryption at rest and in transit, access controls)
Ideal background:
→ 3+ years building and maintaining production data pipelines (Airflow, Prefect, dbt, Spark, or similar)
→ Strong experience with cloud data infrastructure (AWS, GCP, or Azure) and object storage at scale
→ Comfort working with multimodal data — audio, video, and image formats alongside structured/tabular data
→ Familiarity with ML workflows: training data versioning, feature pipelines, dataset registries
→ Experience operating in regulated environments or with security-conscious data governance is a strong plus
U.S. citizens or green card holders only.