Senior Platform Data Engineer
Company: Geisinger
Location: Remote (Remote)
Type: Full-time
Remote: Yes
Posted: 2026-04-16
About this role
Location Work from Home Job Category Business Strategy and Innovations Schedule Days Work Type Full time Department IT Data Management Division Date posted 04/16/2026 Job ID R-95109
### Job Summary
The Senior Platform Data Engineer owns roadmap, priorities, platform standards, and architecture reviews; provides formal input on performance reviews. This position makes clinical data ready for AI at scale: owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on. Owns Real-time data feeds. Reusable clinical data models and feature pipelines. RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines). Databricks platform administration.
### Job Duties
- Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
- Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
- Owns RAG Infrastructure, the shared retrieval-augmented generation platform that agentic and generative AI programs use to ground LLM outputs in organizational knowledge.
- Designs and operates document ingestion pipelines: normalizing clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
- Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
- Manages the embedding pipeline: selecting, tuning, and versioning embedding models (domain-specific clinical models where they outperform general-purpose).
- Administers the vector database: schema design, indexing, metadata management, access controls, and performance tuning.
- Builds and mai...