Senior Software Engineer, Data
Company: Ai2
Location: Seattle, WA
Type: Full-time
Posted: 2026-05-02
About this role
Persons in these roles are expected to work from our offices in Seattle. On-site requirements vary based on position and team. If you have questions about on-site work arrangements for this role, please ask your recruiter.
Our base salary range is $126,000 - $189,000, and in addition we have generous bonus plans to provide a competitive compensation package.
Who You Are:
The Allen Institute for AI (Ai2) is hiring a Senior Data Engineer to build the data infrastructure behind AI research agents that explore and reason over scholarly literature. You'll work on the Semantic Scholar corpus, expanding what it covers and improving the quality of what's already there, and create the APIs and tooling that these agents rely on at scale.
This role sits at the intersection of data engineering and applied ML. You'll own pipelines, design schemas, and ship production services, but you'll also apply practical ML techniques (entity resolution, text classification, embedding-based similarity) to improve data quality and enrich metadata at scale, directly shaping what the agents can do. We're looking for a strong engineer who is comfortable across that full range.
Who We Are:
The Agentic Applications team builds open, production-grade systems that power scientific discovery and large-scale AI research. We focus on creating high-quality structured datasets, integrating diverse content types, and enabling downstream applications across search, citation analysis, and model training. The team combines strong engineering practices with close collaboration across Ai2's product and research orgs to deliver tools and infrastructure used by millions of researchers and developers worldwide.
Your Next Challenge:
- Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain-specific datasets
- Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enri...