Senior ML Platform Reliability & Infrastructure Engineer

Company: Holisticon Poland

Location: Not specified (Remote)

Type: Full-time

Remote: Yes

Posted: 2026-05-13

About this role

Holisticon Connect is a division within NEXER GROUP - a custom software development company. We started in Poland in 2017 and are now a team of over 140 people. We work with world-renowned brands from Scandinavia, the UK, and Western Europe. Our goal is to grow stronger in competence rather than in numbers. If you like what we do, check out our offer. Maybe we will have the pleasure of meeting you! 😊

We are looking for a Senior ML Platform Reliability & Infrastructure Engineer to join a highly advanced drug discovery platform team working at the intersection of machine learning, large-scale data systems, and computational science. The team builds the core infrastructure that enables AI-driven drug design: processing massive biological and chemical datasets and running large-scale model training on high-performance computing systems.

The mission is to transform cutting-edge research models into reliable, scalable, production-grade systems that directly support the discovery of new medicines. This is a highly impactful environment where engineering excellence meets real-world scientific innovation in the life sciences domain.

We offer a choice of employment form: B2B or Employment Contract

  • Employment contract (UoP): 23 000 – 25 000 PLN gross/month
  • B2B: 170 – 190 PLN net/hour + VAT

Responsibilities:

  • Profile and optimise inference latency and throughput for model-serving runtimes handling high-volume prediction requests behind a routing/gateway layer.
  • Design and implement comprehensive observability across the platform by adding distributed tracing, effective logging, Grafana dashboards, alerting policies, and SLO/SLI frameworks using Prometheus, Loki, and OpenTelemetry.
  • Harden Kubernetes workloads running on GKE by tuning GPU/CPU resource allocation and improving resource scaling.
  • Improve the resilience of asynchronous job pipelines built on Argo Workflows, Dapr pub/sub, and Redis, including retry strategies, dead-letter handli...
