We are hiring a Platform Engineer to help build and evolve the software platform behind large-scale AI infrastructure.
This is a hands-on engineering role for someone who can write strong Python, work deeply with Kubernetes, design and build platform applications, and operate close to bare-metal infrastructure.
You will help build the systems that make GPU compute easier to provision, operate, secure and scale across AI infrastructure environments.
This is not a generic DevOps role. We are not looking for someone who has only maintained pipelines, written Terraform or managed cloud services. We need someone who can build real platform software and who understands the infrastructure it runs on.
Responsibilities:
- Design and build platform applications, APIs and services
- Write production-grade Python for infrastructure and platform use cases
- Work with Kubernetes to build scalable platform capabilities
- Design and build Kubernetes operators and controllers across compute, storage and networking
- Build tooling that improves how bare-metal and GPU infrastructure is provisioned, operated and monitored
- Translate operational pain points into scalable platform features
- Improve platform reliability, observability and performance
- Work across Linux, networking, storage and distributed systems
- Collaborate with product, security, infrastructure, networking and compute teams
- Help build the platform layer for AI infrastructure designed to operate at industrial scale
Requirements:
- Strong Python engineering experience
- Strong hands-on Kubernetes experience
- Experience designing and building applications, APIs, services or internal platform tooling
- Bare-metal infrastructure experience
- Strong Linux systems experience
- Good understanding of networking, storage and distributed systems
- Experience building production-grade systems with proper testing, CI/CD, code reviews and clean engineering standards
- A practical engineering mindset and the ability to solve real infrastructure problems through software
Nice to have:
- Experience building Kubernetes operators, CRDs or controllers
- Exposure to GPU infrastructure or high-performance computing (HPC)
- Experience with Go or Rust
- Knowledge of confidential computing, including TEEs, AMD SEV, Intel TDX or Confidential Containers (CoCo)
- Experience with Ceph or distributed storage systems
- Familiarity with Prometheus, Grafana or OpenTelemetry
- Experience with BGP, RDMA or high-performance networking
- Exposure to NVIDIA GPU infrastructure or bare-metal cloud environments
AI infrastructure is constrained by the ability to deliver reliable compute at scale. This role sits in the platform layer that connects software engineering with real infrastructure.
You will help build systems that run close to the metal, across Kubernetes, Linux, networking, storage and GPU compute.
This is a role for someone who wants to build the infrastructure layer behind AI, not just operate tools around it.