Site Reliability Engineer (SRE)

J Bandy Consulting Limited

London

Permanent

Remote

£70,000 - £85,000/year

Kubernetes & Cloud Infrastructure (AWS/GCP)
Golang for Automation & Tooling
SRE Best Practices & Observability

This job posting has expired and is no longer accepting applications.

About The Role

We are seeking a talented and motivated Site Reliability Engineer (SRE) to join our client's dynamic, fast-paced team. In this role, you will be instrumental in designing, building, and maintaining the scalable, reliable, and high-performance infrastructure that powers our services. You will work at the intersection of software engineering and infrastructure operations, applying a software engineering mindset to systems administration problems. Coming from a start-up background, you'll thrive in our agile environment and play a key part in shaping our platform's future.

Key Responsibilities

Design, build, and maintain our core infrastructure on AWS and GCP using Infrastructure as Code (IaC) principles with Terraform.
Develop and manage our Kubernetes clusters, focusing on automation, observability, and scalability to support our microservices architecture.
Write and maintain software in Golang to automate operational tasks, improve system reliability, and build tooling for the engineering team.
Participate in an on-call rotation to respond to production incidents, and lead blameless post-mortems to drive continuous improvement.
Champion SRE best practices across the engineering organisation, including SLOs/SLIs, error budgets, and proactive monitoring and alerting.

Required Skills & Experience

Proven experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role.
Strong proficiency in Golang for automation and building internal tools.
Extensive hands-on experience with Kubernetes for container orchestration in a production environment.
Demonstrable experience managing cloud infrastructure (AWS and/or GCP) using Terraform.
Previous experience working within a start-up or a similarly fast-paced, agile environment.
Solid understanding of CI/CD pipelines, monitoring, and observability principles.

Nice-to-Have

Experience with other programming languages such as Python or Rust.
Familiarity with service mesh technologies like Istio or Linkerd.
Certifications in AWS, GCP, or Kubernetes (e.g., CKA, CKAD).