Site Reliability Engineer (SRE)

J Bandy Consulting Limited

London
Permanent
Remote
£70,000 - £85,000/year
Kubernetes & Cloud Infrastructure (AWS/GCP)
Golang for Automation & Tooling
SRE Best Practices & Observability
This job posting has expired and is no longer accepting applications.

About The Role

We are seeking a Site Reliability Engineer (SRE) to join our client's fast-paced team. In this role, you will design, build, and maintain the scalable, reliable, high-performance infrastructure that powers our services. You will work at the intersection of software engineering and infrastructure operations, applying a software engineering mindset to operational problems. Coming from a start-up background, you will thrive in our agile environment and play a key part in shaping our platform's future.

Key Responsibilities

  • Design, build, and maintain our core infrastructure on AWS and GCP using Infrastructure as Code (IaC) principles with Terraform.
  • Develop and manage our Kubernetes clusters, focusing on automation, observability, and scalability to support our microservices architecture.
  • Write and maintain software in Golang to automate operational tasks, improve system reliability, and build tooling for the engineering team.
  • Participate in an on-call rotation to respond to production incidents, and lead blameless post-mortems to drive continuous improvement.
  • Champion SRE best practices across the engineering organisation, including SLOs/SLIs, error budgets, and proactive monitoring and alerting.

Required Skills & Experience

  • Proven experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role.
  • Strong proficiency in Golang for automation and building internal tools.
  • Extensive hands-on experience with Kubernetes for container orchestration in a production environment.
  • Demonstrable experience managing cloud infrastructure (AWS and/or GCP) using Terraform.
  • Previous experience working in a start-up or a similarly fast-paced, agile environment.
  • Solid understanding of CI/CD pipelines, monitoring, and observability principles.

Nice-to-Have

  • Experience with other programming languages such as Python or Rust.
  • Familiarity with service mesh technologies like Istio or Linkerd.
  • Certifications in AWS, GCP, or Kubernetes (e.g., CKA, CKAD).