Senior Site Reliability Engineer – Selby Jennings
eFinancialCareers

London
•8 hours ago
About
Our client, a leading systematic hedge fund, is seeking a Senior Site Reliability Engineer to join their London-based platform team. In this role, you will focus on enhancing the reliability, resilience, and day-to-day operability of a rapidly scaling engineering platform. You will work closely with software engineers and platform owners to strengthen observability, improve incident response processes, and drive measurable reliability outcomes.

To be successful, you will bring hands-on experience applying SRE principles in production environments, alongside strong expertise in Linux systems. You must be capable of building and operating containerized workloads using tools such as Docker or Podman, and have strong experience in Go and/or Python. We are looking for a highly technical individual with strong Infrastructure-as-Code (IaC) proficiency and the ability to query, interpret, and reason about metrics using PromQL. A key part of this role will involve owning and improving the overall effectiveness of the platform's observability.

Responsibilities:
• Own the effectiveness of the observability platform, ensuring high-quality signals, alert fidelity, and ongoing suitability as the platform scales.
• Build and maintain actionable, low-noise dashboards and alerting across metrics and logs.
• Define and apply SLIs and SLOs where they support operational decision-making.
• Apply IaC across observability and supporting systems.
• Improve the reliability, scalability, and operability of core services through hands-on engineering changes.

Requirements:
• Strong practical experience applying SRE principles in production environments.
• Strong Linux systems knowledge.
• Strong development experience in Go and/or Python.
• Strong IaC proficiency.
• OpenTelemetry experience (metrics, logs, traces).
• Kubernetes and cloud-native platform experience.




