Senior Production Engineer (REMOTE)
Upbound
About
- Upbound is the company behind Crossplane, the open-source project that started the control plane revolution in the cloud-native community. Upbound is redefining how modern infrastructure is built. As the creators of Crossplane and the pioneers of the Intelligent Control Plane, we are leading the shift toward agentic infrastructure: platforms that reason, adapt, and operate alongside AI-native systems.
- Upbound is hiring a Senior Production Engineer to build out the reliability and availability capabilities for Upbound Cloud, our managed control plane platform. This role combines software engineering, systems design, and operations leadership to ensure our platform is reliable, performant, and scalable. You’ll work closely with our software engineering and product teams to design, build, and operate services that power the Upbound Cloud experience for our customers. You’ll participate in incident response and reliability initiatives, mentor engineers across disciplines, and establish production best practices that align with our mission to deliver resilient, self-healing infrastructure systems.
- What You'll Do
- Contribute to the production engineering strategy for Upbound Cloud, ensuring high availability, scalability, and efficiency of all customer-facing systems. This includes internalizing the product strategy and building the level of system resiliency needed to support product growth.
- Own reliability metrics — including uptime, latency, and error budgets — and champion service-level objectives (SLOs) across teams.
- Design and implement automation for provisioning, observability, and incident response to minimize human intervention and increase operational maturity.
- Collaborate with development teams to build reliability into the software lifecycle through proactive architectural reviews, chaos testing, and performance profiling.
- Operate and improve multi-tenant Kubernetes-based systems, leveraging Crossplane and other cloud-native tooling.
- Drive incident management — leading blameless postmortems, root cause analyses, and systemic remediation efforts.
- Mentor engineers in production engineering practices, fostering a culture of ownership, reliability, and continuous improvement.
- Contribute to the evolution of our cloud platform through design input, tool selection, and scalable systems thinking.
- What You'll Bring
- 5+ years of experience in software, infrastructure, or site reliability engineering roles.
- Strong background in distributed systems, service-oriented architectures, and cloud-native technologies.
- Proficiency in Kubernetes, Go, and Infrastructure-as-Code strategies.
- Expertise in observability and monitoring, preferably with Honeycomb and OpenTelemetry.
- Experience managing large-scale SaaS systems in production with multi-region and high-availability requirements.
- Strong understanding of incident response, capacity planning, and change management.
- Excellent communication skills and ability to collaborate across functions.
- A Plus If You Have
- Experience with Crossplane, multi-cloud infrastructure, or control-plane architectures.
- Prior leadership experience driving reliability initiatives at scale.
- #LI-REMOTE
- Why Upbound?
- At Upbound, you’ll help shape the systems and strategies that drive predictable, scalable growth in a product-led company embracing usage-based models. If you're excited to build from the ground up, work with cutting-edge cloud technologies, and directly impact how revenue is generated and scaled—this is your seat at the table.
- About Upbound
- Upbound is pioneering infrastructure platforms for the Agentic AI Era, serving Fortune 500 companies and platform engineers across more than 100 countries. The company empowers infrastructure and platform teams with Intelligent Control Planes (based on Kubernetes and Crossplane) that provision, operate, and adapt so platforms are ready for both humans and AI agents. Upbound is the creator and primary maintainer of Crossplane, the popular open-source framework for building cloud-native control planes, with over 100 million downloads and adoption by more than 1,000 teams worldwide. A Series B startup backed by GV (formerly Google Ventures), Altimeter Capital, and Intel Capital, Upbound has raised $69M to date. For more information, visit www.upbound.io.