Senior Site Reliability Engineer
Remotive
Remote
About
Jahnel Group’s mission is to provide the absolute best environment for software creators to pursue their passion by connecting them with great clients doing meaningful work. This is a full-time position with one of our closest clients.
The Senior Site Reliability Engineer (SRE) ensures the reliability, scalability, and performance of the client's cloud-based software solutions. This role blends software engineering and systems administration to support and enhance critical infrastructure, working closely with development and operations teams to deliver secure and cost-effective cloud environments.
- Cloud Infrastructure Architecture and Implementation: Designs, builds, and maintains robust cloud infrastructure solutions using AWS and other cloud technologies.
- Mentorship and Team Development: Provides technical guidance and mentorship to junior SREs, promoting a culture of continuous learning and improvement.
- Operational Efficiency and Automation: Identifies and implements process improvements through automation and optimization to enhance reliability and reduce manual effort.
- Performance and Reliability Management: Develops and executes strategies to meet and exceed Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Incident Management: Leads incident response efforts, performs root cause analysis, and implements preventive measures to minimize downtime.
- Capacity Planning and System Optimization: Proactively identifies performance bottlenecks, optimizes resource utilization, and ensures system scalability.
- Security and Compliance: Implements cloud security best practices, including least-privilege IAM policies, secrets management, and evidence generation for compliance frameworks (e.g., SOC 2, ISO 27001).
- Other duties and projects as assigned
- 5+ years in Site Reliability Engineering or a similar role.
- Extensive expertise in the AWS (Amazon Web Services) cloud platform and its services.
- Experience with GitOps practices and CI/CD tooling (e.g., GitHub Actions, Jenkins, ArgoCD, or similar).
- Experience with Infrastructure as Code (e.g., Terraform).
- Experience designing and maintaining observability stacks (e.g., Prometheus, Grafana, ELK) with a focus on actionable metrics, alerting, and SLOs.
- Strong problem-solving, troubleshooting, and analytical skills.
- Excellent communication and collaboration abilities.
- Organizational skills with attention to detail.
- Ability to manage time and prioritize tasks.
- Proficiency in scripting languages (e.g., Python, PowerShell).
- In-depth knowledge of Linux systems, networking, load balancing, and security principles.
- Location: Texas or New York, with flexibility to work remotely.
- Salary range: $100,000.00 to $150,000.00
- Salary is established based on various factors, including, but not limited to, prior employment history, job-related knowledge, education and training, skills, and geographic location.
