Site Reliability Engineer
Remotive
Remote
About
This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.
Role Description
Be part of a global team that ensures the performance, scalability, and reliability of critical cloud-based applications. As part of the Global Investor and Distribution Solutions (GIDS) Platform Services team, you’ll play a key role in keeping our systems running smoothly and efficiently while helping shape the future of our platform.
- Collaborate with global teams as part of a follow-the-sun support model.
- Respond to, troubleshoot, and resolve Level 2 application incidents.
- Ensure critical applications are effectively monitored using tools like Prometheus and Grafana.
- Create and maintain dashboards and alerts to enhance visibility into application health.
- Define, implement, and track key SRE metrics (SLOs, SLIs, error budgets).
- Partner with development teams to improve application reliability and resilience.
- Analyze incident trends and recommend improvements to reduce recurrence.
- Automate repetitive support tasks to improve efficiency.
- Participate in post-incident reviews and drive reliability initiatives.
- Perform infrastructure and application patching as part of regular maintenance cycles.
- Support security vulnerability remediation efforts across both infrastructure and application layers.
Qualifications
- Bachelor’s degree in Computer Science, Computer Engineering, IT, or a related field.
- 5+ years of experience for senior roles; fresh graduates welcome for junior roles.
- Proficiency in one or more programming languages, preferably Java, JavaScript, or Python.
- Proven ability to troubleshoot complex systems.
- Skilled in debugging, code optimization, and automation.
- Experience with relational databases and data analysis.
- Experience working in Site Reliability Engineer (SRE) roles or incident response environments.
- Hands-on experience with cloud infrastructure, preferably AWS.
- Familiarity with observability tools such as Grafana, ELK Stack, or similar.
- Experience deploying and managing applications on Kubernetes platforms.
- Strong skills in analyzing and troubleshooting issues in large-scale, distributed systems.
- Familiarity with PostgreSQL and its performance tuning, monitoring, and troubleshooting.
Benefits
- Flexibility: Hybrid Work Model & a Business Casual Dress Code, including jeans.
- Your Future: RRSP Matching Program, Professional Development Reimbursement.
- Work/Life Balance: Flexible Personal/Vacation Time Off, Sick Leave, Paid Holidays.
- Your Wellbeing: Medical, Dental, Vision, Employee Assistance Program, Parental Leave.
- Diversity & Inclusion: Committed to Welcoming, Celebrating and Thriving on Diversity.
- Training: Hands-On, Team-Customized, including SS&C Learning Institute.
- Extra Perks: Discounts on fitness clubs, travel and more!
- Wide-Ranging Perspectives: Committed to Celebrating the Variety of Backgrounds, Talents and Experiences of Our Employees.
