Corrective Actions, Customer-Focused Service, Monitoring Performance, Reliability, Reliability Engineering, Root Cause Analysis, Site Reliability Engineering, Written
London, UK
495+ employees
SaaS
Open for applications
Role
Who you are
Proven experience as an AWS Cloud Engineer with hands-on expertise in EKS, Terraform, and Helm
Strong background in Docker and Docker Swarm
In-depth knowledge of AWS IAM roles, policies, and CloudWatch logs
Proficient in Linux environments and scripting languages such as Bash and Python
Excellent understanding of web technologies, REST APIs, and DevSecOps principles
Experience with monitoring solutions like Grafana and Prometheus
Exceptional oral and written communication skills
Strong customer-facing communication skills, including the ability to explain issues and root cause analyses (RCAs) clearly to customers
Experience in product/application support for SaaS-based products
Understanding of APIs, databases, systems architecture, and design
AWS Certified Solutions Architect certification
Working knowledge of infrastructure as code (IaC), CI/CD, and observability
Desirable
Ability to work independently and collaboratively within a team
Strong problem-solving skills and the ability to troubleshoot issues in production environments
Customer-focused mindset, always considering the impact on customers when planning deployments and updates
Ability to lead and motivate a team, fostering a culture of continuous improvement and excellence
What the job involves
The SRE Manager is responsible for leading the Site Reliability Engineering (SRE) team, owning and optimizing the incident management process, and ensuring the reliability and performance of the company's SaaS products
This role requires strong leadership, excellent communication skills, and the ability to work collaboratively across various departments to achieve organizational goals
The ideal candidate will have a deep understanding of cloud infrastructure, incident response, and customer support