About the role
We are looking for a Senior Site Reliability Engineer (SRE).
This person will help build, support, and manage the implementation of best practices in the following main areas:
- Cloud DevOps (preferably Google Cloud);
- System Reliability;
- CI/CD pipelines (development and maintenance);
- Observability;
- Operations Automation;
- Environment Security.
As a part of your job, you will:
- Participate in the solution definition to ensure its operability;
- Ensure the end-to-end solution resilience:
  - Collaborate in the definition of performance tests;
  - Participate in the definition of resilience tests;
  - Review monitoring KPIs and logging efficiency to introduce new tools that make the solution more reliable;
- Work with developers during the software development lifecycle to ensure that developed services are operationalized.
- Ensure the solution observability:
  - Define monitoring requirements (e.g. log types);
  - Implement performance metrics and business KPIs;
- Implement CI/CD best practices and ensure their evolution and maintenance;
- Work with stakeholders to fully understand and communicate root cause analyses and implement the lessons learnt;
- Drive initiatives to make the solution (and all its components) more reliable – that is, less prone to cause support tickets.
What are we looking for?
- Experience working in a DevOps culture and with its tools;
- Experience in application reliability practices for internal and client-facing experiences;
- Cloud (preferably GCP);
- Container experience (Docker, Kubernetes) is relevant;
- CI/CD pipelines (preferably with GitLab CI)
- Automation tooling (e.g. Terraform, Ansible);
- Observability tooling (e.g. ELK stack)
- Source Code Management (GitLab);
- Speak English fluently;
- Experience managing internal and external stakeholders’ expectations;
- Experience working with development teams and operational support teams;
- Cultivate curiosity and thirst for learning;
- Believe in the collective: teams and groups.
Nice to have:
- JIRA, Confluence;
- Agile certifications;
- Cloud certifications;
- At least 3 years of experience working on large-scale projects with multiple agile teams.
Personal traits:
- Ability to adapt to different contexts, teams, and clients;
- Teamwork skills combined with a sense of autonomy;
- Motivation for international projects and openness to travel;
- Willingness to collaborate with other players;
- Strong communication skills.
We want people who like to roll up their sleeves and open their minds. Believe this is you? Come join the Team!