Job Title
Service Reliability Engineer
Purpose of the Role:
The Service Reliability Engineer will accelerate application teams' ability to deliver applications reliably and consistently by developing standardized automation that forms a common continuous deployment pipeline shared across the functional engineering teams.
Other ongoing responsibilities include change management, problem management, incident management, performance improvement, and automation and tool development.
The Service Reliability Engineer is expected to excel under pressure, work well with others, be self-motivated, and manage both short- and long-term projects. Implementing automation for kickstarting, monitoring, managing, and supporting systems will be a key component of the position.
The Service Reliability Engineer will work actively with software developers, network engineers, systems and storage administrators, project managers, and database administrators on projects, and provide support as required.
The Service Reliability Engineer will troubleshoot and resolve issues quickly and effectively. Good communication and teamwork are extremely important. The role also involves participating in the team's 24x7 on-call (pager) rotation.
Main Responsibilities:
Application Support
Proactively manage incidents in coordination with frontline services and the Incident Response Team.
Incident response: define and build the monitoring and alerting, and provide the automation, that enable auto-recovery. When an issue does not recover automatically, first restore full service. Once the issue is resolved, analyze its root cause, liaising with the development teams if needed, then implement targeted monitoring and an automated response so that the same issue recovers automatically if it recurs.
Assist developers in debugging and diagnosing application errors raised as incidents and problem records, using monitoring tools such as Splunk and Grafana.
Support application deployments, build new systems, and upgrade and patch existing ones.
Operate the platform within our security and privacy guidelines.
Service Automation
Participate in designing and building the tools and processes that support operations. Use scripting to build automation and tools as needs arise.
Build automation that enables fast, safe instance deployment.
Design, develop, and use monitoring tools to detect problems, resolve them or escalate them to development, and ensure that we exceed our Service Level Agreements (SLAs).
Develop Python-based programs that automate monitoring and recovery services.
Continuous Improvement
Be accountable for an application platform against its SLA, non-functional requirement (NFR), and operability criteria.
Contribute to defining SLAs, operational level agreements (OLAs), and NFRs.
Adopt monitoring tools and ensure they are used to find problems, raise alerts, and confirm that we meet our SLAs and OLAs.
Drive process re-engineering and optimization.
Provide proactive thought leadership on creative, efficient technology solutions.
Drive continuous improvement of the service delivered to customers, in both agility and stability.
Drive the definition and enforcement of operational and non-functional requirements in collaboration with application owners and middleware organizations.
Document configuration processes and policies.
Improve the efficiency of the existing Python-based service automations by identifying their deficiencies and limitations.
Recommend new solutions and propose improvements by analyzing different sources of information.
Relevant Experience:
Experience in an operational ITSM role, ideally in a mission-critical environment
Exposure to operations in open-source and cloud stacks
Experience in operational automation
ITIL and/or Cloud technology certifications are a plus
Experience with Lean management or a similar methodology is a plus
At least a bachelor's degree (or equivalent) in Computer Science or a related technical field, with 2 years of post-qualification experience
Diversity & Inclusion
We are an Equal Opportunity Employer and seek to hire the best candidate regardless of age, beliefs, disability, ethnicity, gender or sexual orientation.