Job description
At PointsBet, we have only built a fraction of what we have imagined.
Launched in Australia in 2017, PointsBet is dedicated to providing our customers with the ultimate betting experience. We're on a mission to make racing and sports betting easier to understand, more fun to use and faster. After some recent business changes, PointsBet Australia is about to head into an exciting stage of its journey, and as we continue to grow, we're looking for a Site Reliability Engineer.
The Site Reliability Engineer is responsible for reliability engineering and technical support for the company's backend systems and customer-facing apps. Because the company operates across multiple regions, this role includes on-call support and requires the ability to engineer system alerts and dashboards, triage incidents and perform root cause analysis (RCA), and define autoscaling policies.
What you’ll own:
• Monitoring and analyzing production systems for reliability.
• Scaling production systems up/out to maintain performance stability while remaining cost-efficient.
• Responding to production incidents and, where needed, performing triage, escalating to the relevant team and implementing mitigations.
• Collaborating with developers to set up monitoring, alerting and scaling capabilities as part of the product delivery life cycle.
• Increasing system reliability and reducing manual intervention by researching and developing automated processes.
• Sharing knowledge by creating documentation and running sessions with team members.
• Supporting key business peak periods, such as weekends, through an on-call rotation as needed (roughly once every 4-5 weeks). For any weekend support, an extra day off is given on a weekday.
• Communicating in a timely and accurate way with a variety of stakeholders.
The skills we seek:
Must-have skills:
• Solid working experience with at least one cloud platform, such as Azure, AWS or Google Cloud.
• Knowledge of containerization and container orchestration tools such as Docker and Kubernetes.
• Experience with infrastructure as code (IaC) tools such as Terraform and Terragrunt.
• Hands-on experience with at least one scripting language, such as PowerShell, shell scripting or Python.
• Experience with monitoring and logging tools such as Datadog, Azure Application Insights, Prometheus, Grafana, Loki and the ELK Stack.
• Experience with ticketing and incident management processes and tools such as Jira, PagerDuty and Opsgenie.
Nice-to-have skills (not mandatory):
• Ability to write SQL/T-SQL queries and diagnose query performance issues (database engines: MSSQL/Postgres).
• Understanding of networking components.
• Basic understanding of Cloudflare.
• Good understanding of release management processes.
• Understanding of SLIs, SLOs and SLAs.
• Experience with Azure DevOps
Core Competencies:
• Strong analytical mindset, with the ability to break down and execute complex tasks
• Able to perform root cause analysis with a problem-solving attitude
• Prior experience in site reliability engineering or DevOps engineering is preferred
The PB Perks!
• PointsBet Flex Program – Filled with hybrid working, Work from Anywhere weeks and sabbatical leave, to name a few
• PointsBet Day – Get your PointsBet anniversary off
• Annual Bonus Scheme – We reward great work, so you can earn even more on top of our competitive salary package
• Parental Leave – 18 weeks for primary carers and 4 weeks for secondary carers
So you don't tick every requirement in the job description but are still interested? Please apply anyway! We are built different at PointsBet, because we think different. We promote diversity of thought and would love to hear from a wide range of applicants, as we believe every unique skill set adds value to the team.