Rubix Solutions is seeking a Site Reliability Engineer (SRE) to join our Sydney-based team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our critical digital services and systems. You will collaborate closely with software engineering teams, employing a blend of coding skills and operational expertise to design and maintain scalable and resilient infrastructure.
Responsibilities:
Chapter Responsibilities:
- Learn and improve your skills through dedicated chapter time.
- Share experiences within your chapter and across the Engineering practice.
- Generate excitement for technologies that aim to change the future.
- Evangelize implementations across the organization where beneficial.
Role Responsibilities:
- Design, implement, and maintain highly available and scalable microservices architecture on Azure and GCP cloud platforms.
- Collaborate with development teams to ensure applications are designed with reliability, scalability, and observability in mind.
- Develop and maintain automation tools and processes to streamline deployment, monitoring, and incident response.
- Participate in on-call rotations to provide 24/7 support for production systems, responding to and resolving incidents promptly.
- Implement and maintain observability tools such as monitoring, logging, and tracing to ensure visibility into system performance and health.
- Conduct post-incident reviews and implement improvements to prevent recurrence of issues.
- Stay updated on industry best practices and emerging technologies related to cloud architecture, microservices, and observability.
- Assist teams in identifying and removing manual repetitive tasks from their work.
- Embed observability into all aspects of the application ecosystem.
- Evaluate platform and service consumption to optimize costs and capacity.
Requirements:
- Demonstrable experience working in technical delivery teams, preferably teams of 10 or more people.
- Knowledge of cloud infrastructure and integration across enterprise platforms.
- Experience in both agile and project-based execution.
- Ability to switch context between multiple technologies.
- Strong communication, collaboration, and influencing skills.
- Experience implementing SRE practices and influencing engineers and product owners to prioritize reliability.
- Strong experience with SRE practices, DevSecOps, and observability platforms.
- Some experience with cloud-native application development, relational & document databases, and service integration patterns.
- Good experience with a subset of technologies in our current stack, including Cloud (Microsoft Azure, Google Cloud Platform), Code (PowerShell, Bash, C#, .NET Core, Node.js, Go), Databases (Microsoft SQL Server, MongoDB, Cosmos DB, PostgreSQL), CI (Azure DevOps, GitHub, Jenkins), Infrastructure (Kubernetes, Terraform), and Observability (Application Insights, Dynatrace, Azure Log Analytics, Google Cloud Monitoring).
What We Offer:
- Long-term engagement
- Large enterprise end customer
- Collaborative work environment