A leading organisation is on the lookout for a Site Reliability Engineer to provide expertise in maintaining operational coverage of the services and functions offered through the organisation's cloud compute and storage environments, including research infrastructure. This is a full-time, permanent position.
Candidates currently in related roles, such as Cloud Engineer or DevOps Engineer, will also be considered for this position.
Key Responsibilities:
Work with managed service providers, vendors and other external entities to ensure that services are delivered in line with the principles of continuous service improvement
Take a hands-on approach supporting application environments and research infrastructure, ensuring timely and effective response to users' needs
Reduce cloud sprawl by adopting and implementing cloud-native automation
Support analysis of metric-based monthly reports on capacity, cost and performance
Establish a new framework for incident management within the organisation
Play a role in the production release process, ensuring the definition of done has been met
Contribute to system architecture and design sessions to ensure that all system improvements adhere to SRE best practices
Key Skills:
Experience with Azure public cloud technology and related platform toolsets is a must
Experience with New Relic / Graylog / Nagios
Infrastructure as Code (IaC) experience
Docker / Kubernetes experience
PowerShell experience
Experience in 24/7 monitoring of distributed systems
Strong knowledge of Windows or Linux OS, cloud storage and cloud networking
Highly developed communication skills
Great progression for anyone with a strong System Administration background who wants to take their skills to the next level
Good knowledge of CI/CD deployment strategies
Cloud Engineer • Navegantes, Santa Catarina, Brazil