A leading organisation is on the lookout for a Site Reliability Engineer to provide expertise in maintaining operational coverage of services and functions offered through the organisation's cloud compute and storage environments, including research infrastructure. This is a full-time, permanent position.
Candidates from related roles, such as Cloud Engineer or DevOps Engineer, will also be considered for this position.
Key Responsibilities:
- Work with managed service providers, vendors and other external entities to ensure that outcomes deliver services based on principles of continuous service improvement.
- Take a hands-on approach to supporting application environments and research infrastructure, ensuring timely and effective response to users' needs.
- Reduce cloud sprawl by focusing on adopting and implementing cloud-native automation.
- Support analysis of metric-based monthly reports on capacity, cost and performance.
- Establish a new framework for incident management within the organisation.
- Play a role in the production release process, ensuring the definition of done has been met.
- Contribute to system architecture and design sessions to ensure that all system improvements adhere to SRE best practices.
Key Skills:
- Experience with public cloud technology; Azure and related platform toolsets are a must.
- Experience with New Relic / Graylog / Nagios.
- IaC and Docker / Kubernetes experience.
- PowerShell experience.
- Experience in 24/7 monitoring of distributed systems.
- Strong knowledge of Windows or Linux OS, cloud storage and cloud networking.
- Highly developed communication skills.
- Good knowledge of CI/CD deployment strategies.

This role offers great progression for anyone with a strong System Administration background who wants to take their skills to the next level.
Cloud Engineer • Arapiraca, Brasil