A leading organisation is on the lookout for a Site Reliability Engineer to provide expertise in maintaining operational coverage of the services and functions offered through the organisation's cloud compute and storage environments, including its research infrastructure.
This role is a full-time, permanent position. Candidates from related roles such as Cloud Engineer, DevOps Engineer and similar will also be considered.

Key Responsibilities:
- Work with managed service providers, vendors and other external entities to ensure that outcomes deliver services based on the principles of continuous service improvement.
- Take a hands-on approach to supporting application environments and research infrastructure, ensuring timely and effective responses to users' needs.
- Reduce cloud sprawl by adopting and implementing cloud-native automation.
- Support the analysis of metric-based monthly reports on capacity, cost and performance.
- Establish a new incident management framework within the organisation.
- Play a role in the production release process, ensuring the definition of done has been met.
- Contribute to system architecture and design sessions to ensure that all system improvements adhere to SRE best practices.

Key Skills:
- Experience with the Azure public cloud and related platform toolsets is a must.
- Experience with New Relic/Graylog/Nagios.
- Infrastructure as Code (IaC) experience.
- Docker/Kubernetes experience.
- PowerShell experience.
- Experience in 24/7 monitoring of distributed systems.
- Strong knowledge of Windows or Linux, cloud storage and cloud networking.
- Highly developed communication skills.
- Good knowledge of CI/CD deployment strategies.

This role offers great progression for anyone with a strong system administration background who wants to take their skills to the next level.