Overview

We are a leading organization that specializes in the development of advanced AI products for healthcare and clinical research.

The Role

Design Infrastructure: Utilize infrastructure-as-code tools to design and automate infrastructure, ensuring high levels of efficiency and scalability.
Build CI/CD Pipelines: Develop and maintain continuous integration and delivery pipelines, streamlining release automation processes.
Operate Production Systems: Effectively operate and scale production systems on major cloud platforms, guaranteeing optimal performance and reliability.
Implement Monitoring Practices: Establish robust monitoring and alerting practices, enabling swift incident response and minimizing downtime.
Enforce Security Controls: Implement and enforce stringent security controls to safeguard protected health data, adhering to industry standards and regulations.
Develop Disaster Recovery Plans: Create and test comprehensive disaster recovery and continuity plans, ensuring business continuity and minimizing disruptions.
Produce Operational Documentation: Develop clear and concise operational documentation and runbooks, facilitating knowledge sharing and onboarding.
Coach Junior Engineers: Provide guidance and mentorship to junior engineers and on-call teams, fostering growth and development.
Collaborate with Engineering Teams: Work closely with engineering and research teams to accelerate fast, safe delivery of product features.

Requirements

Essential Qualifications

5+ Years in SRE / Infrastructure / Platform Role
IaC Experience: Proficient hands-on experience with infrastructure-as-code (Terraform or equivalent).
Container Orchestration Expertise: Proven experience with container orchestration (Kubernetes) in production environments.
Programming Skills: Strong scripting/programming skills (e.g., Python).
CI/CD System Experience: Demonstrated work with CI/CD systems and pipelines.
Cloud Provider Expertise: Experience running workloads on cloud providers (GCP, Azure, or AWS).
Observability Tools Familiarity: Familiarity with observability tools (metrics, logs, tracing).
Security Best Practices Knowledge: Practical understanding of security best practices and data-protection tooling.
Communication Skills: Excellent communication, troubleshooting, and incident response skills.

Prioritised Nice-to-Haves

Healthcare Compliance Experience
Data-Intensive Environment Experience
Mentorship Experience
Terraform + Kubernetes Expertise

Senior Engineer • São Paulo, Brasil