Title : Senior Manager, Engineering Operations
Level : Manager or Senior Manager
Location : Brazil
Work setup : Remote (within 1 hour of São Paulo, Brazil)
- Will eventually become hybrid, with 2-3 days in the office, once we open an office next year
Department : Engineering

Talent Systems, LLC is the leading technology solution provider for casting and auditioning to
the entertainment industry.
Casting directors and agents worldwide use Talent Systems'
portfolio of products to source and manage talent across film, television, commercials, theater
and digital projects, powering an unparalleled, global casting software ecosystem.
We are headquartered in Los Angeles and operate in the US, Canada, UK, Australia and India.
Our portfolio brands include Casting Networks, Spotlight, Cast It Systems, Cast It Talent,
Casting Frontier, Staff Me Up, Cast It Reach & Tagmin.

Job Description
We are seeking an experienced Head of Engineering, Engineering Operations to lead our
engineering operations, which include areas such as DevOps, Site Reliability Engineering (SRE),
CI / CD, and release management for our cloud-based systems and applications.
This role is
pivotal in ensuring the reliability, security, scalability, and availability of our systems while driving
innovation in automation, CI / CD pipelines, and operational efficiency.
You will be responsible for
crisis management, improving system performance and cost, and fostering a culture of operational
excellence.

Responsibilities
- Leadership and strategy
  - Lead and mentor teams in DevOps, SRE, and Engineering Operations, fostering a culture of collaboration, ownership, and innovation.
  - Develop and execute the strategic roadmap for engineering operations, aligning with business goals and product requirements.
  - Advocate for and implement industry best practices in system reliability, DevOps, and automation.
- Reliability and availability
  - Drive initiatives to improve the reliability, availability, and performance of cloud-based applications and infrastructure.
  - Establish performance measurements for various system health metrics.
  - Ensure robust incident management and crisis response processes to minimize downtime and customer impact.
- DevOps and CI / CD
  - Oversee the design, implementation, and optimization of CI / CD pipelines to enable seamless and automated deployment processes.
  - Leverage automation tools and practices to reduce manual interventions and improve operational efficiency.
  - Collaborate with product and engineering teams to enable rapid and reliable feature delivery.
- Monitoring and observability
  - Implement and maintain advanced monitoring, logging, and alerting systems to gain deep insights into system health and performance.
  - Use observability tools (e.g., Grafana) to proactively identify and resolve issues before they impact customers.
- Crisis and Incident Management
  - Lead crisis management efforts during high-severity incidents, ensuring quick resolution and effective communication with stakeholders.
  - Conduct root cause analyses and drive post-mortem reviews to identify and address operational gaps.
- Team Development and Collaboration
  - Build, grow, and retain a high-performing engineering operations team with expertise in DevOps and SRE practices across multiple geographic locations.
  - Foster close collaboration with development, data, and product teams to align engineering operations with overall business objectives.
  - Promote a blameless post-mortem culture to encourage continuous learning and improvement.
- Cost Optimization and Security
  - Optimize cloud infrastructure costs while maintaining system reliability and scalability.
  - Implement robust security practices in operations to ensure compliance with industry standards and regulations.

Qualifications
- 10+ years of experience in software engineering, with 5+ years in leadership roles.
- Proven track record of improving system reliability, availability, and performance for cloud-based applications.
- Extensive experience with CI / CD pipelines and automation tools.
- Demonstrated expertise in crisis management and incident response in high-pressure environments.
- Deep knowledge of cloud platforms (such as AWS) and of container and orchestration tools (Docker, Kubernetes).
- Strong proficiency in monitoring and observability tools like Grafana.
- Excellent problem-solving and decision-making skills under pressure.
- Exceptional communication and collaboration skills, with the ability to influence stakeholders across engineering and business teams.
- Proven ability to lead and grow high-performing teams in a fast-paced environment.
- A strong focus on fostering a culture of accountability, learning, and operational excellence.
- Ability to influence partner engineering teams such as platform and product engineering.