Your role and responsibilities:
Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution.
Performing deep-dive application troubleshooting and identifying preventive actions.
Managing CIRS-related requests including deployments, feature toggles, and data fixes.
Following up on major production incidents and coordinating with cross-functional teams.
Enhancing monitoring capabilities using tools like Dynatrace, Kibana, and Splunk.
Writing and improving monitoring scripts and alerts based on incident learnings.
Handling customer escalations and coordinating with Support & Engineering teams.
Supporting planned activities and responding to ad-hoc requests from CES teams.
Requirements and Qualifications:
Deep experience in DevOps and Production Support.
Experience in automation and CI/CD practices.
Familiarity with cloud platforms (GCP, AWS, or Azure preferred).
Hands-on experience with monitoring tools such as Dynatrace, Kibana, and Splunk.
Strong troubleshooting skills and ability to deep dive into application issues.
Excellent communication and coordination skills across teams.
Please submit your résumé in English.
Site Reliability Engineer • Foz do Iguaçu, Paraná, Brazil