We are hiring a hands-on Databricks Data Engineer to rebuild five Data Quality Scorecards using SAP ECC data already available in the Lakehouse. You will design and validate profiling logic, build rule-based data quality checks in PySpark, generate field-level and row-level results, and publish business-facing scorecards in Power BI. You will also define reusable templates, naming conventions, and repeatable processes to support future scorecard expansion (47 more scorecards) and help transition the organization away from Informatica IDQ.
Responsibilities
Rebuild Data Quality scorecards in Databricks
Develop profiling logic (nulls, distincts, pattern checks)
Build PySpark-based Data Quality rules and row- and column-level metrics
Create curated DQ datasets for Power BI scorecards
Establish reusable DQ rule templates and standardized development patterns
Work with SAP ECC data models
Support and mentor a junior developer (Mexico-based) on rule logic and development standards
Qualifications
Strong Databricks engineering experience (PySpark, SQL, Delta Lake)
Hands-on experience building Data Quality rules, frameworks, or scorecards
Experience profiling large datasets and implementing metadata-driven DQ logic
Ability to mentor, review code, and explain concepts clearly
Excellent communication skills in English
Familiarity with SAP ECC tables and key fields (preferred)
Experience with Unity Catalog or Purview (nice to have)
Exposure to Lakehouse Monitoring or DQX accelerators (bonus)
If you are passionate about Data Quality, strong in Databricks / PySpark, and enjoy building reusable DQ capabilities, please apply today for immediate consideration!
Data Engineer • Rio das Ostras, Rio de Janeiro, Brazil