Talent.com
Applications are no longer being accepted
ML Data Pipeline Engineer

Montenegro, Brazil
3 days ago
Job description

We're seeking a Data Pipeline Engineer to own and evolve our exercise recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training.

This role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.

What You’ll Do

Pipeline Operations & Improvement

  • Maintain and enhance our multi-source data collection system: IMU sensors (via a mobile app) and synchronized video streams from gym-based cameras.
  • Make the video capture software more robust, particularly in handling network interruptions, and improve its operational monitoring.
  • Deploy and monitor services in remote Linux environments with appropriate DevOps practices.

Data Quality & Validation

  • Evolve our Python-based QC engine that validates data pre- and post-annotation
  • Implement checks for IMU-video time synchronization, sensor health, and measurement consistency
  • Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities.
  • Develop validation logic comparing annotations against sensor data to ensure temporal alignment.
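
The sketch below illustrates two of the checks listed above. It is a minimal example under stated assumptions, not the team's actual QC engine.

- **Sensor health:** timestamp gaps longer than a few sample periods are flagged as dropouts, and a near-zero standard deviation is flagged as a flatlined (stuck) channel.
- **IMU-video sync:** normalized cross-correlation estimates the time offset between the two streams. This is a standard technique chosen here for illustration.

The sync check assumes both streams have already been resampled to a common rate. It also assumes a 1-D motion signal can be derived from each stream, for example accelerometer magnitude and a video motion-energy trace. All function names, thresholds, and defaults are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class HealthReport:
    gaps: list = field(default_factory=list)  # (t_before, t_after) pairs around dropouts
    flatline: bool = False  # True if the channel barely varies (e.g., a stuck sensor)


def check_sensor_health(timestamps, values, expected_hz, gap_factor=2.0, flat_std=1e-6):
    """Flag dropouts (gaps longer than gap_factor sample periods) and flatlined channels.

    Hypothetical helper; thresholds are illustrative defaults.
    """
    max_dt = gap_factor / expected_hz
    gaps = [(t0, t1) for t0, t1 in zip(timestamps, timestamps[1:]) if t1 - t0 > max_dt]
    flatline = len(values) > 1 and pstdev(values) < flat_std
    return HealthReport(gaps=gaps, flatline=flatline)


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def estimate_lag(imu_signal, video_signal, max_lag):
    """Return (lag, r) such that video_signal[i + lag] best matches imu_signal[i].

    Assumes both signals share the same sample rate (hypothetical preprocessing).
    """
    best_lag, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Align the overlapping parts of the two signals for this candidate lag.
        if lag >= 0:
            a, b = imu_signal[: len(imu_signal) - lag], video_signal[lag:]
        else:
            a, b = imu_signal[-lag:], video_signal[: len(video_signal) + lag]
        n = min(len(a), len(b))
        if n < 2:
            continue
        r = _pearson(a[:n], b[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


def check_sync(imu_signal, video_signal, rate_hz, max_lag, tolerance_s=0.05):
    """Convert the best lag to seconds and compare it against a tolerance."""
    lag, r = estimate_lag(imu_signal, video_signal, max_lag)
    offset_s = lag / rate_hz
    return offset_s, abs(offset_s) <= tolerance_s, r
```

A positive lag means events appear later in the video trace than in the IMU trace.

In practice, a low peak correlation `r` is itself a useful QC signal. It suggests that one stream carries no usable motion, so the offset estimate should not be trusted.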
Analysis & Troubleshooting

  • Perform ad-hoc analysis across 1,200+ workout tasks to classify failure modes
  • Identify whether issues stem from pipeline bugs, sensor problems, or annotation errors
  • Prioritize engineering work based on data quality impact and coordinate with annotation team on fixes
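
One way to approach the triage described above is a rule-based pass over per-task QC results, followed by a tally of failure modes to guide prioritization. The sketch below is hypothetical throughout:

- The `TaskQC` fields, the rule order, and the tolerance are assumptions for illustration, not the team's schema.
- The mapping of each flag to "sensor", "pipeline", or "annotation" is a simplification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskQC:
    """Hypothetical per-task QC summary; field names are illustrative only."""
    task_id: str
    sensor_gap_count: int
    sensor_flatline: bool
    sync_offset_s: float
    labels_outside_recording: int  # annotations whose bounds fall outside the sensor recording


def classify(task, sync_tolerance_s=0.05):
    """Assign one coarse failure mode, checking the most upstream cause first."""
    if task.sensor_flatline or task.sensor_gap_count > 0:
        return "sensor"
    if abs(task.sync_offset_s) > sync_tolerance_s:
        return "pipeline"  # assumed: large offsets point to capture/sync bugs
    if task.labels_outside_recording > 0:
        return "annotation"
    return "ok"


def summarize(tasks, sync_tolerance_s=0.05):
    """Count failure modes across tasks, most common first."""
    return Counter(classify(t, sync_tolerance_s) for t in tasks).most_common()
```

Sensor problems are checked first because a dropout or stuck channel can distort the sync estimate and make annotation-vs-sensor comparisons meaningless. Attributing such a task to the pipeline or to annotators would then send the fix to the wrong team.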
Tooling and Visualization

  • Maintain and extend our NextJS UI serving annotators, data scientists, and stakeholders
  • Create visualizations (Chart.js) for QC metrics and signal analysis
  • Integrate with LabelStudio annotation interface
What You Bring

Required

  • Strong Python programming skills, particularly for data processing pipelines
  • Experience with time-series data and digital signal processing
  • Comfortable working in Linux environments, including deploying and monitoring remote services
  • Ability to debug complex multi-component systems (sensors, video, networks, sync)
  • Data quality mindset: designing validation rules, tracking metrics, investigating anomalies
  • SQL / database experience for managing pipeline metadata
Highly Valued

  • Video processing experience (RTSP streams, encoding, OCR)
  • Working with sensor / IoT data and handling connectivity challenges
  • NextJS or other modern web frameworks for data tooling
  • DevOps practices: containerization, monitoring, logging, alerting
  • Experience with annotation pipelines and ML training data workflows
  • Background in biomechanics, sports science, or wearable sensors
Tech Stack

  • Languages: Python (primary), JavaScript/TypeScript (NextJS UI)
  • Data: IMU sensor streams, video (RTSP), time-series analysis, DSP
  • Tools: LabelStudio, Chart.js, Linux/bash, OCR libraries
  • Infrastructure: remote deployment, monitoring systems
You'll Thrive Here If You

  • Enjoy detective work: diagnosing why data doesn't match expectations
  • Balance pragmatism with quality: shipping improvements while maintaining reliability
  • Communicate well across technical and non-technical stakeholders
  • Can work autonomously in a small, mission-driven team