We're seeking a Data Pipeline Engineer to own and evolve our exercise recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training.
This role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.
What You’ll Do
Pipeline Operations & Improvement
- Maintain and enhance our multi-source data collection system: IMU sensors (via mobile app) and synchronized video streams from gym-based cameras.
- Improve the robustness of our video capture software, particularly its handling of network interruptions and its operational monitoring.
- Deploy and monitor services in remote Linux environments with appropriate DevOps practices.
Data Quality & Validation
- Evolve our Python-based QC engine that validates data pre- and post-annotation.
- Implement checks for IMU-video time synchronization, sensor health, and measurement consistency.
- Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities.
- Develop validation logic comparing annotations against sensor data to ensure temporal alignment.
Analysis & Troubleshooting
- Perform ad-hoc analysis on ~1,200+ workout tasks to classify failure modes.
- Identify whether issues stem from pipeline bugs, sensor problems, or annotation errors.
- Prioritize engineering work based on data quality impact and coordinate with the annotation team on fixes.
Tooling and Visualization
- Maintain and extend our NextJS UI serving annotators, data scientists, and stakeholders.
- Create visualizations (Chart.js) for QC metrics and signal analysis.
- Integrate with the LabelStudio annotation interface.
What You Bring
Required
- Strong Python programming skills, particularly for data processing pipelines.
- Experience with time-series data and digital signal processing.
- Comfortable working in Linux environments and deploying/monitoring remote services.
- Ability to debug complex multi-component systems (sensors, video, networks, sync).
- Data quality mindset: designing validation rules, tracking metrics, investigating anomalies.
- SQL/database experience for managing pipeline metadata.
Highly Valued
- Video processing experience (RTSP streams, encoding, OCR).
- Working with sensor/IoT data and handling connectivity challenges.
- NextJS or modern web frameworks for data tooling.
- DevOps practices: containerization, monitoring, logging, alerting.
- Experience with annotation pipelines and ML training data workflows.
- Background in biomechanics, sports science, or wearable sensors.
Tech Stack
- Languages: Python (primary), JavaScript/TypeScript (NextJS UI).
- Data: IMU sensor streams, video (RTSP), time-series analysis, DSP.
- Tools: LabelStudio, Chart.js, Linux/bash, OCR libraries.
- Infrastructure: Remote deployment, monitoring systems.
You'll Thrive Here If You
- Enjoy detective work: diagnosing why data doesn't match expectations.
- Balance pragmatism with quality: shipping improvements while maintaining reliability.
- Communicate well across technical and non-technical stakeholders.
- Can work autonomously in a small, mission-driven team.