We're looking for a Senior ML/AI Engineer to own and evolve our LLM-powered user experience. You'll work directly with our technical co-founder to build, optimize, and monitor agent systems that parse workout descriptions, provide scaling recommendations, and enable conversational data retrieval, all with production-grade accuracy and speed.
This is a hands-on role focused on the ML/AI engineering side: prompt engineering, model optimization, agent orchestration, and continuous improvement based on real-world usage patterns.
What You’ll Do
Core Responsibilities
Own the workout parsing system: improve the accuracy of our fine-tuned model (currently Qwen-based), which converts natural-language workout descriptions into structured schemas
Design and implement agent workflows for workout scaling recommendations and performance tracking
Build observability workflows using Langfuse to identify and systematically address model performance issues
Optimize agent response latency while maintaining accuracy across our tool-based reasoning system
Collaborate on agent architecture decisions, including potential migration to frameworks like DSPy
Ship production features: the workout entry system, scaling recommendations, and score reporting
What We’re Looking For
Required
5+ years of ML/AI engineering experience, including at least 2 years working with LLMs in production
Strong prompt engineering and model optimization skills
Experience building and deploying agent systems with tools/functions
Proven ability to use observability platforms to diagnose and improve model performance
Experience with model fine-tuning (any framework or approach)
Strong Python programming skills
Active CrossFit participant who understands standard movements and workout structures
Strongly Preferred
Experience with agent orchestration frameworks (DSPy, LlamaIndex, or similar)
Background in production ML operations and monitoring
Experience with Modal.com or similar serverless ML platforms
Track record of iteratively improving LLM systems based on user feedback and metrics
Experience fine-tuning open-source LLMs similar to Qwen
Success in First 6 Months
Ship workout entry system with improved parsing accuracy
Launch basic workout scaling recommendations
Implement user score reporting and retrieval
Establish robust monitoring workflows to catch and address model failures and poor user experiences
Contribute to agent architecture decisions as we scale
AI Engineer • Pouso Alegre, Minas Gerais, Brazil