At Dev.Pro, we partner with businesses worldwide, from startups to Fortune 500 companies — across fintech, retail, hospitality and beyond.
With a remote‑first mindset and a team in 55+ countries, we focus on aligning technical expertise with client needs, communicating clearly, and staying adaptable as priorities shift. This commitment to ownership and flexibility helps us create lasting partnerships — so you can focus on what you do best.
About this opportunity
We invite a skilled Kubernetes Developer to join our fully remote, international team. In this role, you’ll build and optimize the Kubernetes orchestration platform and develop custom operators to run HPC / AI workloads efficiently on GPU clusters. You’ll enhance infrastructure performance and reliability, create internal tools to improve the developer experience, and ensure multi-tenant HPC workloads remain secure and compliant.
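To give a concrete flavor of the kind of component this role owns, here is a minimal, illustrative sketch of what a Kubernetes custom resource type for GPU batch jobs might look like. The GPUJob name, its fields, and the v1alpha1 package are hypothetical examples for this posting, not part of an existing product.

```go
// Package v1alpha1 sketches a hypothetical custom resource for GPU batch jobs,
// similar in spirit to a Slurm job submission (node count, GPUs per node, queue).
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GPUJobSpec describes the desired workload.
type GPUJobSpec struct {
	// Image is the container image that runs the training or HPC job.
	Image string `json:"image"`
	// Nodes is the number of worker pods to launch.
	Nodes int32 `json:"nodes"`
	// GPUsPerNode is translated into an nvidia.com/gpu resource request.
	GPUsPerNode int32 `json:"gpusPerNode"`
	// Queue maps the job onto a scheduling queue / tenant partition.
	Queue string `json:"queue,omitempty"`
}

// GPUJobStatus reports observed state back to the user.
type GPUJobStatus struct {
	Phase string `json:"phase,omitempty"` // e.g. Pending, Running, Succeeded, Failed
}

// GPUJob is the top-level custom resource object an operator would reconcile.
type GPUJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   GPUJobSpec   `json:"spec,omitempty"`
	Status GPUJobStatus `json:"status,omitempty"`
}
```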
What's in it for you:
- Work on cutting-edge GPU infrastructure and next-gen HPC / AI workloads
- Build a Slurm-on-Kubernetes product from scratch and shape its architecture
- Collaborate with a top-tier international team and grow through continuous learning and conference participation
Is that you?
- 3+ years of hands-on Kubernetes experience in production
- Experience with HPC schedulers (Slurm, PBS, LSF, Volcano)
- Strong background in GPU resource management and distributed systems
- Experience with cloud / hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)
- Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage
- Deep knowledge of HPC job scheduling and workload orchestration
- Expertise in IaC (Terraform, Helm, or GitOps: ArgoCD / Flux) and monitoring & observability (Prometheus, Grafana, Jaeger, ELK)
- Programming skills in Go, Python, Bash / Shell
- Familiarity with PyTorch, TensorFlow, distributed training, and model serving
- Skills in Linux administration, performance tuning, and advanced networking (RDMA, InfiniBand, TCP / IP, DNS, load balancing)
- Experience in storage management and optimization for large datasets
Key responsibilities and your contribution
In this role, you'll design, develop, and manage Kubernetes platforms for GPU-intensive AI / HPC workloads.
- Design and build a Slurm-like orchestration layer on Kubernetes for HPC / AI workloads
- Develop custom operators and controllers for GPU job scheduling and execution
- Integrate batch schedulers with Kubernetes to provide a hybrid HPC / Cloud product
- Implement advanced GPU resource management
- Build internal tools and a self-service platform to simplify AI / HPC job deployment and management
- Build a cloud-native platform for AI training, inference, and HPC workloads
- Optimize scheduling to improve GPU utilization and reduce queue times
- Monitor GPU clusters, troubleshoot production issues, and ensure high availability, fault tolerance, and disaster recovery
- Develop CI / CD pipelines for GPU-intensive workloads
- Implement best practices for multi-tenant GPU clusters with AI / HPC workloads
- Ensure compliance with data sovereignty and international regulations
- Maintain secure container, runtime, and workload isolation policies
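For candidates curious about the day-to-day, below is a hedged sketch of the operator pattern these responsibilities revolve around, built with controller-runtime. To stay self-contained it watches standard batch/v1 Jobs and tallies their GPU requests; a production controller would instead reconcile a custom resource (such as the hypothetical GPUJob above), create worker pods, manage queueing and preemption, and update status.

```go
// A minimal, illustrative controller-runtime setup: watch batch/v1 Jobs and
// log how many GPUs each one requests. Not a real scheduler, just the shape
// of the reconcile loop.
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

type jobReconciler struct {
	client.Client
}

// Reconcile is called whenever a watched Job changes.
func (r *jobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	var job batchv1.Job
	if err := r.Get(ctx, req.NamespacedName, &job); err != nil {
		// The Job may have been deleted between the event and this call.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Sum the nvidia.com/gpu requests across the Job's containers.
	gpus := int64(0)
	for _, c := range job.Spec.Template.Spec.Containers {
		if q, ok := c.Resources.Requests[corev1.ResourceName("nvidia.com/gpu")]; ok {
			gpus += q.Value()
		}
	}
	logger.Info("reconciled job", "job", req.NamespacedName, "gpusRequested", gpus)

	return ctrl.Result{}, nil
}

func main() {
	// Build a manager against the cluster config (kubeconfig or in-cluster).
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// Register the reconciler to watch batch/v1 Jobs.
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&batchv1.Job{}).
		Complete(&jobReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```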