Job Description
Must Have
4+ years of software or machine-learning engineering experience, including hands-on experience deploying models to production.
Demonstrated track record operationalizing models developed by data scientists into reliable, scalable, and observable production systems.
Production experience with classical machine-learning workloads, as distinct from generative-AI application development.
Strong collaboration skills across data science, software engineering, and platform teams.
Commitment to building reliable, well-engineered, and maintainable systems.
Nice to Have
Experience with feature stores and large-scale or streaming data pipelines.
Knowledge of infrastructure-as-code (e.g., Terraform) and cloud cost optimization.
Experience with workflow orchestration tools (e.g., Apache Airflow).
Familiarity with model-serving frameworks and API design for inference.
Light exposure to generative-AI or LLM deployment.
Experience operating systems in a regulated or enterprise environment.
Cloud or MLOps certifications.
Responsibilities
Build, deploy, and maintain production machine-learning pipelines, from data ingestion and feature processing through to model serving.
Operationalize models developed by data scientists, ensuring reliability, scalability, reproducibility, and performance.
Implement MLOps practices including CI/CD for machine learning, model versioning, automated retraining, and model registries.
Design and maintain feature pipelines and, where relevant, feature stores.
Set up monitoring and alerting for model performance, data drift, and system health, and respond to degradation.
Optimize inference latency, throughput, and resource cost for deployed models.
Containerize and orchestrate machine-learning workloads using Docker and Kubernetes on Oracle Cloud Infrastructure (OCI).
Automate and harden data and model workflows to reduce manual intervention.
Collaborate with data scientists on model handoff, packaging, and success metrics.
Work with software engineers to integrate models into client-facing applications and services.
Apply software-engineering best practices including testing, code review, and documentation to machine-learning systems.
Troubleshoot and resolve production issues across the model-serving stack.
Qualifications
Bachelor's degree in Computer Science, Software Engineering, or a related field; equivalent experience accepted.
Strong Python engineering skills and production experience with ML frameworks such as scikit-learn, TensorFlow, or PyTorch.
Hands-on experience with MLOps tooling (e.g., MLflow, Kubeflow) and CI/CD pipelines.
Proficiency with containerization and orchestration (Docker, Kubernetes).
Experience building and maintaining data pipelines and processing large datasets.
Experience with cloud infrastructure, ideally Oracle Cloud Infrastructure (OCI) and OCI Data Science.
Understanding of model monitoring, drift detection, and retraining strategies.
Solid software-engineering fundamentals including testing and version control.