Staff Data Engineer, MLOps
Job Location: Dallas, TX
Summary
The Staff Data Engineer, MLOps leads the design, build, and optimization of Hershey’s machine learning operations platform—enabling data science and AI teams to develop, deploy, monitor, and govern ML models at enterprise scale. Sitting within Platform Engineering, this role owns the infrastructure, tooling, and automation that move models from experimentation to production with speed and confidence.
This is a foundational role—you will define and build Hershey’s MLOps capability from the ground up, shaping the platform, establishing engineering standards, and growing the team as the function matures. Whether your background is in ML engineering, data platform engineering, or DevOps with ML exposure, we’re looking for someone who can bridge the gap between data science and production infrastructure on Azure Cloud and Databricks.
What We Are Building for Hershey
Hershey is building an AI-driven enterprise platform that transforms how we compete across retail, supply chain, and commercial operations. We are standing up a unified MLOps foundation on Azure and Databricks that will power demand forecasting models that sharpen inventory and production planning; real-time pricing and promotion optimization engines for our retail and commercial partners; computer vision and quality-detection models on manufacturing lines; and next-generation consumer analytics that personalize how we reach millions of households. This role is at the center of that transformation, engineering the platform that turns breakthrough data science into production AI at Hershey scale.
Major Duties & Responsibilities
1. ML Platform Engineering & Infrastructure
- Design and maintain the end-to-end MLOps platform on Azure and Databricks: model training infrastructure, feature stores, experiment tracking, model registries, and serving endpoints.
- Build and optimize CI/CD pipelines for automated model training, validation, packaging, and deployment across environments.
2. Model Deployment, Monitoring & Lifecycle Management
- Implement model serving patterns (batch, real-time, edge) with blue-green and canary deployment strategies for safe rollouts.
- Build monitoring frameworks for data drift, concept drift, and prediction quality; automate alerting and retraining triggers.
3. Governance, Reproducibility & Responsible AI
- Enforce ML governance: model versioning, experiment lineage, artifact management, approval workflows, and audit trails.
- Embed responsible AI practices including explainability tooling, bias detection, and documentation standards.
4. Infrastructure as Code & Cost Optimization
- Author IaC (Terraform/Bicep) for Azure ML workspaces, Databricks clusters, networking, and compute; optimize costs through autoscaling, spot instances, and GPU scheduling.
5. Collaboration & Enablement
- Partner with Data Scientists to productionize models; develop self-service templates and documentation for platform onboarding; mentor junior engineers.
Required Knowledge, Skills, and Abilities
- MLOps & ML Engineering: Experience taking ML models from experimentation to production, including training automation, model packaging, deployment, and monitoring. Our environment uses MLflow, Databricks Model Serving, and Azure Machine Learning.
- Cloud & Platforms: Strong hands-on experience with Azure Cloud and Databricks. Familiarity with Azure services such as Azure ML, AKS, Azure DevOps, and Data Factory, and with Databricks features such as Unity Catalog, Workflows, and Model Registry.
- Programming & Development: Strong Python and SQL; experience with ML frameworks (PyTorch, scikit-learn, XGBoost); comfort building APIs and writing modular, testable code.
- Collaboration & Communication: Proven ability to partner across Data Science, Architecture, and business teams; experience mentoring engineers and driving technical standards.
Preferred Skills
- CI/CD & IaC: ML-specific CI/CD pipelines (Azure DevOps, GitHub Actions); Terraform or Bicep for infrastructure provisioning.
- Containerization & Orchestration: Experience with Docker and Kubernetes for model serving and workload management.
- Monitoring & Observability: Drift detection, prediction quality tracking, and observability tooling (Evidently AI, Azure Monitor, Grafana).
- Certifications: Azure Data Engineer Associate (DP-203), Azure AI Engineer Associate (AI-102), or Databricks Certified Machine Learning Professional.
Experience & Education
- Bachelor’s degree in Computer Science, Engineering, Data Science, or related field; Master’s preferred.
- 5–10 years in software, ML, data platform, or infrastructure engineering, including 3+ years building or operating ML pipelines, model serving infrastructure, or ML platform tooling.
- Hands-on experience with Azure and Databricks in a production ML context.
#LI-Onsite
The Hershey Company is an Equal Opportunity Employer. The policy of The Hershey Company is to extend opportunities to qualified applicants and employees on an equal basis regardless of an individual's race, color, gender, age, national origin, religion, citizenship status, marital status, sexual orientation, gender identity, transgender status, physical or mental disability, protected veteran status, genetic information, pregnancy, or any other categories protected by applicable federal, state or local laws.
The Hershey Company is an Equal Opportunity Employer - Minority/Female/Disabled/Protected Veterans.
You may request a reasonable accommodation if you are unable or limited in your ability to use or access our online application process as a result of a disability.
You can request an accommodation via phone or email.
To request an accommodation via phone, please call +1 877-804-1794 and leave a voicemail with your contact information. You may also email a request for accommodation to ApplicationHelp@hersheys.com. Please be sure to include “Accommodation Needed” in the subject line. This will ensure that your email is routed to the appropriate contact who will handle your request.
Nearest Major Market: Dallas
Nearest Secondary Market: Fort Worth