Job Description
We are seeking an experienced AI/HPC Support Engineer to maintain, enhance, and support our AI operations. The ideal candidate will ensure reliable system performance, optimize workflows, troubleshoot technical issues, and collaborate with cross-functional teams to deliver robust AI solutions.
Responsibilities
- Monitor and maintain AI systems for performance and reliability.
- Streamline data pipelines, model training, and deployment processes.
- Troubleshoot technical issues across software, hardware, and infrastructure.
- Collaborate with teams to deploy and integrate AI solutions seamlessly.
- Document processes, monitor system performance, and recommend improvements.
- Partner with data scientists, AI engineers, and IT teams to support the deployment and scalability of AI models. Facilitate seamless integration of AI solutions into business operations.
- Stay informed about emerging AI technologies and practices. Identify opportunities to incorporate new tools, frameworks, or techniques to enhance system capabilities.
- Implement security measures and ensure regulatory compliance.
Qualifications
- Degree in Computer Science, Engineering, or related field.
- Expert proficiency in Python, Java, or C++, with substantial experience in ML frameworks (e.g., TensorFlow, PyTorch).
- Expert-level familiarity with cloud platforms (AWS, Azure, GCP), containerization tools (Docker, Kubernetes), and CI/CD pipelines.
- Expertise in implementing and supporting bare-metal Kubernetes environments.
- Extensive experience supporting diverse AI/ML models and using advanced MLOps tools (e.g., MLflow, Kubeflow), including installation, support, and end-to-end maintenance.
- Experience in the installation, maintenance, and support of AI and data science tools and frameworks deployed in distributed, Kubernetes-hosted cluster environments.
- Ability to partner with data scientists, AI engineers, and IT teams to support the deployment and scalability of AI models and tools, and to facilitate seamless integration of AI solutions into business operations.
- Expertise with NVIDIA tool stacks (AI Foundry, Generative AI, etc.).
- Hands-on expertise with big data tools and database management systems.
- Proven ability to troubleshoot and resolve complex technical issues effectively.
- Strong analytical, problem-solving, and organizational skills.
- Excellent communication and teamwork abilities, enabling effective collaboration across diverse teams.
Benefits
At Deloitte, we know that great people make a great organization. We value our people and offer employees a broad range of benefits. Learn more about what working at Deloitte can mean for you.