Artificial Intelligence & Engineering
AI & Engineering leverages cutting-edge engineering capabilities to help build, deploy, and operate integrated/verticalized sector solutions in software, data, AI, network, and hybrid cloud infrastructure. These solutions are powered by engineering for business advantage and help transform mission-critical operations.
Join our AI & Engineering team to help transform technology platforms, drive innovation, and help make a significant impact on our clients' achievements. You’ll work alongside talented professionals reimagining and re-engineering operations and processes that could be critical to businesses.
Level: Cloud Integrated Infrastructure Engineer III
As a Cloud Integrated Infrastructure Engineer III in Deloitte’s AI&E practice, you will help design and implement fully integrated architecture for GPU-accelerated AI factories and high-performance computing infrastructure, working closely with Deloitte AI specialists, senior architects, and our ecosystem partners. You will contribute to end-to-end solutions – from discovery and reference architecture mapping through sizing and implementation. You will collaborate with Sales Executives, AI application specialists, delivery engineering, and managed services to help clients achieve measurable outcomes from private AI assets. You will support technical solution development for pursuits and active opportunities and help translate complex client needs into clear, complete solutions and delivery requirements. Your role spans the full project lifecycle, including estimation, planning, execution, and tracking key metrics for analysis, supporting high-quality and timely delivery of solutions.
Work you'll do
As a Cloud Integrated Infrastructure Engineer III on the Hybrid Cloud Infrastructure team, you will:
- Contribute to reference architectures for artificial intelligence and high-performance computing infrastructure across compute, network, storage, platform, and software layers in edge, data center, and hybrid environments
- Translate business requirements into scalable, secure, and cost-optimized solutions while supporting architecture, design, and integration decisions
- Configure and implement NVIDIA platforms, graphics processing unit clusters, orchestration layers, and hybrid infrastructure components for artificial intelligence and high-performance computing workloads
- Develop infrastructure as code and automation using Terraform, Ansible, and GitOps, and support observability, site reliability engineering, resilience, and security practices
- Troubleshoot graphics processing unit, hardware, connectivity, and software issues, and collaborate with cross-functional teams to support delivery quality and operational outcomes
The team
At Hybrid Cloud Infrastructure, we deliver solutions spanning Hybrid Cloud, Advanced Connectivity, AI Data Centers, High-Performance Computing, and AI Infrastructure to help clients achieve their desired outcomes. Our offerings include engineered transformation services for hybrid cloud infrastructure and platforms, prioritizing resiliency, optimization, and extensive automation. We integrate Advanced Connectivity with AI Infrastructure and AI to boost operational efficiency and enable real-time data processing, which is crucial for low-latency enterprise operational technology (OT) applications. Additionally, we provide comprehensive management of all facets of operations for hybrid cloud infrastructure and field operations.
Location: Bengaluru/Hyderabad/Pune
Shift Timings: As per business requirements
Qualifications
Required:
- 6-9 years of experience in infrastructure engineering or implementation for large-scale platforms, including design, implementation, operations, and optimization
- Experience building or supporting graphics processing unit-accelerated platforms for artificial intelligence, machine learning, or high-performance computing workloads
- Experience with Linux system administration in production environments
- Experience deploying or operating distributed compute clusters for artificial intelligence or high-performance computing in hybrid cloud environments, including multi-graphics processing unit configurations, scheduler integration, and edge-to-cloud scaling
- Experience with high-performance networking or storage for artificial intelligence or high-performance computing
- Experience building containerized platforms using Kubernetes or Red Hat OpenShift, including graphics processing unit operators, drivers, CUDA container runtime, and cluster lifecycle automation
- Experience automating infrastructure as code using Terraform and Ansible
Preferred:
- Experience implementing artificial intelligence or high-performance computing cluster scheduling using Slurm and Kubernetes, including multi-tenant queues, quotas, and graphics processing unit-aware policies
- Experience supporting generative artificial intelligence infrastructure patterns, including multi-node distributed training
- Experience with artificial intelligence agents and frameworks
- Experience with high-throughput storage for artificial intelligence or high-performance computing
- Exposure to pre-sales or sales engineering activities, including discovery sessions, solution demonstrations, and proposal or request for proposal contributions
- Hands-on involvement in at least one end-to-end deployment of a reference architecture in cloud or on-premises environments, including security controls, network segmentation, operational runbooks, and validation testing