Position Summary

Job Title: Senior Linux AI Support Engineer

Job Overview:

We are seeking a highly skilled Senior Linux AI Support Engineer to provide after-hours support for a high-performance computing (HPC) AI environment hosted on bare-metal, on-premises infrastructure. The ideal candidate will be experienced in Linux system administration, troubleshooting HPC clusters, and optimizing performance in an AI-driven computational setting.

Key Responsibilities:

  • Monitor, maintain, and troubleshoot Linux-based HPC infrastructure outside of regular business hours.
  • Provide incident response and technical support for HPC cluster failures, performance degradation, and user-reported issues.
  • Manage bare-metal servers, ensuring reliability, security, and optimal resource utilization.
  • Deploy, configure, and upgrade Linux OS and HPC software stacks as needed.
  • Collaborate with AI engineers, researchers, and IT teams to optimize workloads and resource scheduling.
  • Maintain automated monitoring and alerting systems to proactively detect failures.
  • Perform log analysis, debugging, and root cause analysis for complex system issues.
  • Ensure compliance with security policies, access controls, and data integrity standards.
  • Document solutions, operational procedures, and troubleshooting guides to support continuous improvement.
  • Contribute to automation efforts using scripts, configuration management, and infrastructure as code (IaC).

Required Skills & Experience:

  • 6-8 years of experience in Linux system administration on RHEL- and Debian-based distributions, preferably in an HPC or AI-driven environment.
  • Deep understanding of bare-metal infrastructure concepts and management (networking, storage, provisioning).
  • Strong knowledge of containerization (Docker, Singularity) and orchestration tools.
  • High level of proficiency in scripting and programming (Bash, Python, Go) and automation tools (Ansible, Puppet).
  • Familiarity with NAS systems and file-sharing protocols (NFS, SMB/CIFS).
  • Troubleshooting expertise in performance tuning, kernel optimizations, and system-level debugging.
  • Strong problem-solving skills with the ability to work independently in high-pressure situations.
  • Excellent communication skills for coordinating with remote teams and end-users.

Preferred Skills & Experience:

  • Red Hat Certified Engineer (RHCE) or equivalent Linux certification.
  • Experience with NVIDIA GPUs and their tool stacks.
  • HPC-related certifications, coursework in AI computing, or related experience.
  • Hands-on experience with AI/ML workloads in HPC environments is a plus.
  • Experience with Kubernetes or HPC workload schedulers (e.g., Slurm, PBS, Grid Engine).

Recruiting tips

From developing a standout resume to putting your best foot forward in the interview, we want you to feel prepared and confident as you explore opportunities at Deloitte. Check out recruiting tips from Deloitte recruiters.

Benefits

At Deloitte, we know that great people make a great organization. We value our people and offer employees a broad range of benefits. Learn more about what working at Deloitte can mean for you.

Our people and culture

Our inclusive culture empowers our people to be who they are, contribute their unique perspectives, and make a difference individually and collectively. It enables us to leverage different ideas and perspectives, and bring more creativity and innovation to help solve our clients' most complex challenges. This makes Deloitte one of the most rewarding places to work.

Our purpose

Deloitte’s purpose is to make an impact that matters for our people, clients, and communities. At Deloitte, purpose is synonymous with how we work every day. It defines who we are. Our purpose comes through in our work with clients that enables impact and value in their organizations, as well as through our own investments, commitments, and actions across areas that help drive positive outcomes for our communities.

Professional development

From entry-level employees to senior leaders, we believe there’s always room to learn. We offer opportunities to build new skills, take on leadership roles, and connect and grow through mentorship. From on-the-job learning experiences to formal development programs, our professionals have a variety of opportunities to continue to grow throughout their careers.

Requisition code: 304110