Job title: Cloud Engineer – Consultant (3-6 years)
About
At Deloitte, we do not offer you just a job, but a career in the highly sought-after risk management field. We are one of the business leaders in the risk market. We work with a vision to make the world more prosperous, trustworthy, and safe. Deloitte’s clients, primarily based outside of India, are large, complex organizations that constantly evolve and innovate to build better products and services. In the process, they encounter various risks and the work we do to help them address these risks is increasingly important to their success—and to the strength of the economy and public security.
By joining us, you will get to work with diverse teams of professionals who design, manage, implement, and support risk-centric solutions across a variety of domains. In the process, you will gain exposure to the risk-centric challenges faced in today’s world by organizations across a range of industry sectors and become a subject matter expert in those areas.
Our Audit and Assurance services professionals help organizations effectively navigate business risks and opportunities (from financial risks to operational, IT, business, and regulatory risks) to gain competitive advantage. We apply our experience in ongoing business operations and corporate lifecycle events to help clients become stronger and more resilient. Our market-leading teams help clients embrace complexity to accelerate performance, disrupt through innovation, and lead in their industries. We use cutting-edge technology such as AI/ML techniques, analytics, and Robotic Process Automation (RPA) to solve Deloitte’s clients’ most complex issues. Working in Audit and Assurance at Deloitte US-India offices has the power to redefine your ambitions.
Work you’ll do
The key job responsibilities will be to:
· Reliability & Performance
o Design, implement, and maintain scalable, resilient cloud or hybrid infrastructure (e.g., AWS, Azure, GCP).
o Define, measure, and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
o Collaborate closely with application teams to enhance the reliability, performance, and scalability of products and services.
o Conduct capacity planning to ensure systems scale efficiently with growth.
· Automation & Toil Reduction
o Identify repetitive operational tasks (toil) and automate them through scripts, custom tooling, or frameworks.
o Champion infrastructure-as-code and configuration management using technologies such as Terraform, Ansible, or CloudFormation.
o Manage and improve CI/CD pipelines to streamline deployments and reduce manual processes.
· Monitoring, Incident Management & Continual Improvement
o Implement comprehensive system monitoring, alerting, and observability solutions (e.g., Prometheus, Grafana, Datadog, ELK Stack).
o Drive incident response processes, facilitate root cause analysis, and lead blameless postmortems to promote learning and continuous improvement.
o Develop and maintain operational runbooks for incident and disaster recovery scenarios.
· Security & Compliance
o Ensure infrastructure and services adhere to security best practices and compliance requirements.
o Collaborate with security teams to implement robust access controls, data protection measures, and monitoring for potential vulnerabilities.
Required skills
• Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent experience.
• 3+ years of experience in Site Reliability Engineering, DevOps, Systems Engineering, or Software Engineering.
• Proficient in at least one programming or scripting language (e.g., Python, Go, Bash).
• Hands-on experience managing production systems in cloud environments (AWS, GCP, Azure).
• Knowledge of infrastructure-as-code practices and tools (Terraform, CloudFormation, etc.).
• Strong background in monitoring, alerting, and observability (e.g., Prometheus, Grafana, Datadog, ELK Stack).
• Familiarity with containerization and orchestration (e.g., Docker, Kubernetes).
• Excellent analytical, troubleshooting, documentation, and communication skills.
• Expertise in Natural Language Processing (NLP), Deep Learning, LLMs, and Generative AI (Gen AI).
• Experience with frameworks such as TensorFlow, PyTorch, and Keras.
• Familiarity with AI/Gen AI Ethics & Governance frameworks, applications, and archetypes.
Preferred skills
• Experience with SRE practices such as SLOs, SLIs, error budgets, and blameless postmortems.
• Industry certifications related to cloud, DevOps, or SRE (AWS, GCP, Azure, CNCF).
• Strong understanding of network protocols, security principles, and system architecture.
• Experience with chaos engineering or resilience testing.
• Previous involvement in capacity planning, scalability analysis, or traffic/load management at scale.
Qualification
• B.Tech/B.E. and/or MBA
Preferred Locations
• Hyderabad
• Bengaluru
• Gurgaon