Lead DevOps Engineer

Bengaluru • Full Time • 8+ Years Experience • UK Shift

We are looking for a skilled Lead DevOps Engineer to join our team and help keep our platform secure, scalable, and highly reliable. In this role, you will work closely with engineering leadership to set the technical direction of our infrastructure and ensure it scales to support our growth. You will play a key role in building a DevOps culture across the company, empowering developers, and maintaining operational excellence. This is an exciting opportunity for someone who has deep technical understanding, enjoys mentoring others, and thrives in a dynamic, fast-paced environment.

Key Responsibilities

·      Platform Reliability & Availability: Collaborate with the Engineering Director and Principal Engineer to define the technical direction for our infrastructure, ensuring that it scales cost-effectively to support our growth.

·      Infrastructure Management: Utilize tools like Terraform, GitHub Actions, and scripting languages to manage and optimize our infrastructure and CI/CD systems.

·      Cloud Infrastructure Expertise: Become an expert in our technology stack, including AWS (RDS, ECS, EC2, S3, Lambda), Cloudflare, Redis, DNS, Docker, and the rest of the infrastructure platform.

·      Observability & Incident Response: Use observability tools such as Datadog and Amazon CloudWatch to monitor platform health, troubleshoot performance issues, and identify root causes. Participate in the on-call rotation to respond to incidents.

·      Disaster Recovery & Incident Management: Ensure disaster recovery and incident response plans are regularly exercised and improved, using industry practices like gamedays and chaos engineering.

·      Developer Experience: Own and improve the developer experience by refining the development, testing, and continuous deployment processes so that engineers can work more safely, quickly, and easily.

·      CI/CD Leadership: Be an expert in CI/CD principles, empowering engineers to deliver high-quality services to production continuously.

·      Mentorship & Collaboration: Support software engineers by pairing, mentoring, and demonstrating effective engineering practices. Facilitate understanding of the production deployment process and performance debugging.

·      DevOps Practice Leadership: Define and manage platform engineering decisions, and ensure all engineers on the on-call rota are well prepared for incident response.

Requirements

·      Experience architecting and supporting cloud-native web application infrastructure, ideally using AWS services like RDS, ECS, EC2, S3, and Lambda.

·      Hands-on experience with containers and schedulers (e.g., Amazon ECS) and expertise with infrastructure-as-code tools such as Terraform.

·      Strong understanding of Linux, networking, and security.

·      Experience supporting database administration and performance, with a focus on scalability and maintainability.

·      Passion for automating processes and improving the developer experience.

·      Experience working in a DevOps environment, closely collaborating with software engineers.

·      Proficiency in version control (e.g., Git) and the ability to use it effectively to structure and communicate your work.

Good to Have

·      Programming experience in Ruby, JavaScript, or Go.

·      Experience managing relationships with third-party suppliers, such as AWS and Cloudflare.

·      Familiarity with gamedays, chaos engineering, and other industry practices to enhance platform resilience.

·      Experience with disaster recovery and business continuity planning.
