Senior Data Engineer

Bengaluru • Full Time • 6+ Years Experience • UK Shift

We are looking for an experienced Senior Data Engineer to join our team. In this role, you will build and maintain the infrastructure and tooling that provide comprehensive insight into our business data and support data-driven decision-making across the organization. You will collaborate closely with data scientists, analysts, and software engineers to deliver robust data solutions. This is an exciting opportunity for someone with a deep passion for data technologies and a collaborative working style who thrives in a fast-paced environment.

Key Responsibilities

·      ETL Development & Data Warehousing: Design, build, deploy, and iteratively improve ETL processes and data warehousing solutions, ensuring data integrity, performance, and security.

·      Collaboration Across Teams: Work closely with data scientists, analysts, and product teams to enable data collection, analysis, and the development of data-driven features.

·      Data Pipeline Optimization: Develop and maintain data pipelines using tools like Airflow and Airbyte to ensure efficient data flow from multiple data sources, including MySQL, Sendgrid, Iterable, and application logs (a minimal DAG sketch follows this list).

·      Machine Learning Integration: Work alongside data scientists to implement and improve machine learning models, deploying them as services to leverage our growing data volumes effectively.

·      Technical Leadership: Provide input during planning, story mapping, and other product development activities to help shape features and make informed technical decisions.

·      DevOps & Reliability: Collaborate with Site Reliability Engineers and the Technology team to align the data platform work with the overall technical direction, ensuring efficient integration between the application platform and the data platform.
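
For a concrete flavor of the pipeline work described above, here is a minimal Airflow DAG sketch: trigger an Airbyte sync of a MySQL source, then rebuild the downstream dbt models. It is illustrative only; the DAG name, Airbyte connection ID, dbt project path, and schedule are hypothetical placeholders, not our actual configuration.

```python
# Minimal sketch (hypothetical IDs and paths): sync a MySQL source via
# Airbyte, then run dbt once the raw data has landed in the warehouse.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(
    dag_id="mysql_to_warehouse_daily",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    # Trigger the Airbyte connection that replicates MySQL into the
    # warehouse, and wait for the sync to finish.
    sync_mysql = AirbyteTriggerSyncOperator(
        task_id="sync_mysql",
        airbyte_conn_id="airbyte_default",
        connection_id="REPLACE-WITH-CONNECTION-UUID",  # hypothetical
    )

    # Rebuild the dbt models that depend on the freshly loaded data.
    run_dbt = BashOperator(
        task_id="run_dbt",
        bash_command="cd /opt/dbt_project && dbt run",  # hypothetical path
    )

    sync_mysql >> run_dbt
```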


Our Tech Stack

·      Core ETL processes written in Ruby, transitioning towards an S3- and Iceberg-based data lake.

·      Data ingestion managed by Airflow and Airbyte.

·      Data storage using Snowflake, structured following Kimball dimensional modeling principles (see the query example after this list).

·      Data transformation handled with dbt.

·      Infrastructure managed on Kubernetes (EKS) with Terraform.

·      Additional support for a recommendation engine developed in Python.
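
To give a flavor of the Kimball-modeled warehouse, the sketch below runs a classic star-schema query (a fact table aggregated across conformed dimensions) through snowflake-connector-python. The table and column names are illustrative assumptions, not our real schema.

```python
# Illustrative only: a star-schema query against hypothetical fact/dimension
# tables, executed with the snowflake-connector-python client.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",  # hypothetical
    database="ANALYTICS",      # hypothetical
    schema="MARTS",            # hypothetical
)

try:
    cur = conn.cursor()
    # Classic Kimball pattern: aggregate a fact table, sliced by dimensions.
    cur.execute(
        """
        SELECT d.calendar_month,
               c.customer_segment,
               SUM(f.order_amount) AS revenue
        FROM fact_orders f
        JOIN dim_date d     ON f.date_key = d.date_key
        JOIN dim_customer c ON f.customer_key = c.customer_key
        GROUP BY 1, 2
        ORDER BY 1, 2
        """
    )
    for month, segment, revenue in cur.fetchall():
        print(month, segment, revenue)
finally:
    conn.close()
```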


Requirements

·      Proven experience in building and deploying data warehouses and ETL pipelines, with an emphasis on maintainability, performance, and security.

·      Strong programming skills in Python or other general-purpose programming languages, with the ability to write clean, maintainable, and well-tested code.

·      Experience working with data platforms and technologies, ideally with direct experience in some of our stack (e.g., Airflow, Snowflake, Kubernetes, Terraform).

·      Professional experience working in a cloud-based environment with a DevOps culture, including hands-on experience with data ingestion, processing, and storage.

·      Ability to understand and solve data-related challenges for analysts, data scientists, and other engineers.

·      Strong written and verbal communication skills, with the ability to engage with stakeholders at various levels of the organization.


Good to Have

·      Experience working with Ruby, or a willingness to pick up new languages.

·      Knowledge of Kimball dimensional modeling and its application in data warehousing.

·      Exposure to data lake architecture and the use of Iceberg or similar technologies (a minimal read sketch follows this list).

·      Familiarity with cloud platforms and infrastructure-as-code practices (e.g., Terraform).
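
As a taste of the Iceberg side of the stack, here is a minimal read sketch using the pyiceberg library. The catalog URI, warehouse path, table identifier, and filter are assumptions for illustration, not our production setup.

```python
# Minimal pyiceberg sketch (hypothetical catalog and table names): read a
# filtered slice of an Iceberg table on S3 into pandas for ad-hoc analysis.
from pyiceberg.catalog import load_catalog

# Catalog properties would normally come from ~/.pyiceberg.yaml or environment
# variables; the REST URI and warehouse path below are placeholders.
catalog = load_catalog(
    "default",
    **{
        "uri": "https://iceberg-rest.example.com",     # hypothetical catalog
        "warehouse": "s3://example-bucket/warehouse",  # hypothetical bucket
    },
)

table = catalog.load_table("analytics.page_views")  # hypothetical identifier

# Push the row filter and column projection down into the table scan.
df = table.scan(
    row_filter="event_date >= '2024-01-01'",
    selected_fields=("user_id", "event_date", "url"),
).to_pandas()

print(df.head())
```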

Skills

ETL Development • Data Warehousing • Data Engineering • Data Pipelines • Data Ingestion • Automated Data Quality Checks • CI/CD • DevOps • Performance • Security • Maintainability • Python • Ruby • Airflow • Airbyte • dbt • Snowflake • MySQL • S3 • Iceberg • Data Lake Architecture • Kubernetes (EKS) • Terraform • Infrastructure-as-Code • Kimball Dimensional Modeling • Machine Learning Integration • Cloud-Based Infrastructure • Cross-Team Collaboration • Written and Verbal Communication • Mentoring • Leadership • Iterable • Sendgrid