Design, build, and optimize high-throughput ETL/ELT pipelines (Glue/PySpark), orchestrate workflows (Airflow/Step Functions), manage and tune multi-terabyte Redshift clusters, lead migrations from Snowflake/RDBMS, integrate heterogeneous sources, implement monitoring and governance, and collaborate with stakeholders to deliver analytics-ready datasets.
Job Title: Data Engineer
Location: Mumbai
Experience: 3-5 Years
Employment Type: Full-time
Position Overview: We are looking for a highly skilled and hands-on Senior Data Engineer to join our growing data engineering practice in Mumbai. This role requires deep technical expertise in building and managing enterprise-grade data pipelines, with a primary focus on Amazon Redshift, AWS Glue, and data orchestration using Airflow or Step Functions. You will be responsible for building scalable, high-performance data workflows that ingest and process multi-terabyte-scale data across complex, concurrent environments.
The ideal candidate thrives on solving performance bottlenecks, has led or participated in data warehouse migrations (e.g., Snowflake to Redshift), and is confident interfacing with business stakeholders to translate requirements into robust data solutions.
Key Responsibilities:
● Design, develop, and maintain high-throughput ETL/ELT pipelines using AWS Glue (PySpark), orchestrated via Apache Airflow or AWS Step Functions.
● Own and optimize large-scale Amazon Redshift clusters, and manage high-concurrency workloads for a very large user base.
● Lead and contribute to migration projects from Snowflake or traditional RDBMS to Redshift, ensuring minimal downtime and robust validation.
● Integrate and normalize data from heterogeneous sources including REST APIs, AWS Aurora (MySQL/Postgres), streaming inputs, and flat files.
● Implement intelligent caching strategies, and leverage EC2 and serverless compute (Lambda, Glue) for custom transformations and processing at scale.
● Write advanced SQL for analytics, data reconciliation, and validation, demonstrating strong SQL development and tuning experience.
● Implement comprehensive monitoring, alerting, and logging for all data pipelines to ensure reliability, availability, and cost optimization.
● Collaborate directly with product managers, analysts, and client-facing teams to gather requirements and deliver insights-ready datasets.
● Champion data governance, security, and lineage, ensuring data is auditable and well-documented across all environments.
● 3-5 years of core data engineering experience, with a focus on hands-on Amazon Redshift performance tuning and large-scale cluster management.
● Demonstrated experience handling multi-terabyte Redshift clusters, concurrent query loads, and managing complex workload segmentation and queue priorities.
● Strong experience with AWS Glue (PySpark) for large-scale ETL jobs.
● Solid understanding and implementation experience of workflow orchestration using Apache Airflow or AWS Step Functions.
● Strong proficiency in Python, advanced SQL, and data modeling concepts.
● Familiarity with CI/CD pipelines, Git, DevOps processes, and infrastructure-as-code concepts.
● Experience with Amazon Athena, Lake Formation, or S3-based data lakes.
● Hands-on participation in Snowflake, BigQuery, or Teradata migration projects.
● AWS Certifications such as:
○ AWS Certified Data Analytics – Specialty
○ AWS Certified Solutions Architect – Associate/Professional
● Exposure to real-time streaming architectures or Lambda architectures.
● Excellent communication skills — must be able to confidently engage with both technical and non-technical stakeholders, including clients.
● Strong problem-solving mindset and a keen attention to performance, scalability, and reliability.
● Demonstrated ability to work independently, lead tasks, and take ownership of large-scale systems.
● Comfortable working in a fast-paced, dynamic, and client-facing environment.