The Role
We are looking for a highly skilled Data Engineer with strong expertise in Python programming, data processing, and analytical problem-solving. This role requires a blend of analytical skills, engineering capabilities, and hands-on data manipulation to derive actionable insights, build efficient pipelines, and support data-driven decision-making across teams.
Responsibilities:
Data Exploration & Analysis:
- Analyze large and complex datasets to extract meaningful insights and drive decision-making processes.
- Identify data trends, anomalies, and opportunities for improvement within datasets and communicate findings clearly to stakeholders.
- Collaborate with cross-functional teams to understand business requirements and transform them into technical solutions.
Data Pipeline Development:
- Design, develop, and maintain robust data pipelines for efficient data ingestion, transformation, and storage.
- Optimize and automate data workflows to improve data availability, quality, and processing efficiency.
- Implement ETL (Extract, Transform, Load) processes to support analytics and reporting needs.
Data Modeling & Feature Engineering:
- Build, validate, and maintain data models to support machine learning and statistical analysis needs.
- Engineer and preprocess features for machine learning algorithms and ensure data quality and consistency.
- Develop scalable solutions for feature storage, retrieval, and real-time model serving.
Programming & Scripting:
- Write efficient, scalable, and well-documented Python code to support data engineering and analysis tasks.
- Collaborate on code reviews, optimize code performance, and apply best practices in coding and version control.
- Use Python libraries (e.g., Pandas, NumPy, SQLAlchemy) to streamline data workflows and support analysis.
Performance Optimization & Troubleshooting:
- Monitor, troubleshoot, and enhance the performance of data systems and pipelines.
- Address data integrity and pipeline issues promptly to ensure reliable data availability and system uptime.
- Implement monitoring and logging to detect issues early and resolve them proactively.
Collaboration & Communication:
- Work closely with data scientists, analysts, and other engineers to develop cohesive data solutions.
- Translate complex technical issues into non-technical language for clear communication with stakeholders.
- Contribute to documentation, data standards, and best practices to foster a data-centric culture.
Job Requirements:
- Technical Skills: Strong proficiency in Python and familiarity with data processing libraries (e.g., Pandas, NumPy, PySpark). Experience with SQL for data extraction and manipulation.
- Data Engineering Knowledge: Experience in designing, building, and managing data pipelines, ETL workflows, and data warehousing solutions.
- Statistical & Analytical Skills: Ability to apply statistical methods for data analysis and familiarity with machine learning concepts.
- Problem-Solving Mindset: Proven ability to troubleshoot complex data issues and continuously improve workflows for efficiency and accuracy.
- Communication: Effective communication skills to convey data insights to technical and non-technical stakeholders alike.
- Bonus: Experience with cloud platforms (e.g., AWS, GCP), containerization (e.g., Docker), and orchestration tools (e.g., Airflow).
Preferred Education & Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, Mathematics, or a related field.
- 3+ years of experience in a data science or data engineering role.
Benefits
- Compensation commensurate with experience
- Unlimited vacation
- Ongoing education and training
- Bonuses and profit-sharing