Weekday, Inc.

Lead - Data & ML Platform Engineering

Posted 2 Days Ago
In-Office
Mumbai, Maharashtra, IND
Expert/Leader
Architect, build, and operate a Databricks-based Lakehouse and ML platform across four pillars: Data Platform, ML Platform & MLOps, Platform Operations & FinOps, and Data Governance & Quality. Deliver sub-second inference, industrialize ML lifecycles with MLflow and Mosaic AI, implement governance-as-code, run FinOps for DBU cost allocation, and ensure platform reliability for retail-scale traffic and thousands of developers.
The summary above was generated by AI

This role is for one of Weekday's clients

Min Experience: 10+ years

Location: Bengaluru, Mumbai

Job Type: Full-time

Focus Areas: (i) Data Platform Engineering, (ii) ML Platform & MLOps, (iii) Platform Operations & FinOps, (iv) Data Governance & Quality

Experience: 14–20 years total | 8–12 years in Data/ML Platform Engineering

Core Platform: Databricks Intelligence Platform (Unity Catalog, Delta Lake, MLflow, Mosaic AI)

The Context

We are currently developing the “v2.0” intelligence layer atop this Lakehouse—aiming to standardize MLOps, expand Agentic AI capabilities, and guarantee that the platform delivers sub-second latency across the entire retail network, which includes tens of thousands of stores and high-traffic digital channels.

The Data & ML Platforms group (Group A in Enterprise IT) serves as the driving force behind this transformation. It is led by a VP (L2) and organized into four AVP-led pillars, supported by 10 AI-ready Platform Engineers and a transitioning team of Data Engineers. Each AVP is responsible for a specific platform layer and functions as a builder-leader—expected not only to manage but also to architect, perform code reviews, and actively contribute to development alongside their team.

The Four Pillars

We are seeking to hire four AVPs, each heading one of the platform pillars. While each AVP has full ownership of their respective pillar, all four collaborate closely as a unified leadership team under the VP. Candidates may be evaluated for placement in any pillar depending on their strengths and fit.


Requirements

(i) Data Platform Engineering

Mission: Take full ownership of the core Lakehouse infrastructure, encompassing storage, compute, and developer platform layers that support all other operations.

  • Design and maintain the Delta Lake storage layer, Photon compute engine, and Unity Catalog abstraction, serving over 1,000 developers across various retail sectors.
  • Implement advanced optimization techniques including query plan tuning, cluster auto-scaling policies, Z-ordering strategies, and partitioning schemes for datasets with trillions of rows.
  • Manage the internal developer platform by developing SDKs, CLI tools, templates, and enabling self-service onboarding to accelerate new teams' time-to-first-query.
  • Lead the technical cleanup of Phase-1 migration challenges, including schema standardization, pipeline consolidation, and deduplication of source of record (SOR) systems across hundreds of sources.
  • Oversee the Data Engineer transition cohort within this pillar, establishing engineering standards, enforcing code review processes, and defining career progression paths.
(ii) ML Platform & MLOps

Mission: Industrialize machine learning by building infrastructure that efficiently moves models from experimentation notebooks to production at retail scale.

  • Develop and maintain the end-to-end ML lifecycle leveraging MLflow, including experiment tracking, model registry, automated retraining, A/B testing, and canary deployments.
  • Design the real-time inference architecture to deliver model serving with sub-100ms latency across recommendation, pricing, and demand forecasting applications.
  • Construct the Agentic AI infrastructure comprising RAG pipelines, vector stores, fine-tuning workflows for Foundation Models (utilizing Mosaic AI), and agent orchestration frameworks.
  • Establish governance for the Feature Store by standardizing feature definitions, enforcing freshness SLAs, tracking lineage, and promoting feature reuse across retail divisions.
  • Ensure reliability of the ML platform through GPU/TPU cluster management, training job scheduling, cost attribution per model, and incident response for production model degradations.
(iii) Platform Operations & FinOps

Mission: Maintain platform stability, performance, and cost-efficiency—especially during critical periods.

  • Ensure 99.99% platform uptime, providing leadership during critical events such as festive sales, store openings, and retail peak periods.
  • Establish and run the FinOps practice focusing on DBU cost allocation by team and workload, implementing chargeback models, automating resource right-sizing, and delivering executive cost dashboards.
  • Design and manage monitoring and observability systems covering pipeline health, query performance, cluster utilization, and data freshness SLAs across all six value streams.
  • Lead capacity planning by forecasting compute and storage demands in line with retail seasonality (festive cycles, new store launches, category introductions) and provisioning resources in advance.
  • Oversee incident management, develop runbooks, and conduct post-mortem evaluations for the Databricks platform, ensuring targets for mean time to recovery are met and continually improved.
(iv) Data Governance & Quality

Mission: Serve as the technical steward for India’s largest consumer dataset, ensuring its trustworthiness, compliance, and discoverability.

  • Develop “Governance-as-Code” frameworks on Unity Catalog, incorporating automated access controls, data classification, PII masking, and audit trails to comply with DPDP Act requirements.
  • Design and implement a data quality framework that includes automated profiling, anomaly detection, schema enforcement, and freshness monitoring across thousands of datasets.
  • Manage the data catalog and discovery platform, providing metadata management, lineage visualization, business glossary, and search tools to support over 1,000 users.
  • Build consent management infrastructure to monitor, enforce, and audit user consent signals throughout the comprehensive “Phygital” retail ecosystem (online and offline).
  • Drive enterprise-wide data standards by defining naming conventions, rules for SOR deduplication, master data alignment, and data contract enforcement between producing and consuming teams.
Minimum Qualifications (All Pillars)
  • 14 to 20 years of professional experience in software engineering, data engineering, or ML infrastructure, including a minimum of 3 years leading a platform team of 5 or more engineers.
  • 8 to 12 years of hands-on experience in building and scaling data or ML platforms such as Lakehouse architectures, Feature Stores, Streaming Engines, or MLOps pipelines.
  • Strong technical expertise within the Databricks ecosystem or similar distributed data platforms (e.g., Spark, Presto/Trino, Flink, or Kafka at scale), with a strong preference for Databricks experience.
  • Proven “builder-leader” approach: actively involved in code review, production debugging, and architectural decision-making without fully delegating technical responsibilities.
  • Experience operating within large and complex technology organizations featuring inherited teams, cross-functional dependencies, and enterprise-grade compliance requirements.
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related discipline, or equivalent expertise acquired through industry experience and open-source contributions.
Preferred Qualifications
  • Previous experience managing India-scale data platforms handling billions of events per day, petabyte-scale data warehouses, or real-time serving at over 10,000 queries per second.
  • Hands-on experience with MLflow, Mosaic AI, or similar ML infrastructure platforms at production level—not limited to experimentation phases.
  • Familiarity with retail or e-commerce data domains such as product catalogs, inventory management, order processing, customer behavior signals, or supply chain datasets.
  • Demonstrated success in building internal tooling or developer platforms that have gained widespread organic adoption within large engineering organizations.
  • Experience with FinOps practices including DBU/compute cost attribution, chargeback modeling, and enterprise-scale cloud cost optimization.
  • Knowledge of Indian data privacy regulations (DPDP Act) or global frameworks (GDPR, CCPA) in the context of data platform governance.
Organisation Context

This position reports directly to the VP & Head of Data & ML Platforms, who in turn reports to the Head of Enterprise IT, and ultimately to the CEO. You will collaborate as a peer with three other AVPs within the Data & ML Platforms group and work closely with more than 10 AI-ready Platform Engineers at Architect and Principal levels, alongside the transitioning Data & Platforms Engineers cohort.

The broader Enterprise IT division comprises five additional L2 groups: CISO/Cybersecurity, HR/Finance/Legal Platforms, SAP-Core, Systems & AI Architects, and CIO + Cloud & Infrastructure.

Must-have skills

Data & ML Platform, Databricks, Platform Architecture

Good-to-have skills

MLOps, System Architecture, Retail


