Main Purpose:
▪Collaborate with data scientists and business stakeholders to design, develop, and maintain efficient data pipelines feeding into the organization's data lake.
▪Maintain the integrity and quality of the data lake, enabling accurate and actionable insights for data scientists and informed decision-making for business stakeholders.
▪Utilize extensive knowledge of data engineering and cloud technologies to enhance the organization's data infrastructure, promoting a culture of data-driven decision-making.
▪Apply data engineering expertise to define and optimize data pipelines, using advanced concepts to improve the efficiency and accessibility of data storage.
▪Own the development of an extensive data catalog, ensuring robust data governance and facilitating effective data access and utilization across the organization.
Knowledge, Skills and Abilities, Key Responsibilities:
Key Responsibilities
•Contribute to the development of scalable and performant data pipelines on Databricks, leveraging Delta Lake, Delta Live Tables (DLT), and other core Databricks components.
•Develop data lakes/warehouses designed for optimized storage, querying, and real-time updates using Delta Lake.
•Implement effective data ingestion strategies from various sources (streaming, batch, API-based), ensuring seamless integration with Databricks.
•Ensure the integrity, security, quality, and governance of data across our Databricks-centric platforms.
•Collaborate with stakeholders (data scientists, analysts, product teams) to translate business requirements into Databricks-native data solutions.
•Build and maintain ETL/ELT processes, heavily utilizing Databricks, Spark (Scala or Python), SQL, and Delta Lake for transformations.
•Apply CI/CD and DevOps practices tailored to the Databricks environment.
•Monitor and optimize the cost-efficiency of data operations on Databricks, ensuring optimal resource utilization.
•Utilize a range of Databricks tools, including the Databricks CLI and REST API, alongside Apache Spark™, to develop, manage, and optimize data engineering solutions.
Key Relationships and Department Overview:
Key Relationships
•Internal – Data Engineering Manager; developers across various departments; managers of departments in other regional hubs of Puma Energy
•External – Platform providers