Data Engineering intermediate ⏱ 8–10 hours
Python Wheel Package in Databricks
Build modular PySpark ETL code, package it as a Python wheel, deploy to Databricks, and run it as an automated workflow.
Databricks, PySpark, Python, Wheel, MLflow
What you’ll build
A Python package of reusable PySpark ETL modules, built as a .whl file, uploaded to Databricks and installed on a cluster, and wired into a Databricks Workflow. Packaging ETL code this way is a common structure for production Databricks pipelines.
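As a rough illustration of what one module in such a package might look like, here is a minimal sketch of an entry-point file. The module path `etl_pkg/main.py`, the function names, the `--input-path` and `--output-path` flags, and the choice of Parquet are all assumptions made for this example; they are not taken from the project itself.

```python
# etl_pkg/main.py (hypothetical module path, for illustration only)
import argparse


def parse_args(argv=None):
    """Parse job parameters; the flag names are assumptions for this sketch."""
    parser = argparse.ArgumentParser(description="Example PySpark ETL entry point")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


def transform(df):
    """Reusable transformation: drop duplicate rows and rows that are entirely null."""
    return df.dropDuplicates().na.drop(how="all")


def main(argv=None):
    # PySpark is imported lazily so the module can be imported (and its
    # argument parsing unit-tested) on a machine without Spark installed.
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet(args.input_path)
    transform(df).write.mode("overwrite").parquet(args.output_path)


if __name__ == "__main__":
    main()
```

Keeping `transform` separate from `main` means the same function can be imported from a notebook, a test, or another job, which is the main reason for packaging the code in the first place.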
Skills you’ll practice
- Structuring PySpark code as a Python package with setup.py
- Building and distributing Python wheel files
- Installing custom wheels on Databricks clusters
- Creating Databricks Jobs and Workflows via UI and API
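To make the last skill more concrete, the sketch below builds a request body for creating a job that runs a wheel task. It assumes the Databricks Jobs API 2.1 (`POST /api/2.1/jobs/create`) and its `python_wheel_task` and `libraries` fields as I understand them; verify the field names against the current Databricks documentation before relying on this. The helper `build_wheel_job_payload` and every value passed to it (job name, package name, entry point, wheel path, cluster ID) are hypothetical.

```python
import json


def build_wheel_job_payload(job_name, package_name, entry_point,
                            wheel_path, cluster_id, parameters=None):
    """Return a dict intended as the JSON body for a jobs/create request.

    Hypothetical helper: the structure follows the Jobs API 2.1 task format
    as assumed in this sketch, with a single task on an existing cluster.
    """
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_etl",  # arbitrary label chosen for this example
                "existing_cluster_id": cluster_id,
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,
                    "parameters": list(parameters or []),
                },
                # The wheel is attached to the task as a library.
                "libraries": [{"whl": wheel_path}],
            }
        ],
    }


# Example values are placeholders, not real resources.
payload = build_wheel_job_payload(
    job_name="nightly_etl",
    package_name="etl_pkg",
    entry_point="run_etl",
    wheel_path="dbfs:/FileStore/wheels/etl_pkg-0.1.0-py3-none-any.whl",
    cluster_id="1234-567890-abcde123",
    parameters=["--input-path", "/mnt/raw", "--output-path", "/mnt/clean"],
)
print(json.dumps(payload, indent=2))
```

The printed JSON could then be sent with an authenticated HTTP client or pasted into a job definition; building it as a plain dict keeps the job configuration reviewable and version-controlled alongside the package.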