Surfalytics
Data Engineering | Intermediate | ⏱ 8–10 hours

Python Wheel Package in Databricks

Build modular PySpark ETL code, package it as a Python wheel, deploy to Databricks, and run it as an automated workflow.

Databricks, PySpark, Python Wheel, MLflow

What you’ll build

A Python package of reusable PySpark ETL modules, built as a .whl file, uploaded to and installed on a Databricks cluster, and wired into a Databricks Workflow. Packaging pipeline code as a versioned wheel is a common way to structure production Databricks pipelines.
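To make the shape of such a package concrete, here is a minimal sketch of one module that could live inside the wheel. The module path `etl_pkg/main.py`, the `--input-path` and `--output-table` parameters, and the `amount` column are hypothetical names chosen for illustration, not taken from the project repository. The sketch assumes the workflow passes task parameters as command-line arguments, which is how Databricks Python wheel tasks typically hand them to the entry point.

```python
# etl_pkg/main.py -- hypothetical module inside the wheel (names are illustrative)
import argparse


def parse_args(argv=None):
    """Parse the parameters that the wheel task passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical orders ETL job")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-table", required=True)
    return parser.parse_args(argv)


def transform(df):
    """Keep rows with a positive amount and stamp each row with the load date."""
    # Imported lazily so argument parsing can be tested without Spark installed.
    from pyspark.sql import functions as F

    return df.filter(F.col("amount") > 0).withColumn("load_date", F.current_date())


def main(argv=None):
    """Entry point that the package's entry_points configuration would reference."""
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.getOrCreate()
    raw = spark.read.parquet(args.input_path)
    transform(raw).write.mode("overwrite").saveAsTable(args.output_table)
```

Keeping `transform` separate from `main` means the transformation logic can be unit-tested against a small local DataFrame, while `main` stays a thin wrapper that only handles I/O and parameters.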

Skills you’ll practice

  • Structuring PySpark code as a Python package with setup.py
  • Building and distributing Python wheel files
  • Installing custom wheels on Databricks clusters
  • Creating Databricks Jobs and Workflows via UI and API
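
As a rough illustration of the last two skills, the sketch below builds a request body for creating a job with a single Python wheel task. The field names (`python_wheel_task`, `package_name`, `entry_point`, `libraries`, `existing_cluster_id`) follow the Databricks Jobs API 2.1 as commonly documented, and should be checked against the current API reference. The job name, wheel path, cluster ID, and helper function `build_wheel_job` are hypothetical placeholders. Sending the request also needs a workspace URL and an access token, which are left out here.

```python
import json


def build_wheel_job(job_name, wheel_path, package_name, entry_point,
                    cluster_id, parameters=None):
    """Return a request body for POST /api/2.1/jobs/create with one wheel task."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_etl",
                "existing_cluster_id": cluster_id,
                "python_wheel_task": {
                    "package_name": package_name,
                    "entry_point": entry_point,
                    "parameters": list(parameters or []),
                },
                # Installs the custom wheel on the cluster before the task runs.
                "libraries": [{"whl": wheel_path}],
            }
        ],
    }


# All values below are placeholders, not real workspace resources.
payload = build_wheel_job(
    job_name="orders-etl",
    wheel_path="dbfs:/FileStore/wheels/etl_pkg-0.1.0-py3-none-any.whl",
    package_name="etl_pkg",
    entry_point="main",
    cluster_id="0000-000000-example",
    parameters=["--input-path", "/mnt/raw/orders", "--output-table", "demo.orders"],
)
print(json.dumps(payload, indent=2))
```

Building the payload as a plain dictionary keeps the job definition in version control next to the package code, so the same definition can be reviewed, diffed, and submitted from a CI pipeline.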