Surfalytics
Data Engineering | Advanced | 12–16 hours

Airflow + Docker + Databricks ETL

Build a full ETL pipeline: Airflow orchestrates API ingestion, Docker handles the environment, and Databricks Delta tables store the result.

Tags: Airflow, Docker, Databricks, Delta Lake, ETL

What you’ll build

A production-style ETL pipeline that fetches job listing data from the Adzuna API, runs transformations inside a Dockerized Airflow environment, and loads the results into Databricks Delta tables. The entire setup is reproducible via Docker Compose.
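
To make the transformation step concrete, here is a minimal sketch of what flattening and de-duplicating job listings could look like before they are loaded into a Delta table. The payload shape (a `results` list with nested `company` and `location` objects) and the function names `flatten_listing` and `transform` are assumptions for illustration, not the project's actual code; check the Adzuna API documentation for the real response fields.

```python
from typing import Any


def flatten_listing(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one nested job listing into a single-level row.

    Field names are assumed for illustration; adjust them to the real payload.
    """
    return {
        "job_id": str(raw["id"]),
        "title": (raw.get("title") or "").strip(),
        "company": (raw.get("company") or {}).get("display_name"),
        "location": (raw.get("location") or {}).get("display_name"),
        "salary_min": raw.get("salary_min"),
        "salary_max": raw.get("salary_max"),
        "created": raw.get("created"),
    }


def transform(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten every listing and drop duplicates by job_id, keeping the first."""
    seen: set[str] = set()
    rows: list[dict[str, Any]] = []
    for raw in payload.get("results", []):
        row = flatten_listing(raw)
        if row["job_id"] in seen:
            continue
        seen.add(row["job_id"])
        rows.append(row)
    return rows


# Hypothetical sample payload: one duplicate, one listing without a company.
sample = {
    "results": [
        {"id": 101, "title": " Data Engineer ", "company": {"display_name": "Acme"},
         "location": {"display_name": "Vancouver"}, "salary_min": 90000},
        {"id": 101, "title": "Data Engineer", "company": {"display_name": "Acme"}},
        {"id": 102, "title": "Analytics Engineer"},
    ]
}
rows = transform(sample)
```

Keeping the transformation as plain functions like these means you can unit test it outside Airflow and then call it from whichever task runs the transform step.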

Skills you’ll practice

  • Airflow + Docker Compose: fully containerized orchestration
  • API ingestion with error handling and retry logic
  • Databricks REST API integration from Airflow operators
  • Delta Lake table management and schema evolution
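
As a sketch of the error handling and retry logic mentioned above, the helper below retries a failing call with exponential backoff. The name `call_with_retry`, its parameters, and the choice of which exceptions count as transient are assumptions for illustration; the project itself may rely on Airflow's task-level `retries` and `retry_delay` settings instead, or combine both.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fetch(), retrying transient errors with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller (or Airflow) can see it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a stand-in for an API call that fails twice, then succeeds.
calls = {"count": 0}
delays: list[float] = []


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


# Record the delays instead of actually sleeping.
result = call_with_retry(flaky_fetch, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Only exceptions listed in `retry_on` are retried, so a programming error such as a `KeyError` still fails immediately.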