Data Engineering · Advanced · ⏱ 12–16 hours
Airflow + Docker + Databricks ETL
Build a full ETL pipeline: Airflow orchestrates API ingestion, Docker handles the environment, and Databricks Delta tables store the result.
Airflow · Docker · Databricks · Delta Lake · ETL
What you’ll build
A production-style ETL pipeline that fetches job listing data from the Adzuna API, runs transformations inside a Dockerized Airflow environment, and loads the results into Databricks Delta tables. The entire setup is reproducible via Docker Compose.
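The three stages described above can be sketched as plain Python functions before any Airflow or Databricks wiring is added. This is a minimal illustration, not the project's actual code: the function names (`extract`, `transform`, `load`, `run_pipeline`), the injected `fetch_page` callable, the in-memory `sink`, and the field names (`id`, `title`, `company.display_name`, `location.display_name`) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def extract(fetch_page: Callable[[int], List[Row]], max_pages: int) -> List[Row]:
    """Collect raw listings page by page, stopping at the first empty page."""
    raw: List[Row] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)  # hypothetical callable wrapping the API request
        if not batch:
            break
        raw.extend(batch)
    return raw


def transform(raw: Iterable[Row]) -> List[Row]:
    """Flatten nested fields and drop listings with a missing or repeated id."""
    seen = set()
    rows: List[Row] = []
    for job in raw:
        job_id = job.get("id")
        if job_id is None or job_id in seen:
            continue
        seen.add(job_id)
        rows.append(
            {
                # Field names are assumed for illustration only.
                "id": str(job_id),
                "title": (job.get("title") or "").strip(),
                "company": (job.get("company") or {}).get("display_name"),
                "location": (job.get("location") or {}).get("display_name"),
            }
        )
    return rows


def load(rows: List[Row], sink: List[Row]) -> int:
    """Stand-in for the Delta table write: append rows and return the count."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(
    fetch_page: Callable[[int], List[Row]], sink: List[Row], max_pages: int = 5
) -> int:
    """Run extract, transform, and load in order; return the rows loaded."""
    return load(transform(extract(fetch_page, max_pages)), sink)
```

Keeping each stage a pure function with its dependencies passed in makes it easy to test locally; in an orchestrated setup, each function could plausibly become its own task.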
Skills you’ll practice
- Airflow + Docker Compose: fully containerized orchestration
- API ingestion with error handling and retry logic
- Databricks REST API integration from Airflow operators
- Delta Lake table management and schema evolution
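As one illustration of the error handling and retry logic listed above, the sketch below retries a callable with exponential backoff. It uses only the standard library; the helper name `call_with_retry`, its parameters, and the choice of retryable exceptions are assumptions for this example rather than part of the project.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...)
    and is capped at max_delay. The last failure is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("unreachable")  # loop always returns or raises
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting. Airflow tasks also accept `retries` and `retry_delay` settings; a helper like this would cover retries within a single task run, such as one flaky HTTP request.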