Data Engineering · Intermediate · ⏱ 5–7 hours

Databricks DQX: Data Quality at Scale

Use Databricks DQX to run data quality checks inside notebooks and pipelines: validate schemas, check completeness, and enforce custom rules.

Tags: Databricks, Data Quality, DQX, PySpark

What you’ll build

A Databricks notebook that demonstrates DQX checks on a real dataset: schema validation, null checks, value range assertions, and custom rules. The notebook integrates with Delta Lake pipelines for continuous quality monitoring.
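To make the check types above concrete before opening a notebook, here is a minimal plain-Python sketch of row-level rules: a null check, a value range assertion, and a custom business rule, with rows split into valid and invalid sets. This is not the DQX API. The `Rule` class, the helper functions, and the sample columns (`order_id`, `amount`, `status`, `reason`) are all hypothetical names chosen for illustration; in the project itself the same ideas would be applied to Spark DataFrames through DQX.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named row-level check (hypothetical helper, not a DQX class)."""
    name: str
    predicate: Callable[[Row], bool]  # returns True when the row passes


def not_null(column: str) -> Rule:
    # Completeness check: the column must be present and not None.
    return Rule(f"{column}_not_null", lambda row: row.get(column) is not None)


def in_range(column: str, low: float, high: float) -> Rule:
    # Range assertion: in this sketch a missing value counts as a failure.
    def check(row: Row) -> bool:
        value = row.get(column)
        return value is not None and low <= value <= high
    return Rule(f"{column}_in_range", check)


def custom(name: str, predicate: Callable[[Row], bool]) -> Rule:
    # Custom rule: any business-logic predicate over the whole row.
    return Rule(name, predicate)


def split_rows(
    rows: List[Row], rules: List[Rule]
) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Return (valid rows, invalid rows paired with the names of failed rules)."""
    valid: List[Row] = []
    invalid: List[Tuple[Row, List[str]]] = []
    for row in rows:
        failed = [rule.name for rule in rules if not rule.predicate(row)]
        if failed:
            invalid.append((row, failed))
        else:
            valid.append(row)
    return valid, invalid


# Illustrative rules and made-up sample data.
RULES = [
    not_null("order_id"),
    in_range("amount", 0, 10_000),
    custom(
        "refund_has_reason",
        lambda row: row.get("status") != "refunded" or bool(row.get("reason")),
    ),
]

ROWS = [
    {"order_id": 1, "amount": 25.0, "status": "paid", "reason": None},
    {"order_id": None, "amount": 40.0, "status": "paid", "reason": None},
    {"order_id": 3, "amount": -5.0, "status": "refunded", "reason": None},
]

valid, invalid = split_rows(ROWS, RULES)

if __name__ == "__main__":
    print(f"valid rows: {len(valid)}")
    for row, failed in invalid:
        print(f"order_id={row['order_id']} failed: {failed}")
```

Keeping the failed rule names next to each rejected row is the part worth carrying over to a real pipeline: it lets you quarantine bad records and still explain why each one was rejected.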

Skills you’ll practice

  • Databricks DQX API: rules, checks, and result reporting
  • Embedding data quality into Delta Live Tables or notebooks
  • Custom rule definitions for business logic validation
  • Monitoring quality trends over time
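
As a rough illustration of the last skill, monitoring quality trends can be reduced to comparing pass rates between consecutive pipeline runs. The sketch below is plain Python under stated assumptions: `RunSummary`, `pass_rate`, `flag_regressions`, the 5-point drop threshold, and the sample history are all hypothetical and do not come from DQX. In practice the per-run counts would be read from wherever the pipeline stores its check results.

```python
from typing import List, NamedTuple


class RunSummary(NamedTuple):
    """Per-run totals from a quality check (hypothetical structure)."""
    run_date: str
    total_rows: int
    failed_rows: int


def pass_rate(run: RunSummary) -> float:
    # Share of rows that passed every check; an empty run counts as fully passing.
    if run.total_rows == 0:
        return 1.0
    return 1 - run.failed_rows / run.total_rows


def flag_regressions(runs: List[RunSummary], max_drop: float = 0.05) -> List[str]:
    """Return run dates whose pass rate fell by more than max_drop vs. the prior run."""
    flagged: List[str] = []
    for previous, current in zip(runs, runs[1:]):
        if pass_rate(previous) - pass_rate(current) > max_drop:
            flagged.append(current.run_date)
    return flagged


# Made-up history, ordered oldest to newest.
HISTORY = [
    RunSummary("2024-01-01", 1000, 10),
    RunSummary("2024-01-02", 1000, 20),
    RunSummary("2024-01-03", 1000, 150),
]

if __name__ == "__main__":
    for run in HISTORY:
        print(f"{run.run_date}: pass rate {pass_rate(run):.2%}")
    print("regressions:", flag_regressions(HISTORY))
```

Comparing each run to the one before it catches sudden drops; a slow decline would need a comparison against a longer baseline, such as a rolling average.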