u/GeneBackground4270

Do data quality frameworks have to be so complex?

Looking for feedback from fellow data engineers.

I've been building an open-source data quality framework for PySpark called SparkDQ:
https://sparkdq-community.github.io/sparkdq/

The main goal is simplicity. It's Spark-native, lightweight, and lets you define data quality checks using Python configuration classes instead of external services or custom DSLs.
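To illustrate the general idea of declaring checks as Python configuration classes instead of a DSL, here is a minimal sketch. It is hypothetical and does not use SparkDQ's actual API: the names `NotNullCheckConfig`, `RangeCheckConfig`, and `run_checks` are invented for illustration. Rows are plain dicts rather than a Spark DataFrame, so the sketch runs without PySpark.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical row type: a plain dict stands in for a Spark Row.
Row = Dict[str, Any]


@dataclass(frozen=True)
class NotNullCheckConfig:
    """Hypothetical check: the listed columns must not be missing or None."""
    check_id: str
    columns: List[str]

    def failing_rows(self, rows: List[Row]) -> List[int]:
        # Indices of rows where any configured column is missing or None.
        return [
            i for i, row in enumerate(rows)
            if any(row.get(col) is None for col in self.columns)
        ]


@dataclass(frozen=True)
class RangeCheckConfig:
    """Hypothetical check: a numeric column must lie within [min_value, max_value]."""
    check_id: str
    column: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def failing_rows(self, rows: List[Row]) -> List[int]:
        failures = []
        for i, row in enumerate(rows):
            value = row.get(self.column)
            if value is None:
                continue  # nulls are left to NotNullCheckConfig
            if self.min_value is not None and value < self.min_value:
                failures.append(i)
            elif self.max_value is not None and value > self.max_value:
                failures.append(i)
        return failures


def run_checks(rows: List[Row], checks: List[Any]) -> Dict[str, List[int]]:
    # Map each check_id to the indices of the rows that violate it.
    return {check.check_id: check.failing_rows(rows) for check in checks}


rows = [
    {"id": 1, "age": 34},
    {"id": None, "age": 29},
    {"id": 3, "age": 150},
]
checks = [
    NotNullCheckConfig(check_id="id-not-null", columns=["id"]),
    RangeCheckConfig(check_id="age-range", column="age", min_value=0, max_value=120),
]
report = run_checks(rows, checks)
print(report)  # {'id-not-null': [1], 'age-range': [2]}
```

Because each check is an ordinary typed Python object, it can be autocompleted and type-checked in an IDE, versioned alongside the pipeline code, and built programmatically, with no separate config parser involved.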

I'm curious:

What's your first impression?
Would you use something like this?
What features would you expect from a framework like this?

Any honest feedback is appreciated. Thanks!


u/GeneBackground4270 — 6 days ago

r/apachespark