u/GeneBackground4270

r/Python

Do data quality frameworks have to be so complex?

Looking for feedback from fellow data engineers.

I've been building an open-source data quality framework for PySpark called SparkDQ:
https://sparkdq-community.github.io/sparkdq/

The main goal is simplicity. It's Spark-native, lightweight, and lets you define data quality checks using Python configuration classes instead of external services or custom DSLs.
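
To make "checks as Python configuration classes" a bit more concrete, here is a rough, library-free sketch of the general pattern. It is plain Python over a list of dicts, not SparkDQ's actual API, and every name in it (`NullCheckConfig`, `MinValueCheckConfig`, `run_checks`) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class NullCheckConfig:
    """Hypothetical config: the given column must not be None."""
    check_id: str
    column: str

    def passes(self, row: Dict[str, Any]) -> bool:
        return row.get(self.column) is not None


@dataclass(frozen=True)
class MinValueCheckConfig:
    """Hypothetical config: the given column must be >= min_value."""
    check_id: str
    column: str
    min_value: float

    def passes(self, row: Dict[str, Any]) -> bool:
        value = row.get(self.column)
        # A missing or None value counts as a failure for this check.
        return value is not None and value >= self.min_value


def run_checks(rows: List[Dict[str, Any]], checks: List[Any]) -> List[List[str]]:
    """Return, for each row, the ids of the checks that row failed."""
    return [[c.check_id for c in checks if not c.passes(row)] for row in rows]


# The checks are declared as plain, typed Python objects.
checks = [
    NullCheckConfig(check_id="name-not-null", column="name"),
    MinValueCheckConfig(check_id="age-non-negative", column="age", min_value=0),
]

rows = [
    {"name": "Ada", "age": 36},
    {"name": None, "age": 41},
    {"name": "Bob", "age": -1},
]

failures = run_checks(rows, checks)
for row, failed in zip(rows, failures):
    print(row, "failed:", failed)
```

The point of the pattern is that a check is just a small typed object, so it can be validated, reused, and version-controlled like any other Python code, with no separate rule language or service to run.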

I'm curious:

  • What's your first impression?
  • Would you use something like this?
  • What features would you expect from a framework like this?

Any honest feedback is appreciated. Thanks!

u/GeneBackground4270 — 6 days ago
