Do data quality frameworks have to be so complex?
Looking for feedback from fellow data engineers.
I've been building an open-source data quality framework for PySpark called SparkDQ:
https://sparkdq-community.github.io/sparkdq/
The main goal is simplicity. It's Spark-native, lightweight, and lets you define data quality checks using Python configuration classes instead of external services or custom DSLs.
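For anyone who wants a feel for the idea before clicking through, here is a rough sketch of what "checks as Python configuration classes" can look like. The class names, fields, and the `failure_condition` method are made up for this illustration and are not SparkDQ's actual API (the docs linked above have the real one). Each config simply renders a Spark SQL condition that matches failing rows.

```python
from dataclasses import dataclass


# Hypothetical config classes, for illustration only (not SparkDQ's real API).
@dataclass(frozen=True)
class NullCheckConfig:
    """Flags rows where `column` is NULL."""
    check_id: str
    column: str

    def failure_condition(self) -> str:
        # Spark SQL expression that is true for rows failing the check
        return f"`{self.column}` IS NULL"


@dataclass(frozen=True)
class RangeCheckConfig:
    """Flags rows where `column` falls outside [min_value, max_value]."""
    check_id: str
    column: str
    min_value: float
    max_value: float

    def failure_condition(self) -> str:
        return f"`{self.column}` NOT BETWEEN {self.min_value} AND {self.max_value}"


# Checks are declared as plain Python objects: no external service, no custom DSL.
checks = [
    NullCheckConfig(check_id="order-id-not-null", column="order_id"),
    RangeCheckConfig(check_id="amount-in-range", column="amount",
                     min_value=0, max_value=10000),
]

# Map each check to the condition that selects its failing rows.
failure_conditions = {c.check_id: c.failure_condition() for c in checks}

for check_id, condition in failure_conditions.items():
    print(f"{check_id}: {condition}")
```

In a Spark job, each condition string could be passed to `df.filter(condition)` to pull out the failing rows, assuming `df` is a DataFrame with `order_id` and `amount` columns.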
I'm curious:
- What's your first impression?
- Would you use something like this?
- What features would you expect from a framework like this?
Any honest feedback is appreciated. Thanks!