r/dataengineering
Airflow vs Mage vs Prefect vs Dagster vs ... - yes, another tech comparison post
I know there are multiple posts like this, but the most recent ones I've found are already a few years old, so I wanted to ask again. I imagine all these tools have evolved quite a bit since then.
I'm starting a personal project to build a web app that will serve users over the internet:
- I don't have a server to run this on, so everything will have to be cloud-based.
- I'll deploy the frontend (React) on some PaaS like fly.dev or Render.
- Some of the data I'll serve is historical, but it comes from APIs that only return the latest data point, so I'll need a database to accumulate that history myself. I plan to use some kind of managed, auto-scaling database.
- There will be a backend, most likely in FastAPI+Redis.
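To make the historical-data part concrete, this is roughly what each pipeline would need to do. It's only a sketch: SQLite stands in for the managed database, and the table and function names (`snapshots`, `record_snapshot`, `history`) are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def init_store(conn: sqlite3.Connection) -> None:
    """Create an append-only table of observations, one row per source and timestamp."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS snapshots (
            source      TEXT NOT NULL,
            observed_at TEXT NOT NULL,
            value       REAL NOT NULL,
            PRIMARY KEY (source, observed_at)
        )
        """
    )


def record_snapshot(conn: sqlite3.Connection, source: str,
                    observed_at: datetime, value: float) -> bool:
    """Store the latest data point; return False if it was already recorded.

    INSERT OR IGNORE makes a re-run of the same scheduled task idempotent,
    which matters once an orchestrator starts retrying failed runs.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO snapshots (source, observed_at, value) VALUES (?, ?, ?)",
        (source, observed_at.isoformat(), value),
    )
    conn.commit()
    return cur.rowcount == 1


def history(conn: sqlite3.Connection, source: str) -> list[tuple[str, float]]:
    """Return every stored observation for a source, oldest first."""
    rows = conn.execute(
        "SELECT observed_at, value FROM snapshots WHERE source = ? ORDER BY observed_at",
        (source,),
    )
    return [(row[0], row[1]) for row in rows]
```

Whichever orchestrator I end up with would just call something like `record_snapshot` on a schedule, once per API.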
My question: considering that I'll need some orchestration and that this might evolve to hundreds of pipelines, which technology would you recommend? And how would you keep tabs on data quality? I've used Great Expectations in the past, and I like the idea, but getting it to work seems like a real pain every time I've tried it.
If there's a recent (less than a year ago) post like this, I apologize for repeating it - I didn't find it in a few minutes of web searching.
u/HerrKaputt — 14 days ago