u/Educational-Soft4493

Hosting Python ETL in Azure (Airflow / Dagster / Prefect?)

I'm looking into different options for hosting and orchestrating pipelines, primarily for running Python scripts.

Right now, we use Azure Synapse for our daily batch loads, which just run Synapse notebooks. However, this is overkill and costly for the simple Python jobs we run, and spinning up Spark sessions takes too long (5-15 min).

I've been looking at Airflow, Dagster, and Prefect. They seem like solid options, but I'm a little confused about how to deploy and host them in an Azure cloud environment. Do I have to run them on a VM, or in a container app? I also don't know whether these tools are too much overhead to learn for a team of our size (3-5 people).

I also investigated just using Azure Functions to trigger and run the Python ETL jobs, but it seems like that is only meant for very small jobs (<10 min) and is a little hard to manage.

Ideally, all we want is the ability to:
- write Python scripts
- schedule them to run (cron)
- run them ad hoc (from an API call) with low latency (avoid cold starts)
- view run logs and get alerts on failures
- host everything in Azure
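
To make the logging and alerting items concrete, here is a rough sketch of the behavior I'm picturing, using only the Python standard library. Everything in it (`run_job`, `RunResult`, the `on_failure` hook, the example jobs) is a made-up name for illustration, not an API from Airflow, Dagster, or Prefect. Scheduling and the API trigger are left out because they would come from whichever tool ends up hosting this.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)


@dataclass
class RunResult:
    """Outcome of one run (hypothetical shape, for illustration only)."""
    job_name: str
    succeeded: bool
    duration_seconds: float
    error: Optional[str] = None


def run_job(
    job_name: str,
    job: Callable[[], None],
    on_failure: Callable[[RunResult], None],
) -> RunResult:
    """Run one job, write a run log, and call the alert hook if it fails."""
    log = logging.getLogger(job_name)
    started = time.monotonic()
    log.info("run started")
    try:
        job()
    except Exception as exc:
        duration = time.monotonic() - started
        # log.exception records the traceback alongside the message.
        log.exception("run failed after %.2fs", duration)
        result = RunResult(job_name, False, duration, repr(exc))
        on_failure(result)  # assumption: the hook sends an email or chat message
        return result
    duration = time.monotonic() - started
    log.info("run finished in %.2fs", duration)
    return RunResult(job_name, True, duration)


if __name__ == "__main__":
    alerts = []  # stand-in for a real alert channel

    def load_daily_sales() -> None:  # hypothetical ETL script body
        pass

    def broken_job() -> None:  # hypothetical job that always fails
        raise ValueError("bad row")

    run_job("load_daily_sales", load_daily_sales, alerts.append)
    run_job("broken_job", broken_job, alerts.append)
    print(f"{len(alerts)} alert(s) raised")
```

In a real setup `on_failure` would post to email or a chat webhook instead of appending to a list, and the cron schedule and ad hoc trigger would call `run_job` rather than the `__main__` block.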

I feel like I'm going in circles learning and trying different solutions. Anyone got any advice?

u/Educational-Soft4493 — 5 days ago