Building a General Indian Government Data Pipeline - Stuck on Census 2011, Seeking Solutions
I'm building a general-purpose pipeline to ingest data from various Indian government data portals. The goal is to automate downloading and processing datasets from sources like data.gov.in, state portals, etc.
The Problem: I'm using Census 2011 (Primary Census Abstract) as a test case, but I can't get any government data source to work reliably.
What I've Tried:
- Open Government Data Platform API - Worked initially for one district, then I got rate-limited. After 4-5 days of retrying, the API is still blocked: every request times out.
- IHSN Census API - Returns 403 Forbidden, endpoint deprecated.
- censusindia Official Website - SSL certificate errors, can't connect.
- Playwright/Browser Automation - Tried automating the website directly. Anti-bot protection returns "Access Denied" for all automated requests.
- State-level portals (Maharashtra, Karnataka, etc.) - Same issues or completely non-functional.
- NADA Catalog - Registration page is broken, no public API access.
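For the rate-limited case above, here is a minimal sketch of the kind of polite retry logic I mean, assuming the portal answers with HTTP 429/5xx or simply times out. The names `fetch_with_backoff` and `backoff_delays` are hypothetical, not part of any portal's API, and the URL is whatever endpoint is being tested. A 403 is raised immediately, since retrying won't help there.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delays(max_retries, base=2.0, cap=300.0):
    """Exponential delays in seconds (base, 2*base, 4*base, ...), capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(max_retries)]


def fetch_with_backoff(url, max_retries=5, opener=urllib.request.urlopen,
                       sleep=time.sleep, timeout=30):
    """GET `url`, retrying on HTTP 429/5xx and timeouts with exponential backoff.

    `opener` and `sleep` are injectable so the logic can be tested offline.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Non-retryable status (e.g. 403) or out of retries: give up.
            if err.code not in (429, 500, 502, 503, 504) or attempt == max_retries:
                raise
            # Honor a numeric Retry-After header if the server sends one.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                wait = float(retry_after)
            else:
                wait = delays[attempt]
        except (urllib.error.URLError, TimeoutError):
            # Connection failures and timeouts are retried as well.
            if attempt == max_retries:
                raise
            wait = delays[attempt]
        # Add up to 1 s of jitter so retries do not land in lockstep.
        sleep(wait + random.uniform(0, 1))
```

With the defaults this waits roughly 2, 4, 8, 16 and 32 seconds between attempts, so it only helps if the block is temporary; it does nothing against a limit that never lifts.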
What's Working:
- Successfully downloaded one district (Ahmadnagar) before the rate limit hit - 1,620 real Census records
- Pipeline architecture is sound, just need working data sources
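Since every successful request is precious under these limits, a cache-first fetch keeps an already-downloaded district from being requested again. This is only a sketch under my own assumptions: `cached_fetch`, the `census_cache` directory, and the one-JSON-file-per-district layout are hypothetical, and `fetch` stands for whatever function actually talks to the portal.

```python
import json
from pathlib import Path


def cached_fetch(key, fetch, cache_dir="census_cache"):
    """Return records for `key`, calling `fetch(key)` only on a cache miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        # Cache hit: no network request is made.
        return json.loads(path.read_text(encoding="utf-8"))

    records = fetch(key)  # must return JSON-serializable data

    # Write to a temp file first, then rename, so an interrupted run
    # never leaves a half-written cache file behind.
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(records), encoding="utf-8")
    tmp.replace(path)
    return records
```

Calling `cached_fetch("ahmadnagar", fetch_district)` twice, with `fetch_district` being a hypothetical downloader, would hit the portal only the first time.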
The Core Issue:
Indian government data portals either:
- Block all automated access (anti-bot)
- Have aggressive rate limits that never seem to reset
- Have deprecated/broken APIs
The result: no reliable programmatic access.
Looking For:
- Any working method to programmatically access Indian government datasets?
- Alternative official sources that allow automation?
- Anyone who has successfully built pipelines for this data?
My goal is general data pipeline automation, but Census 2011 is blocking progress. Need genuine working solutions - not Kaggle or unofficial sources.