u/Few-Score-3828

▲ 3 r/dataengineersindia+1 crossposts

Building a General India Government Data Pipeline - Stuck on Census 2011, Seeking Solutions

I'm building a general-purpose pipeline to ingest data from various Indian government data portals. Goal is to automate downloading and processing datasets from sources like data.gov.in, state portals, etc.

The Problem: I'm using Census 2011 (Primary Census Abstract) as a test case, and so far I can't get any government data source to work reliably.

What I've Tried:

  1. Open Government Data Platform API - Worked for one district, then I got rate-limited. I've kept retrying for 4-5 days and the API is still blocked: every request times out.
  2. IHSN Census API - Returns 403 Forbidden, endpoint deprecated.
  3. censusindia Official Website - SSL certificate errors, can't connect.
  4. Playwright/Browser Automation - Tried automating the website directly. Anti-bot protection returns "Access Denied" for all automated requests.
  5. State-level portals (Maharashtra, Karnataka, etc.) - Same issues or completely non-functional.
  6. NADA Catalog - Registration page is broken, no public API access.
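
For reference on item 1, here is a minimal sketch of the kind of retry logic that is usually expected against a rate-limited API: exponential backoff with jitter, honoring a `Retry-After` header when the server sends one. Nothing here is specific to any portal. `fetch_with_backoff`, `backoff_delay`, and the `fetch` callable (which returns a status code, headers, and body) are hypothetical names for illustration, and I'm assuming the portal signals throttling with 429/5xx rather than a silent timeout.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body); the shape is an assumption for this sketch
Response = Tuple[int, Mapping[str, str], bytes]

# Statuses worth retrying; a 403 like the one in item 2 is treated as final
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, headers: Mapping[str, str],
                  base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before the next try (attempt is 0-based)."""
    # Header lookup is case-sensitive here; normalize keys if your client does not
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        # Server told us how long to wait; respect it up to the cap
        return min(float(retry_after), cap)
    # Otherwise: exponential backoff with full jitter
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], Response],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch() until it returns 200, a non-retryable status, or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"non-retryable HTTP status {status}")
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers))
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The `fetch` argument would wrap whatever HTTP client the pipeline already uses. This only helps if the limit actually resets, which may not be the case here.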

What's Working:

  • Successfully downloaded one district (Ahmadnagar) before the rate limit hit - 1,620 real Census records
  • Pipeline architecture is sound, just need working data sources
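
Since only one district came through before the block, a resumable run is one way to avoid re-requesting data that is already on disk. The sketch below is a rough illustration, not my actual pipeline code: `run_resumable`, the `download` callable, and the `done.json` manifest are all hypothetical names, and it assumes each district's records can be serialized as JSON.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(manifest: Path) -> Set[str]:
    """Read the set of districts that were already saved, if any."""
    if manifest.exists():
        return set(json.loads(manifest.read_text()))
    return set()


def run_resumable(districts: Iterable[str],
                  download: Callable[[str], List[dict]],
                  out_dir: Path) -> Set[str]:
    """Download each district once; stop at the first failure so a rerun can resume."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = out_dir / "done.json"
    done = load_done(manifest)
    for name in districts:
        if name in done:
            continue  # already on disk, do not spend a request on it
        try:
            records = download(name)
        except Exception:
            break  # likely rate-limited or blocked; keep what we have
        (out_dir / f"{name}.json").write_text(json.dumps(records))
        done.add(name)
        # Update the manifest after every district so progress survives a crash
        manifest.write_text(json.dumps(sorted(done)))
    return done
```

Rerunning with the same `out_dir` skips every district listed in the manifest, so each successful request is only ever made once.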

The Core Issue:

Indian government data portals either:

  • Block all automated access (anti-bot)
  • Have aggressive rate limits that never reset
  • Have deprecated/broken APIs

Either way, there is no reliable programmatic access.

Looking For:

  • Any working method to programmatically access Indian government datasets?
  • Alternative official sources that allow automation?
  • Anyone who has successfully built pipelines for this data?

My goal is general data pipeline automation, but Census 2011 is blocking progress. Need genuine working solutions - not Kaggle or unofficial sources.


u/Few-Score-3828 — 12 days ago