Anyone else with a large catalog dealing with wild indexing fluctuations in Google Search Console (GSC)?
We run a kitchen and bath e-commerce site with close to 900K SKUs. Of the 850,481 submitted pages GSC reports on, only 291,912 (about 34%) are indexed and 558,569 are not. Here's the breakdown of non-indexed reasons:
- Crawled - currently not indexed: 527,301
- Discovered - currently not indexed: 28,507
- Duplicate, Google chose different canonical than user: 1,296
- Soft 404: 626
- Not Found (404): 413
- Blocked by robots.txt: 233 (we allow everything except backend pages, cart, checkout, etc.)
- Server error (5xx): 176
- Page with redirect: 17
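To put those counts in proportion, here is a quick sketch that sums the reasons and works out each one's share. The numbers are copied from the report above; the variable names and the `share` helper are just mine for illustration.

```python
# Counts copied from the GSC page indexing report above.
indexed = 291_912
not_indexed = {
    "Crawled - currently not indexed": 527_301,
    "Discovered - currently not indexed": 28_507,
    "Duplicate, Google chose different canonical than user": 1_296,
    "Soft 404": 626,
    "Not Found (404)": 413,
    "Blocked by robots.txt": 233,
    "Server error (5xx)": 176,
    "Page with redirect": 17,
}

total_not_indexed = sum(not_indexed.values())
total = indexed + total_not_indexed


def share(count, whole):
    """Percentage of `whole` represented by `count`, rounded to one decimal."""
    return round(100 * count / whole, 1)


print(f"Indexed: {share(indexed, total)}% of {total:,} pages")
for reason, count in sorted(not_indexed.items(), key=lambda kv: -kv[1]):
    print(f"{reason}: {share(count, total_not_indexed)}% of non-indexed")
```

The takeaway is that 'Crawled - currently not indexed' accounts for roughly 94% of the non-indexed pages, so the errors, redirects and robots.txt blocks are a rounding error by comparison.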
The most frustrating part is that the set of indexed pages keeps fluctuating. Pages that were indexed a week ago suddenly drop out and show up as 'Crawled - currently not indexed'. These are good pages with real product content, not thin or duplicate stuff. Then some of them come back while others drop. It feels like Google is constantly reshuffling what it considers worth indexing.
Has anyone with a similarly large catalog seen this? Did anything actually move the needle for you: content depth, internal linking, pruning low-value SKUs, technical fixes? Curious whether this is just the reality of running a big catalog or if there's a pattern others have cracked.
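If anyone wants to compare churn numbers, this is roughly how the week-to-week reshuffling could be measured. It is only a sketch: it assumes you already have two sets of URLs that were reported as indexed on two different dates (for example, loaded from exports), and the `index_churn` function and the example.com URLs are hypothetical.

```python
def index_churn(previous, current):
    """Compare two snapshots of indexed URLs (both sets of strings).

    Returns which URLs dropped out, which were newly indexed,
    and how many stayed indexed across both snapshots.
    """
    return {
        "dropped": sorted(previous - current),
        "gained": sorted(current - previous),
        "stable": len(previous & current),
    }


# Hypothetical snapshots, one week apart.
last_week = {
    "https://example.com/faucets/a",
    "https://example.com/faucets/b",
    "https://example.com/sinks/c",
}
this_week = {
    "https://example.com/faucets/a",
    "https://example.com/sinks/c",
    "https://example.com/vanities/d",
}

churn = index_churn(last_week, this_week)
print(churn["dropped"], churn["gained"], churn["stable"])
```

Grouping the dropped URLs by category or template would be one way to check whether the churn is spread evenly or concentrated in particular parts of the catalog.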