u/SynergicTech

▲ 4 r/DoSEO

Persistent www canonical preference on 5 specific new URLs after migration — 36 others flipped fine on identical signals. What am I missing?

Posting here because TechSEO seems to have a 24h+ mod queue on this and it hasn't cleared yet. Search Central Help Community thread (430811844 ) also live — boilerplate response only.

Migrated our websitefrom its Squarespace to Next.js on Vercel late March 2026. 41 URLs total. 36 are now correctly indexed against the non-www canonical. 5 are stuck — Google keeps picking the www variant on every crawl despite identical signals to the 36 that flipped fine. The site is a wholesale rewrite. The Squarespace used world wide web format.

What's live now:

- 301 host redirect (www → non-www), verified live via curl

- 35 path-level redirects also forced to 301 (Next.js's permanent: true defaults to 308 — caught and replaced 28 April)

- All chains return 301 → 301 → 200; no 308 anywhere

- Self-referencing rel="canonical" → non-www on every page

- /sitemap.xml lists non-www only, registered against the non-www GSC property

- Legacy www sitemap deregistered

- Internal-link audit: zero `www.` references in src/

- Vercel dashboard www→non-www domain redirect removed (was racing the code path)

What's worked:

- 36/41 URLs now correctly indexed at non-www canonical

- /rfid-solutions and /about flipped within hours of re-crawl on the same signal stack

What hasn't:

- 5 URLs stuck: /sales-operations-planning, /inventory-optimisation, /methodology, /environmental-monitoring, /insights/rfid-in-new-zealand

- URL Inspection on each: "Duplicate, Google chose different canonical than user" — userCanonical = non-www, googleCanonical = www

- Re-crawled five times since the host-redirect fix on 23 April; two of those were direct responses to manual Request Indexing submissions (4 May and 7 May). Google has consistently picked www on every crawl

- Most recent RI batch hit 5 of 5 within an hour — fresh crawl, same wrong canonical

- Natural re-crawl cadence on these 5 has stalled outside of manual RI — without it, 72h+ goes by with no fresh crawl while other URLs on the property are crawled normally

Working hypothesis: first-discovery asymmetry during the 308-shadowing window. 4 of these 5 are entirely new URLs (created at launch, never on the pre-migration Squarespace site); the 5th is content-migrated under a new slug. Pre-23 April the host redirect was 308 (not 301) and the canonical tag signal was weakened. If Googlebot's first encounter with any of these new URLs happened via a www variant — stray external link, partner-shared URL, anything that wasn't the sitemap — Google may have indexed www as first-seen-canonical, and that bias is now persisting through all the corrected signals.

For the 36 that flipped fine, first-discovery was probably non-www (sitemap has always been non-www only). For the 5 that didn't, first-discovery may have been www, and the bias is sticky despite forced 301s + self-canonical + multiple RI rounds + 5 corrected re-crawls.

Continuing daily RI to force fresh crawl data, but it's not changing the canonical decision.

Anyone seen first-seen-canonical bias persist this strongly after a corrected redirect chain + multiple RI-triggered re-crawls? Anything beyond the standard playbook worth trying?

reddit.com
u/SynergicTech — 12 days ago