Faceted nav URL explosion high

Faceted Navigation Crawling Near-Infinite URLs

Filter and sort parameters multiply into millions of crawlable URLs, burning crawl budget on duplicates while real pages get crawled slowly.

What you see

Crawl stats: surge in crawled URLs, flat indexed count
/shoes?color=red&size=9&sort=price&page=3
/shoes?size=9&color=red&page=3&sort=price
Googlebot crawling parameter combinations endlessly

What’s actually happening

An e-commerce or listing site adds filters (color, size, brand, price, sort) and every combination becomes its own URL. Five filters with a handful of options each already yield thousands of distinct result sets, and once parameter order, sort, and pagination vary, the crawlable URL count runs into the millions, most of them duplicate or near-empty result sets. Googlebot pours its crawl budget into these low-value combinations, so new products and updated pages get discovered and refreshed slowly. Google names faceted navigation as a leading cause of crawl problems on large sites. In Crawl Stats you'll see crawl requests climbing while the indexed count stays flat.
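To make the multiplication concrete, here is a rough sketch of the arithmetic. The facet sizes in FACETS are invented for illustration and are not taken from any real site; plug in your own counts.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical facet sizes: how many values each filter offers.
FACETS = {"color": 12, "size": 15, "brand": 40, "price": 8, "sort": 4}


def count_result_sets(facets):
    """Distinct filter selections: each facet is either unset or set to one value."""
    return prod(n + 1 for n in facets.values())


def count_urls_any_order(facets):
    """Crawlable URLs when parameter order is not normalized.

    A selection that uses k facets can be written in k! parameter orders,
    and each ordering is a separate URL to a crawler.
    """
    sizes = list(facets.values())
    total = 0
    for k in range(len(sizes) + 1):
        for chosen in combinations(sizes, k):
            total += prod(chosen) * factorial(k)
    return total


if __name__ == "__main__":
    print("distinct result sets:", count_result_sets(FACETS))
    print("URLs with free parameter order:", count_urls_any_order(FACETS))
```

With these assumed sizes the distinct selections number in the hundreds of thousands, and the unnormalized URL count lands in the tens of millions before pagination is even considered.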

Common causes

  • Every filter generates a crawlable, indexable URL, and filters combine multiplicatively — 5 facets quickly become millions of permutations.
  • Parameter order isn't normalized, so ?color=red&size=9 and ?size=9&color=red are crawled as two separate URLs for the same result set.
  • Sort and view parameters (?sort=price, ?view=grid) create duplicate content that differs only in ordering.
  • Filter links are plain crawlable <a href> tags with no nofollow or crawl control, so Googlebot walks every combination it finds.
  • Empty or near-empty facet combinations ('red + size 4 + brand X') return thin pages that still get crawled and sometimes indexed.
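The parameter-order cause is easy to confirm for yourself. The sketch below is one possible normalization (sort parameters by name, drop empty values), not a standard; normalize_url is a hypothetical helper. It shows that the two example URLs from "What you see" collapse to a single form.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return the URL with query parameters sorted by name and empty values dropped."""
    parts = urlsplit(url)
    # keep_blank_values=False discards parameters such as "?color=".
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=False))
    # The fragment is dropped because crawlers do not send it to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))


if __name__ == "__main__":
    a = "/shoes?color=red&size=9&sort=price&page=3"
    b = "/shoes?size=9&color=red&page=3&sort=price"
    print(normalize_url(a))
    print(normalize_url(a) == normalize_url(b))
```

Running a crawl export or log sample through a function like this and counting how many raw URLs map to each normalized form gives a quick measure of how much duplication comes from ordering alone.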

How to fix it

  1. Map the real scope first. Pull Crawl Stats in Search Console and grep your server access logs for Googlebot hits against parameterized URLs. You need to see which facets and parameters are eating the budget before you decide what to block versus canonicalize, because the fix differs per facet.
  2. Canonicalize value-add facets, block the rest. Split facets in two. Ones that create genuinely useful landing pages (e.g. /shoes/running as a category someone searches) get a self-referencing canonical and stay indexable. Pure refinements (sort order, view mode, redundant combinations) get a canonical pointing back at the base category, or get disallowed in robots.txt so Google never crawls them.
  3. Stop the crawl paths at the source. Add rel="nofollow" to filter links you don't want followed, or render filters via a mechanism Googlebot won't crawl into (e.g. POST, or JS that doesn't expose every combination as a unique GET URL). Google treats nofollow as a hint rather than a directive, so don't rely on it alone. Cutting the links is more effective than cleaning up after the crawl.
  4. Normalize parameter order and drop empties. Emit filter parameters in one fixed order so ?color=red&size=9 is the only form that exists. Serve empty combinations with a noindex or a 404 'no results' response so they don't get indexed as thin pages. Note that noindex only keeps a page out of the index; Google still has to crawl the URL to see it.
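  5. Tighten robots.txt deliberately, then watch the logs. Use Disallow rules for the parameters you've decided Google should never crawl (e.g. Disallow: /*?*sort=, which matches sort anywhere in the query string; Disallow: /*?sort= only matches when sort is the first parameter). Robots-blocking is for crawl control, not deindexing. Don't block URLs you also want consolidated via canonical, because Google can't read a canonical tag on a page it isn't allowed to crawl. Re-check Crawl Stats over the following weeks to confirm budget shifts toward real content.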
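For step 1, a short script can stand in for ad hoc grepping. This is a minimal sketch that assumes the common "combined" access log format and identifies Googlebot by user-agent string only. User agents can be spoofed, so treat the counts as an estimate unless you also verify the requesting IPs. The function name and regular expression are illustrative.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Request line and user-agent fields of a combined-format log line (assumed format).
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_param_hits(lines):
    """Count Googlebot requests per query-parameter name."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue
        query = urlsplit(match.group("target")).query
        # Use a set so a parameter repeated in one URL counts once per request.
        for name in {key for key, _ in parse_qsl(query, keep_blank_values=True)}:
            hits[name] += 1
    return hits
```

Called as googlebot_param_hits(open("access.log")).most_common(10), where the file path is a placeholder, it lists the parameters that attract the most Googlebot requests, which is the input you need for the block-versus-canonicalize decision in step 2.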

Stop it recurring

Decide per facet at build time which combinations are indexable landing pages and which are blocked refinements, and normalize parameter order so one result set never spawns multiple URLs.
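One way to make that build-time decision explicit is a single policy table that both the canonical tag and robots.txt are generated from. The sketch below is an assumption-heavy illustration: the FACET_POLICY contents, the three action names, and both helper functions are hypothetical, and which facets deserve indexing depends on your own search demand.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical policy: decide once, per parameter, how crawlers should treat it.
FACET_POLICY = {
    "color": "index",      # assumed to produce a landing page people search for
    "brand": "index",
    "size": "canonical",   # refinement: page stays crawlable, canonical drops it
    "price": "canonical",
    "sort": "block",       # never crawled: disallowed in robots.txt
    "view": "block",
}


def canonical_url(url: str) -> str:
    """Build the canonical target: indexable facets only, in sorted order.

    Parameters missing from FACET_POLICY are dropped in this sketch.
    """
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if FACET_POLICY.get(key) == "index"
    )
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


def robots_rules() -> list[str]:
    """Emit one Disallow line per blocked parameter."""
    return [
        f"Disallow: /*?*{name}="
        for name, action in sorted(FACET_POLICY.items())
        if action == "block"
    ]
```

Keeping the table in one place means a new filter cannot ship without someone choosing its action. The generated wildcard pattern is deliberately broad and would also match a parameter whose name merely ends in the blocked name, so review the output before deploying it.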
