
Submitted URL blocked by robots.txt

A URL listed in your sitemap is disallowed in robots.txt, so Googlebot won't crawl or index it.

What you see

Page indexing
Why pages aren't indexed
Submitted URL blocked by robots.txt

What’s actually happening

You submitted the URL in your sitemap, signalling that you want it indexed, but a robots.txt `Disallow` rule blocks Googlebot from crawling it. Those two instructions contradict each other, and the block wins: the page stays out of the index entirely. This differs from 'Indexed, though blocked by robots.txt', where Google indexes the bare URL (without its content) because other pages link to it; here the URL never makes it into the index at all. The conflict between your sitemap and your robots.txt is exactly what triggers this specific report.
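You can catch this conflict yourself by checking every `<loc>` in your sitemap against your robots.txt. The sketch below is a minimal illustration, assuming Python's standard-library `urllib.robotparser` is a close enough stand-in for Googlebot: it applies the first matching rule in file order rather than Google's longest-match precedence, and it does not implement Google's `*` and `$` wildcards, so treat a clean result as a hint rather than proof. The helper name `find_blocked_sitemap_urls` and the `example.com` data are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap XML files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_blocked_sitemap_urls(sitemap_xml: str, robots_txt: str,
                              user_agent: str = "Googlebot") -> list[str]:
    """Return sitemap URLs that robots.txt disallows for user_agent (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed
    parser.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    # Keep only the URLs the parser says this crawler may not fetch
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /products\n"
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/products-guide/</loc></url>"
        "</urlset>"
    )
    for url in find_blocked_sitemap_urls(sitemap, robots):
        print("Submitted but blocked:", url)
```

Every URL this prints is a candidate for this report: either the robots.txt rule or the sitemap entry needs to change.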

Common causes

  • A `Disallow:` rule that is broader than intended and catches the submitted path as collateral
  • A development-era block like `Disallow: /` that was never removed after launch, or one that carried over from staging
  • A path-prefix collision: robots.txt rules match by prefix, so `Disallow: /products` (no trailing slash) also blocks `/products-guide/`
  • Your sitemap auto-includes URLs (faceted/filter pages, `/search`) that an old robots rule deliberately disallows
  • A robots.txt served per-environment where production accidentally inherited the staging version
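The prefix collision is the least obvious of these causes. The sketch below shows how a rule path is matched, assuming plain path prefixes only (no `*` or `$` wildcards) and the precedence Google documents: the longest matching rule path wins, and `Allow` wins a tie with `Disallow`. The functions `matching_rule` and `is_allowed` and the sample rules are hypothetical, for illustration only.

```python
from typing import Optional

# A rule is a (directive, path) pair, e.g. ("disallow", "/products")
Rule = tuple[str, str]


def matching_rule(rules: list[Rule], path: str) -> Optional[Rule]:
    """Return the rule that decides `path`, or None if no rule matches.

    Plain prefix matching only: Google's `*` and `$` wildcards are not handled.
    """
    best: Optional[Rule] = None
    for directive, rule_path in rules:
        # An empty value ("Disallow:") matches nothing; otherwise match by prefix
        if not rule_path or not path.startswith(rule_path):
            continue
        if best is None or len(rule_path) > len(best[1]):
            best = (directive, rule_path)  # the longer, more specific path wins
        elif len(rule_path) == len(best[1]) and directive.lower() == "allow":
            best = (directive, rule_path)  # on a tie, Allow beats Disallow
    return best


def is_allowed(rules: list[Rule], path: str) -> bool:
    """True if no rule matches `path` or the deciding rule is an Allow."""
    rule = matching_rule(rules, path)
    return rule is None or rule[0].lower() == "allow"


if __name__ == "__main__":
    too_broad = [("disallow", "/products")]
    scoped = [("disallow", "/products/")]
    print(matching_rule(too_broad, "/products-guide/"))  # ('disallow', '/products')
    print(is_allowed(scoped, "/products-guide/"))        # True
    print(is_allowed(scoped, "/products/shoes"))         # False
```

Note that the scoped rule also stops blocking the bare `/products` page itself, since `/products` does not start with `/products/`.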

How to fix it

  1. Find the matching rule. Open the URL Inspection tool for the blocked URL to confirm that robots.txt is what stops the crawl. To see exactly which `Disallow` line matches, paste your robots.txt and the URL into a robots.txt tester.
  2. Decide: should this URL be indexed? If yes, fix robots.txt. If no, the real fix is removing it from your sitemap, since a sitemap should only list pages you want crawled and indexed. Don't leave the two in conflict.
  3. Narrow or remove the Disallow. Edit `/robots.txt` so the rule no longer matches. Add a trailing slash to scope it (`Disallow: /products/` instead of `/products`), or delete a stale `Disallow: /` left over from development. Note that `Disallow: /products/` no longer blocks the bare `/products` page itself.
  4. Use noindex instead when you want it crawled but hidden. To keep a page out of results while still letting Google read it, remove the robots.txt block and add a `noindex` meta tag or `X-Robots-Tag` header instead. Robots.txt blocks crawling, so while the block is in place Google can never see a noindex directive on the page.
  5. Validate the fix. Confirm that `https://yoursite.com/robots.txt` no longer blocks the path, then click Validate Fix in the Page Indexing report so Google re-checks the affected URLs.
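The `X-Robots-Tag` header from step 4 is handy for responses you can't easily edit, such as PDFs. The sketch below assumes a Python WSGI application; the `NoindexMiddleware` class and the `/internal-reports/` prefix are hypothetical. It appends `X-Robots-Tag: noindex` to responses under the listed path prefixes, which only has an effect once robots.txt stops blocking those paths, because Googlebot has to crawl the page to see the header.

```python
from typing import Callable, Iterable

# Simplified WSGI signatures for this sketch
StartResponse = Callable[..., Callable[[bytes], object]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class NoindexMiddleware:
    """Add `X-Robots-Tag: noindex` to responses whose path starts with a listed prefix."""

    def __init__(self, app: WSGIApp, prefixes: Iterable[str]) -> None:
        self.app = app
        # str.startswith() accepts a tuple, so store the prefixes as one
        self.prefixes = tuple(prefixes)

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.prefixes):
            # Outside the hidden section: pass the response through unchanged
            return self.app(environ, start_response)

        def start_with_noindex(status, headers, exc_info=None):
            # The page stays crawlable; the header asks search engines not to index it
            headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_noindex)
```

You would wrap your existing app once, for example `application = NoindexMiddleware(application, ["/internal-reports/"])`. For ordinary HTML pages, a `<meta name="robots" content="noindex">` tag in the page head does the same job without touching server code.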

Stop it recurring

Keep robots.txt under version control, with a separate file for each environment, so the staging `Disallow: /` can't quietly ship to production.
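One way to enforce this is a pre-deploy check that fails when the production robots.txt contains a blanket block. The sketch below is a minimal, deliberately conservative version: it flags any `Disallow: /` or `Disallow: /*` in a group that names `*` or `Googlebot`, even where a more specific group might take precedence. The script name `check_robots.py` and the path `robots/production.txt` are hypothetical.

```python
import sys
from pathlib import Path

# User-agent tokens (lower-cased) whose blanket blocks should fail the build
WATCHED_AGENTS = {"*", "googlebot"}


def blanket_disallows(robots_txt: str) -> list[int]:
    """Return 1-based line numbers of blanket Disallow rules for * or Googlebot."""
    hits: list[int] = []
    agents: set[str] = set()
    in_rules = False  # True once the current group has seen an Allow/Disallow line
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group
                agents, in_rules = set(), False
            agents.add(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value in ("/", "/*") and agents & WATCHED_AGENTS:
                hits.append(lineno)
    return hits


if __name__ == "__main__":
    # CI usage: python check_robots.py robots/production.txt
    failed = False
    for name in sys.argv[1:]:
        hits = blanket_disallows(Path(name).read_text(encoding="utf-8"))
        if hits:
            print(f"{name}: blanket Disallow on line(s) {hits}", file=sys.stderr)
            failed = True
    if failed:
        sys.exit(1)
```

A non-zero exit stops the pipeline, so a staging file that slips into the production path is caught before it reaches Googlebot.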

Related errors