Indexed, though blocked by robots.txt

A robots.txt Disallow blocks crawling, not indexing. Google indexed the URL anyway from external links, and because it cannot fetch the page, it never sees any noindex you put there.

What you see

Page indexing > Improve page appearance
Indexed, though blocked by robots.txt
Validation: Not started

What’s actually happening

The URL shows up in Google's index with a warning, and the search snippet often reads "No information is available for this page" because Googlebot was never allowed to fetch the body. You blocked it in robots.txt expecting it to stay out of results — it didn't. Other sites linking to the URL gave Google enough to index it from the anchor text and URL alone. The catch: a noindex tag inside that page can never take effect, because Google can't crawl the page to read the tag.

Common causes

  • A Disallow rule in robots.txt covers a URL that external sites link to, so Google indexes it without crawling.
  • Someone added Disallow to keep a page out of search, not realizing blocking the crawl also blocks the noindex directive from being seen.
  • A noindex meta tag or X-Robots-Tag header exists on the page but is unreachable because robots.txt prevents the fetch.
  • Internal staging or parameter URLs got linked publicly while sitting behind a Disallow.
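
To confirm that a Disallow rule really covers an affected URL, a quick local check can help. The sketch below uses Python's standard-library robots.txt parser; the robots.txt content and the example.com URLs are hypothetical placeholders, and the stdlib parser may not match Google's wildcard handling exactly, so treat the result as a first approximation rather than Google's verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A URL under the disallowed path is blocked; one outside it is not.
print(is_blocked(ROBOTS_TXT, "https://example.com/private/report.html"))  # True
print(is_blocked(ROBOTS_TXT, "https://example.com/blog/post.html"))       # False
```

If the first call returns True for a URL listed in the report, the Disallow rule is what keeps Googlebot from reading anything on that page, including a noindex.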

How to fix it

  1. Decide: should this URL be indexed or not?
     If it should be indexed, the warning is harmless; just remove the Disallow so Google can crawl it fully. The rest of these steps are for URLs you want out of the index.
  2. Remove the Disallow so the noindex can be read
     Delete or narrow the robots.txt rule blocking the URL. It is counterintuitive, but you must allow the crawl for Google to ever see your noindex and drop the page. Then add <meta name="robots" content="noindex"> or an X-Robots-Tag: noindex header.
  3. For pages that must stay uncrawlable, use the Removals tool
     If you genuinely cannot open the crawl, submit a temporary removal in Search Console (Removals > Temporary Removals). It hides the URL for about six months while you sort out a permanent noindex or password protection.
  4. Protect truly private content with auth, not robots.txt
     robots.txt is a crawl suggestion, not access control. Anything sensitive belongs behind HTTP auth or a login so it returns 401/403 to bots and humans alike.
  5. Validate and wait for reprocessing
     After the crawl is open and noindex is in place, hit Validate Fix. Google must recrawl each URL to see the directive, so the report clears gradually over weeks.
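
Before validating, it can be worth checking that the noindex is actually present in what the server returns. The following is a minimal sketch, assuming you already have the page's HTML and response headers in hand. The names `has_noindex` and `diagnose` are hypothetical helpers, and the parsing is deliberately simplified: it ignores user-agent-prefixed header values such as `googlebot: noindex` and the `none` directive.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the HTML or the X-Robots-Tag header carries noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


def diagnose(blocked: bool, noindex: bool) -> str:
    """Map the two facts to the situations described in the steps above."""
    if blocked and noindex:
        return "noindex is unreachable: remove the Disallow so it can be read"
    if blocked:
        return "blocked from crawling, but can still be indexed from links"
    if noindex:
        return "crawlable with noindex: should drop out after a recrawl"
    return "crawlable and indexable"


print(diagnose(blocked=True, noindex=has_noindex('<meta name="robots" content="noindex">', {})))
```

The target state for a URL you want out of the index is `blocked=False` with `noindex=True`; any combination with `blocked=True` leaves the directive invisible to Google.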

Stop it recurring

Never use robots.txt Disallow to deindex a page — allow the crawl and serve noindex instead, since a blocked page can't reveal its own directives.
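
One way to enforce this rule is a small pre-deploy check that flags any URL you intend to noindex but that robots.txt also blocks. This is only a sketch under stated assumptions: `noindex_conflicts` is a hypothetical helper, the robots.txt text and URLs are made up, and the standard-library parser may not mirror Google's matching for wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def noindex_conflicts(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return the URLs meant to carry noindex that robots.txt blocks from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical inputs: one noindexed URL sits behind a Disallow, one does not.
robots_txt = "User-agent: *\nDisallow: /drafts/\n"
urls = ["https://example.com/drafts/a", "https://example.com/thanks"]

for url in noindex_conflicts(robots_txt, urls):
    print(f"Conflict: {url} is noindexed but blocked by robots.txt")
```

Any URL this reports is one where the noindex can never be read, so either the Disallow or the deindexing plan needs to change.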
