Skip to content

Monday, November 30th, 2009

Yahoo Now Supports Wildcards in Robots.txt

November 2, 2006 by Gilad  
Filed under Computers

In an effort to accommodate webmasters and following a call made by Danny Sullivan at the last Search Engine Strategies, Yahoo added today wildcard support in Robots.txt.

For those of you who don’t know the wildcard “*” replaces any string of characters, and can be used in generalizing filenames, folders and sub-domains. This recent upgrade can make your life much easier if you disallow Yahoo’s Slurp from parts of your site.

‘*’ – matches a sequence of characters

You can now use ‘*’ in robots directives for Yahoo! Slurp to wildcard match a sequence of characters in your URL. You can use this symbol in any part of the URL string you provide in the robots directive. For example,

User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html
Disallow: /*?sessionid

The robots directives above will:

* allow all directories that begin with ‘public’, such as ‘/public_html/’ or ‘/public_graphs/’ to be crawled
* disallow any files or directories which contain ‘_print’, such as ‘/card_print.html’ or ‘/store_print/product.html’ to be crawled
* disallow any files with ‘?sessionid’ in their URL string, such as ‘/cart.php?sessionid=342bca31’ to be crawled

Note that a trailing ‘*’ is redundant since that is existing matching behavior for Slurp. So, the following two directives are equivalent:

User-Agent: Yahoo! Slurp
Disallow: /private*
Disallow: /private

‘$’ – anchors at the end of the URL string

You can now also use ‘$’ in robots directives for Slurp to anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp would match all URLs against the directives, treating the directives as a prefix. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The robots directives above will

* Disallow all files ending in ‘.gif’ in your entire site. Note that without the ‘$’, this would disallow all files containing ‘.gif’ in their file path
* Allow all files ending in ‘?’ to be included. This would not automatically allow files that just contain ‘?’ somewhere in the URL string

As you can see, this symbol only makes sense at the end of the string. Hence, when we see it, we assume that your directive terminates there and any characters after that symbol are ignored.

  • StumbleUpon
  • Digg
  • Facebook
  • Mixx
  • Google
  • TwitThis
  • Reddit
  • Yahoo! Buzz
  • Slashdot
  • E-mail this story to a friend!
  • BallHype
  • YardBarker

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!


About Us | Advertise with us | Blog for EveryJoe | Privacy Policy | Terms of Use
Get This Theme | Sitemap


All content is Copyright © 2005-2009 b5media. All rights reserved.