
Thursday, November 12th, 2009

Google goes Deep Web

April 23, 2008 by Juned  
Filed under Computers

Google is doing something important: it has been experimenting with HTML forms. Basically, it has been taking forms from high-quality sites and submitting queries through them in order to crawl the URLs that correspond to each query. It takes time to digest this, so it is better to read the post "Crawling through HTML forms" on the Google Webmaster Central Blog.

Google, of course, quickly follows the discussion with a note that the experiment observes good Internet citizenry practices, which involve the following:

Only a few useful sites were included in the experiment

Googlebot strictly adhered to robots.txt, nofollow, and noindex directives: if the search form is forbidden, the URLs that the form would generate are not crawled.

Also, Google used only GET forms and not forms that require personal information.

Google also limited the number of fetches per website because of the potential impact that many fetches could have on a particular site.
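To make these rules concrete, here is a minimal sketch in Python of how a polite crawler might turn a form into a handful of URLs. This is not Google's implementation, which the post does not describe in detail. The function name candidate_urls, the agent name ExampleBot, the example.com addresses, and the dictionary layout used for the form are all assumptions made for illustration. The sketch covers only the robots.txt check, the GET-only rule, and the fetch cap; nofollow and noindex handling is left out.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin
from urllib.robotparser import RobotFileParser


def candidate_urls(page_url, form, robots, agent="ExampleBot", max_fetches=5):
    """Return at most max_fetches URLs that submitting the form could produce.

    `form` is a hypothetical dict with "method", "action", and "fields"
    (field name -> list of values to try). It is not a real library type.
    """
    # Rule: only GET forms are considered.
    if form["method"].lower() != "get":
        return []

    action = urljoin(page_url, form["action"])

    # Rule: if the form target is forbidden, none of its result URLs are crawled.
    if not robots.can_fetch(agent, action):
        return []

    # Try every combination of the supplied field values.
    names = list(form["fields"])
    combos = product(*(form["fields"][name] for name in names))
    urls = (action + "?" + urlencode(dict(zip(names, combo))) for combo in combos)

    # Rule: each generated URL must also be allowed, and the total is capped.
    allowed = (url for url in urls if robots.can_fetch(agent, url))
    return list(islice(allowed, max_fetches))


# Hypothetical robots.txt rules, parsed from memory so no network access is needed.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

search_form = {
    "method": "get",
    "action": "/search",
    "fields": {"q": ["cars", "boats"], "year": ["2007", "2008"]},
}

for url in candidate_urls("http://example.com/index.html", search_form, robots, max_fetches=3):
    print(url)
```

With the cap set to 3, only three of the four possible query combinations are returned. A POST form, or a form whose action falls under /private/, yields nothing at all.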

Seems to be above board.

But what is important about Google's experiment?

First, it is a significant effort on the part of Google to mine the Hidden Web, also known as the Deep Web: in other words, data, information and knowledge not reachable by the search engines.

Second, if the approach is consistently successful, Google might be able to add high-quality content to its usual search results.

Makes one wonder what the competitors are doing. And if Google succeeds, does this mean a tenfold increase in information overload?






All content is Copyright © 2005-2009 b5media. All rights reserved.