Friday, April 11, 2008

Proposed Deeper Web Crawl

This addresses part of the so-called 'hidden web': pages that sit behind sign-ons and forms and so have never been crawled. I take it the proposal is not that Google will somehow break through sign-ins and passwords, but that its crawler will go 'deeper' wherever a choice has to be made or a form filled out. Some simple intelligence at that point, perhaps human-aided, could get the crawler past the input form. The questions that come to mind are how much content is there and what value it would provide:
"Google aims to penetrate the Deep Web with HTML forms crawling ... In a move aimed at taking the search-engine giant closer to what's commonly called the "Deep Web," Google Inc. today said that it has started experimenting to find ways for its search engine to index HTML forms such as drop-down boxes and select menus ..."
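
To make the idea of getting 'below the input form' a little more concrete, here is a minimal sketch of how a crawler might enumerate the choices in drop-down menus and turn them into URLs to fetch. This is only my guess at the general approach, not Google's actual method: the names `SelectFormParser` and `candidate_urls` are hypothetical, and the sketch assumes the form submits by GET and that every `<select>` has a name and explicit option values.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class SelectFormParser(HTMLParser):
    """Collects the action and <select> options of the first GET form (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.action = None      # action attribute of the first GET form found
        self.selects = {}       # select name -> list of non-empty option values
        self._in_form = False   # True while inside the chosen form
        self._current = None    # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            # Assumption: only GET forms are considered, since their
            # submissions can be expressed as plain URLs.
            if (attrs.get("method") or "get").lower() == "get":
                self.action = attrs.get("action") or ""
                self._in_form = True
        elif tag == "select" and self._in_form and attrs.get("name"):
            self._current = attrs["name"]
            self.selects[self._current] = []
        elif tag == "option" and self._current and attrs.get("value"):
            # Options with an empty value (e.g. "Any") are skipped.
            self.selects[self._current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None
        elif tag == "form":
            self._in_form = False


def candidate_urls(page_url, html):
    """Return one URL per combination of drop-down choices in the form."""
    parser = SelectFormParser()
    parser.feed(html)
    if parser.action is None or not parser.selects:
        return []
    names = list(parser.selects)
    base = urljoin(page_url, parser.action)
    return [
        base + "?" + urlencode(dict(zip(names, combo)))
        for combo in product(*(parser.selects[name] for name in names))
    ]
```

Note that the number of URLs is the product of the option counts, so a real crawler would have to sample or prune the combinations rather than try them all, which bears directly on the question of how much is there and what value it would provide.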

