Cyberspace Information Retrieval and Processing with the Use of the Deep Web
The Deep Web, in the most general sense, encompasses the part of cyberspace content that lies beyond the reach of standard search engines and catalogues. These resources are primarily specific categories of databases, usually represented by HTML sites whose content is generated dynamically in response to a user's query. This part of cyberspace also contains disconnected pages and sites to which no links exist for crawlers to follow. Technical problems likewise arise when a spider encounters an object or file that is not a text document. There are four basic categories of Deep Web invisibility: the Opaque Web, the Private Web, the Proprietary Web, and the Truly Invisible Web. The most important reasons to use the Invisible Web include a focus on more specialized subject matter, specialized interfaces, a simultaneous increase in precision and recall, and much greater expertise. Very often it is also the only reliable source of information on a given subject. The most valuable and user-friendly search engines for the Deep Web include TURBO 10, Complete Planet, OAIster, BUBL LINK and IncyWincy. (original abstract)
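The crawler limitation described in the abstract can be sketched with a minimal example: a spider that discovers pages only by following `<a href>` links will never submit a query form, so any database reachable only through that form remains invisible to it. The HTML snippet, class name, and paths below are hypothetical illustrations, not taken from the paper.

```python
# A toy link extractor in the spirit of a standard search-engine spider:
# it collects <a href> targets and ignores forms entirely, so content
# generated only in response to a form query stays in the Deep Web.
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags; forms are never submitted."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Hypothetical page: two static links plus a search form whose results
# are generated dynamically and therefore invisible to the crawler.
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/contact.html">Contact</a>
  <form action="/search" method="post">
    <input name="query" type="text">
  </form>
</body></html>
"""

parser = LinkExtractor()
parser.feed(PAGE)
print(parser.links)  # only the static links; /search results stay hidden
```

Running the sketch lists only `/about.html` and `/contact.html`: the `/search` endpoint, and every database record behind it, is never reached, which is exactly the dynamic-content category of invisibility the abstract describes.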