Unit 2: Search Tools


What do I need to know about search tools?

Purpose
Search tools are designed to retrieve web sites in response to user requests. Each search tool builds its own database, or collection, of web sites. These databases vary in size, and even the largest indexes only 30% of the 1 billion web pages; most of the remaining pages belong to the Invisible Web. A search tool's survival depends largely on user satisfaction, so search tool companies spend time and money researching what users are looking for, how they find answers to their questions, and what services they expect. Throughout their short life span (1994 -), search tool home pages have evolved from sparse, bare pages to complex, cluttered, customizable portals, and today have been scaled back to a minimalist design with a few links to additional features. Search tools are promoted through advertisements in magazines, newspapers, television, and web sites, and by word of mouth from satisfied customers.
                   

Methodology
Search tools accomplish their purpose in two ways: automated collection and human submission. In automated collection, search tools like Google, AlltheWeb, and MSN use software called spiders, crawlers, or robots as collecting agents that crawl the Internet. These programs read the HTML code behind web sites, looking for significant terms, assessing word frequency and word placement, and analyzing the links each page includes. The results are returned to the search tool's indexing site and organized according to ranking algorithms. In human submission, sites are reviewed and added by people: Yahoo, DMOZ, and Librarian's Index to the Internet are directories whose databases are created, organized, and managed by information professionals. As a result, their databases tend to be smaller than databases created by automated means.
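
The automated collection step can be illustrated with a short Python sketch. It is only an approximation of what a spider does with one page: it reads the HTML, counts word frequency, notes which words appear in the title, and collects the included links. The class name PageAnalyzer, the analyze function, and the sample page are hypothetical and are not taken from any real search tool.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class PageAnalyzer(HTMLParser):
    """Collects word counts, title words, and outgoing links from one page."""

    def __init__(self):
        super().__init__()
        self.words = Counter()   # word -> number of occurrences on the page
        self.title_words = []    # words found inside the <title> tag
        self.links = []          # href values of <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Split the text into lowercase words and count them.
        tokens = re.findall(r"[a-z0-9]+", data.lower())
        self.words.update(tokens)
        if self._in_title:
            self.title_words.extend(tokens)


def analyze(html):
    """Parse one page of HTML and return the filled-in analyzer."""
    parser = PageAnalyzer()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical sample page; a real spider would download this from the web.
sample = """<html><head><title>Search Tools</title></head>
<body><h1>Search tools</h1>
<p>Spiders crawl the web. Spiders follow <a href="http://example.com/a">links</a>.</p>
</body></html>"""

page = analyze(sample)
print(page.words["spiders"])  # word frequency
print(page.title_words)       # word placement: words in the title
print(page.links)             # links a spider could follow next
```

A real spider would repeat this for every link it collects and send the results back to the indexing site.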

One of the most common misconceptions among searchers is that they are doing a "live" search of the World Wide Web. On the contrary, the search tool searches its own database of assembled web sites, not the live World Wide Web. Because that database reflects pages as they were when last collected, it is important to determine how often a search tool refreshes it. Searchers should look for large databases that are refreshed frequently.

When searchers receive a list of results, they wonder which one is best and how the sites are ranked. A ranking algorithm is the mathematical model a search tool uses to determine relevance and to order pages in its results. Ranking factors include word order, word frequency, location on a web page, title tags, headers, meta tags, link popularity, and frequency of update. Ranking algorithms are considered highly proprietary, vary from search tool to search tool, and are not usually available to the public. In fact, users may find that one search tool gives a result a higher ranking, usually shown as a percentage, than another search tool gives the same result in its list.
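
Since real ranking algorithms are proprietary, the following Python sketch is only a toy model of how a few of these factors might be combined. The weights (3 for a title match, 1 per occurrence in the body, 2 for appearing in the first ten words) and the sample pages are invented for illustration and do not come from any search tool.

```python
def score(page, query_terms):
    """Toy relevance score; the weights below are made up for illustration."""
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    total = 0.0
    for term in query_terms:
        term = term.lower()
        total += 3.0 * title.count(term)  # title tag matches weigh most
        total += 1.0 * body.count(term)   # word frequency in the body
        if term in body[:10]:             # word placement: near the top
            total += 2.0
    return total


def rank(pages, query):
    """Return the pages ordered from highest to lowest score."""
    terms = query.split()
    return sorted(pages, key=lambda p: score(p, terms), reverse=True)


# Hypothetical pages standing in for a search tool's database.
pages = [
    {"url": "b.example", "title": "Cooking",
     "body": "recipes and a short note about a search for good tools"},
    {"url": "a.example", "title": "Search tools overview",
     "body": "search tools retrieve web sites from a database"},
]

for p in rank(pages, "search tools"):
    print(p["url"], score(p, ["search", "tools"]))
```

Changing any weight changes the order of the results, which is one reason the same page can rank differently in two search tools.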

Google uses link popularity, which works like the New York Times Best Seller list for books: the sites that other sites link to most frequently are given first place in Google search results.
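
The idea of link popularity can be sketched in Python by counting how many other sites link to each site. This is a simplified illustration, not Google's actual algorithm, and the function name inbound_link_counts and the sample link data are hypothetical.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count, for each site, how many other sites link to it."""
    counts = Counter({site: 0 for site in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # repeated links from one site count once
            if target != source:      # ignore links from a site to itself
                counts[target] += 1
    return counts


# Hypothetical data: each site mapped to the sites it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}

counts = inbound_link_counts(links)
# Most linked-to first; ties are broken alphabetically.
ranked = sorted(counts, key=lambda site: (-counts[site], site))
print(ranked)
```

Here c.example comes first because three other sites link to it, just as the best-selling book tops the list.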

 

Results Ranking in Web Search Engines
   Martin P. Courtois and Michael W. Berry. Online, May 1999.
Proxy required

Searching Quagmire
   Susan Feldman and Elizabeth Liddy. Searcher, May 2001.
Proxy required
    

Prevention
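Owners of Invisible Web sites, and other people who don't want their sites to be found by search tools, place a file named robots.txt at the root of the site. The file asks spidering software not to access all or part of the site; it is a request that well-behaved spiders honor, not a technical barrier. A similar instruction can be placed in the HTML code of an individual page with a robots meta tag.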
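
As a rough illustration of how a well-behaved spider consults robots.txt before fetching a page, the sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the spider name ExampleBot, and the www.example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the rules without any download

blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")

print(blocked)  # False: the spider should skip this page
print(allowed)  # True: the spider may collect this page
```

In practice a spider would download the file from the site itself, for example with the parser's set_url and read methods, before crawling any other page.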
                       

Components
A search tool has four parts:

collecting agents: Software programs called spiders, robots, or crawlers, which search the HTML code of web sites.
indexing unit: Organizes the collected data and adds it to the search tool's database.
matchmaker: Receives the user's request, searches the search tool's index of web sites, and responds with sites that match according to a predetermined algorithm.
home page: The user's first encounter with a search tool. On the home page the user reads the on-screen directions and enters a search term or browses the available subject guides.
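
The indexing unit and the matchmaker described above can be sketched together in Python. This is a minimal illustration under simple assumptions: the index maps each word to the pages containing it, and the matchmaker returns only pages that contain every word of the request. The function names and the sample pages are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collected_pages):
    """Indexing unit: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in collected_pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def match(index, query):
    """Matchmaker: return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical pages already gathered by the collecting agents.
collected = {
    "http://example.com/spiders": "Spiders and robots crawl web sites",
    "http://example.com/directories": "Directories list web sites chosen by people",
    "http://example.com/ranking": "Ranking algorithms order search results",
}

index = build_index(collected)
print(match(index, "web sites"))
print(match(index, "robots"))
print(match(index, "live web"))
```

The matchmaker never visits the web itself: every answer comes from the index built earlier, which is why a search is not "live".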

  


      For information contact jegan@nvcc.edu
      Last revision August 24, 2002