Purpose
Search tools are designed and constructed to retrieve web sites in response to user-generated requests. Each
search tool creates a database or collection of web sites. The
size of these databases varies and the largest indexes only 30% of the 1
billion web pages -- the majority of remaining web pages exist on the
Invisible Web. Search tool survival is largely dependent on user satisfaction. Search tool companies
spend money and time researching what users are looking for, how they
find the answers to their questions, and what services they expect.
Throughout their short life span (1994 -) search tool home pages
have evolved from sparse, bare pages to complex, cluttered,
customizable portals and, today, back to a minimalist design with
a few links to additional features. Search tools
are promoted through advertisements in magazines, newspapers, television,
and web sites, and by satisfied customers.
Methodology
Search tools accomplish their purpose in two ways:
automated collection and human submission. In automated collection,
search tools like Google, AlltheWeb, and MSN use software called spiders, crawlers, or robots
as collecting agents that literally crawl the Internet. These
programs read the HTML code behind web sites, looking
for significant terms, assessing word frequency and word
placement, and analyzing included links. The results are returned to the home
indexing site and organized according to ranking algorithms. In human submission, directories such as Yahoo,
DMOZ, and Librarian's Index to the Internet have
databases that are created, organized, and managed by information
professionals. As a result, their databases tend to be smaller than
databases created by automated means.
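The kind of analysis a spider performs on a single page can be sketched in a few lines of Python. This is only an illustration using the standard library, not the code of any actual search tool; the class name `PageScanner` and the function `scan_page` are hypothetical names chosen for this example.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class PageScanner(HTMLParser):
    """Collects the title, word counts, and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = Counter()   # word -> number of occurrences
        self.links = []          # href values of <a> tags, in page order

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Word placement: remember which text sits inside the title tag.
        if self.in_title:
            self.title += data
        # Word frequency: count every run of letters or digits.
        self.words.update(re.findall(r"[a-z0-9]+", data.lower()))


def scan_page(html):
    """Parse one page of HTML and return the filled-in scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner
```

A real spider would also fetch the pages named in `links`, skip script and style text, and send its findings back to the indexing site; those steps are left out of this sketch.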
One of the most common misconceptions among searchers is
thinking that they are doing a "live" search on the World Wide Web. On the contrary,
the search tool searches its own database of assembled web sites, not the live
World Wide Web. As a result, it is important to determine
how often a search tool refreshes its database. Searchers
should look for large databases that are refreshed frequently.
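A small sketch may make the point concrete. It assumes a toy database held in memory; `build_index` and `search` are hypothetical names, and a real search tool's index is far more elaborate. The search consults only the stored copy of each page, so changes made to the live page after collection are invisible until the database is refreshed.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of URLs whose stored copy contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs whose stored copy contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Copies of two pages as they looked when the spider last visited (made-up data).
collected = {
    "http://example.org/a": "library hours and maps",
    "http://example.org/b": "library catalog",
}
index = build_index(collected)

# The live page changes afterwards, but the index still holds the old copy.
live_page_a = "library closed for renovation"
matches = search(index, "library hours")
```

Here `matches` still contains `http://example.org/a`, even though the live page no longer mentions hours. That gap is why refresh frequency matters.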
When searchers receive a list of results,
they wonder which one is the best or how the sites are ranked. A ranking algorithm is the mathematical
model a search tool uses to determine relevance and to order its results. Ranking factors include word order,
word frequency, location on a web page, title tags, headers,
meta tags, link popularity, and frequency of update. Ranking algorithms
are considered
highly proprietary, vary from search tool to search tool, and are not usually available to the public. In fact, users may
find that a result in one search tool is given a higher ranking, usually in percentage form,
than the same result appearing in a different search tool's result list.
Google uses link popularity, which works like
the New York Times Best Seller list of books: the
sites that are linked to most frequently by other sites are given first
place in Google search results.
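The best-seller idea can be illustrated with a simple count of inbound links. This is a deliberately simplified sketch and not Google's actual algorithm, which is proprietary and weighs many more factors; the function names and the sample link data are assumptions made for this example.

```python
from collections import Counter


def link_popularity(outgoing_links):
    """Count, for each site, how many other sites link to it.

    outgoing_links maps a site to the list of sites it links to.
    Repeated links from the same site count once; self-links are ignored.
    """
    inbound = Counter()
    for source, targets in outgoing_links.items():
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    return inbound


def rank_by_popularity(candidates, inbound):
    """Order matching sites from most-linked to least-linked (ties by name)."""
    return sorted(candidates, key=lambda url: (-inbound[url], url))


# Made-up link data: site -> sites it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}
inbound = link_popularity(links)
ranked = rank_by_popularity(["a.example", "b.example", "c.example"], inbound)
```

In this sample, `c.example` is linked to by three other sites and so comes first in `ranked`.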
Results Ranking in Web Search Engines. Martin P. Courtois and Michael W. Berry. Online, May 1999. (Proxy required)
Searching Quagmire. Susan Feldman and Elizabeth Liddy. Searcher, May 2001. (Proxy required)
Prevention
Owners of Invisible Web sites and other people who don't want their sites to be found by
search tools place a robots.txt file in the top-level directory of the web site
(or a robots meta tag in the HTML
code), which asks spidering software not to access the site.
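A robots.txt file is a plain-text file served from the top level of a web site, and well-behaved spiders read it before collecting any pages. The sketch below uses Python's standard `urllib.robotparser` module to show how a spider might check the rules. The rule sets, the agent name `ExampleBot`, and the helper `allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules that ask every spider to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Sample rules that close off only one directory.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


whole_site = allowed(BLOCK_ALL, "ExampleBot", "http://example.org/index.html")
private_page = allowed(BLOCK_PRIVATE, "ExampleBot", "http://example.org/private/notes.html")
public_page = allowed(BLOCK_PRIVATE, "ExampleBot", "http://example.org/index.html")
```

Note that robots.txt is a request rather than a lock: it keeps out only spiders that choose to honor it.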
Components
Four parts of a search tool:
| collecting agents | Software programs called spiders, robots, or crawlers, which search the HTML code of web sites. |
| indexing unit | Organizes and includes the data collected in the search tool database. |
| matchmaker | Receives the user's request, searches the search tool's index of web sites, and responds with sites that match a predetermined algorithm. |
| home page | The user's first encounter with a search tool. It is on the home page that the user reads the on-screen directions and enters a search term or browses from the subject guides available. |