The organization and accessibility of the massive amount of knowledge on the Internet can only be described as chaotic and
haphazard. Most people who spend a little time on the Web will eventually home in on the good sites, such as amazon.com for books,
ebay.com for auctions, and google.com for searches. However, massive amounts of interesting, valuable, and accurate information
are often missed or only found "accidentally." Perhaps this is the natural outgrowth of a system that involves the independent
publication of documents (a.k.a. Web pages) by anyone, for any purpose. These Web pages only become visible by word of mouth,
advertisement, or placement in Web search engines.
There is another way of organizing and retrieving information on the Web, known as the Semantic Web. It is the brainchild
of Tim Berners-Lee, the person who conceived of the Web and built the first Web browser. Before diving in with definitions
and details, it is worth discussing the organization and retrieval of information on the Web to date.
Organizing the Web
Any person or entity can publish a Web site, which is typically a network of interacting hyperlinked documents with links
to programs and databases. Internally, each Web site has an organization that may be viewed in a site map of page titles and/or
descriptions. Although there is no standard for the organization of a site, good sites post a site map and provide search
capabilities for easy navigation.
Each Web site is identified by a URL, which is mapped to an IP address (e.g., 184.108.40.206). Neither IP addresses nor URLs
have any structure or organization that relates to the content or type of the site. All that a URL defines is a method for
a Web browser to find and display a particular site. Therefore, as the number of Web sites grew beyond a handful, it became
necessary to categorize and search for them. The idea of categorization and search had already been in place on the Internet
in the pre-WWW days, when the targets were FTP and telnet sites and the search engines were known as Gopher, Archie, and
Veronica.
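The point that a URL only tells a browser where to go, and nothing about what it will find there, can be seen by taking
one apart. The following is a minimal sketch using Python's standard library; the URL is a hypothetical example chosen for
illustration and does not come from the text.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com/products/index.html"

parts = urlsplit(url)

# Everything a URL carries is location information: how to connect
# (scheme), which machine to ask (host), and which document to request
# (path). None of these fields describes the subject matter of the site.
location = {
    "scheme": parts.scheme,
    "host": parts.hostname,
    "path": parts.path,
}
print(location)
```

Turning the host name into an IP address is a separate DNS lookup (for instance with socket.gethostbyname), left out here
because it requires network access. Like the URL itself, the resulting address says nothing about content.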
The first attempts at categorization were hierarchical directories built by companies like Yahoo. They are based on an
"ontology" of the Web: categories and subcategories of Web sites organized into a hierarchy. Directories still exist and
remain useful for particular kinds of searches, but their usefulness is limited because they cannot possibly keep up with
the proliferation of Web sites. In addition, the user must "guess" the ontology of the organizers, which is an imprecise
exercise because a site can reasonably be categorized in more than one way. For example, in the current Google directory,
the company SAS Institute is listed under /Computers/Programming/Languages/SAS, which is certainly correct, but
Business > Biotechnology and Pharmaceuticals > Pharmaceuticals > Outsourcing > Data Management or
/Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Software/ might make more sense to many in the pharmaceutical
industry.
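The "guess the ontology" problem can be sketched with a toy directory. This is only an illustration under simple assumptions:
the Directory class and its method names are hypothetical, and real directories were far larger and curated by hand. The
category paths are the ones quoted in the SAS Institute example.

```python
from collections import defaultdict


class Directory:
    """A toy hierarchical directory: category path -> set of site names."""

    def __init__(self):
        self._sites_by_path = defaultdict(set)

    def add(self, path, site):
        # Store paths without leading or trailing slashes.
        self._sites_by_path[path.strip("/")].add(site)

    def browse(self, path):
        # Return the sites filed under exactly this category.
        return sorted(self._sites_by_path.get(path.strip("/"), set()))

    def paths_for(self, site):
        # Return every category under which a site has been filed.
        return sorted(
            path for path, sites in self._sites_by_path.items() if site in sites
        )


directory = Directory()
directory.add("/Computers/Programming/Languages/SAS", "SAS Institute")

# The organizers' choice of category finds the site.
found = directory.browse("/Computers/Programming/Languages/SAS")

# A pharmaceutical user's equally reasonable guess finds nothing.
missed = directory.browse(
    "/Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Software/"
)
print(found, missed)
```

The site is reachable only through the category its organizers picked. Filing it under every plausible category would
help, but someone has to anticipate each user's guess in advance, which is exactly what does not scale.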
With continued growth, the Web has moved toward a search model based on the visible and hidden content of Web pages (e.g.,
AltaVista). Most users have now settled on a single search engine, Google, which ranks pages via an algorithm based on the
number and quality of links that refer to a particular page. While Google gives astoundingly good results, the searcher
often has to sift through a fair number of unrelated and unhelpful Web pages. In addition, it can be difficult to find rare
and interesting pages that are tucked into a "corner" of the Web.
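Ranking by "number and quality of referral links" can be illustrated with a simplified link-analysis iteration in the style
of PageRank. This is a sketch under textbook assumptions, not Google's actual algorithm: the function name rank_pages, the
damping value of 0.85, and the four-page link graph are all chosen here for illustration.

```python
def rank_pages(links, damping=0.85, iterations=100):
    """Score pages by the number and quality of the links pointing at them.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it links to,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "hub"; "hub" links to "a".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = rank_pages(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph "hub" scores highest because it has the most incoming links, and "a" outranks "b" and "c" because its single
incoming link comes from the highly ranked "hub". Pages that nobody links to, such as "b" and "c", keep only the baseline
score, which is one way to see why pages in a "corner" of the Web are hard to find.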