Getting to a Semantic Web on the Internet

Creating order from information chaos with the help of the Semantic Web.
Mar 01, 2006

Paul Bleicher
The organization and accessibility of the massive amount of knowledge on the Internet can only be described as chaotic and haphazard. Most people who spend a little time on the Web eventually home in on the good sites for books, for auctions, and for searches. However, massive amounts of interesting, valuable, and accurate information are often missed or found only "accidentally." Perhaps this is the natural outgrowth of a system that involves the independent publication of documents (a.k.a. Web pages) by anyone, for any purpose. These Web pages become visible only by word of mouth, advertisement, or placement in Web search engines.

There is another way of organizing and retrieving information on the Web, known as the Semantic Web. It is the brainchild of Tim Berners-Lee, the person who conceived of the Web and built the first Web browser. Before diving in with definitions and details, it is worth discussing the organization and retrieval of information on the Web to date.

Organizing the Web

Any person or entity can publish a Web site, which is typically a network of interacting hyperlinked documents with links to programs and databases. Internally, each Web site has an organization that may be viewed in a site map of page titles and/or descriptions. Although there is no standard for the organization of a site, good sites post a site map and provide search capabilities for easy navigation.

Each Web site is identified by a URL, which is mapped to an IP address. Nothing in the structure of IP addresses or URLs relates to the content of the information on the site or to the type of site. All that a URL defines is a method for a Web browser to find and display a particular site. Therefore, as the number of Web sites grew beyond a handful, it became necessary to categorize and search for them. The idea of categorization and search had already been in place on the Internet in the pre-WWW days, when the targets were FTP and telnet sites and the search tools were known as Gopher, Archie, and Veronica.
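To make the point concrete, here is a minimal Python sketch of what a browser does with a URL before it can fetch anything. The URL is a placeholder chosen for illustration (it does not come from this article), and the helper names `host_of` and `resolve` are hypothetical. The sketch only shows that the host name is the part that gets mapped to an IP address, and that neither the name nor the address describes the site's content.

```python
import socket
from urllib.parse import urlsplit


def host_of(url):
    """Return the host name in a URL, the part that is mapped to an IP address."""
    return urlsplit(url).hostname


def resolve(url):
    """Look up the IP address for the URL's host (needs network access and DNS)."""
    return socket.gethostbyname(host_of(url))


# Placeholder URL for illustration only.
url = "http://www.example.com/books/index.html"
print(host_of(url))
# resolve(url) would return a dotted numeric address; the number itself
# carries no information about what the site is about.
```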

The first attempts at categorization were hierarchical directories, built by companies like Yahoo. They are based on an "ontology" of the Web: categories and subcategories of Web sites organized into a hierarchical directory. Directories still exist and remain useful for particular kinds of searches, but they are limited because they can't possibly keep up with the proliferation of Web sites. In addition, the user must "guess" the ontology of the organizers, which is a very imprecise way to locate a site. For example, in the current Google directory, the company SAS Institute is listed under /Computers/Programming/Languages/SAS, which is certainly correct, but /Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Outsourcing/Data_Management or /Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Software/ might make more sense to many in the pharmaceutical industry.
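The guessing problem can be illustrated with a toy directory in Python. This is only a sketch: the category paths are taken from the SAS Institute example, but the dictionary layout and the function names `sites_under` and `categories_of` are assumptions made for illustration, not how any real directory is implemented.

```python
# Toy hierarchical directory: each category path maps to the sites filed there.
# The structure is illustrative; only the paths come from the example in the text.
directory = {
    "/Computers/Programming/Languages/SAS": ["SAS Institute"],
    "/Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Software": [],
    "/Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Outsourcing/Data_Management": [],
}


def sites_under(path):
    """Return every site filed at or below the given category path."""
    prefix = path.rstrip("/")
    found = []
    for category, sites in directory.items():
        if category == prefix or category.startswith(prefix + "/"):
            found.extend(sites)
    return found


def categories_of(site):
    """Return the category paths under which a site is filed."""
    return sorted(c for c, sites in directory.items() if site in sites)


# A programmer browsing from /Computers finds the company...
print(sites_under("/Computers"))
# ...but a pharmaceutical user starting from /Business does not.
print(sites_under("/Business/Biotechnology_and_Pharmaceuticals"))
```

A site filed under a single path is visible only to users whose mental categories match those of the directory's editors.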

With continued growth, the Web has moved toward a search model based on the visible and hidden content of each Web page (e.g., AltaVista). Most users have now settled on one search engine, Google, which ranks pages with an algorithm based on the number and quality of the links that point to a particular page. While Google gives astoundingly good results, the searcher often has to sift through a fair number of unrelated and unhelpful Web pages. In addition, it can be difficult to find rare and interesting pages that are tucked into a "corner" of the Web.
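The idea of ranking by the number and quality of inbound links can be sketched in a few lines of Python. This is a simplified, PageRank-style iteration on a made-up four-page web, not Google's actual algorithm; the page names, the damping value of 0.85, and the function name `rank_pages` are all assumptions for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting each link by the linker's own score.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Made-up web: "niche" has one inbound link, "blog" has none.
toy_web = {
    "hub": ["popular", "niche"],
    "blog": ["popular"],
    "popular": ["hub"],
    "niche": [],
}
scores = rank_pages(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web the page with few inbound links scores low regardless of how good its content is, which is one reason pages in a "corner" of the Web are hard to find.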
