Definition
According to Tim Berners-Lee, the inventor of the World Wide Web, the definition of the Semantic Web
is: "The extension of the current web in which information is given well-defined meaning, better
enabling computers and humans to work in cooperation."
Introduction
The Semantic Web is an abstract representation of data on the World Wide Web, based on the RDF (Resource Description
Framework) standards and other standards still to be defined. It is being developed by the W3C (World Wide Web Consortium),
with participation from academic researchers and industrial partners. Data is defined and linked in such a way that
it can be discovered, automated, integrated, and reused more effectively across different applications.
Most of today's World Wide Web content is designed for humans to read and understand, not for machines and computer
programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing but, in general,
machines have no reliable way to process the semantics. The Semantic Web will bring structure to the meaningful content
of Web pages, so that software agents roaming from page to page or from site to site can readily carry out sophisticated
automated tasks for users.
The Semantic Web is an extension of the World Wide Web, in which information is given well-defined meaning, better enabling
computers and people to work in cooperation. The first steps in incorporating the Semantic Web into the structure of the
existing Web are already under way. In the near future, these developments will roll out significant new functionality as
machines become much better able to process and understand the data that the Web merely displays at present. To date, the Web
has developed most rapidly as a medium for documents for humans rather than for data and information that can be processed
automatically. If you want something from the Web, you have to get it manually. By "manual" I mean that if you are
looking for specific information or a product to buy on the Internet, for example a book, then you must sit at your
computer and search the most popular online bookstores, browsing the categories of titles that match what you want.
The Semantic Web aims to remove this manual dependency, so that users can rely on software to carry out the task
autonomously, and it will be decentralized as much as possible, just like the Internet.
Machine Intelligence
The key to the development of the Semantic Web is Machine Intelligence. Other terms that are frequently used interchangeably
with Machine Intelligence are Machine Learning, Computational Intelligence, Soft-Computing and Artificial Intelligence.
Although the five terms are used interchangeably in industry and academia, researchers working in these fields regard
them as different branches. For example, Artificial Intelligence involves symbolic computation, while Soft-Computing
involves intensive numeric computation.
The following sub-branches of Machine Intelligence (mainly symbolic Artificial Intelligence) are being addressed for the Semantic Web:
- Knowledge Acquisitions and Representations
- Agent Systems (or MAS, Multi-Agent Systems)
- Ontology
Although symbolic Artificial Intelligence is what is currently being built into Semantic Web data representation, there
is little doubt that software tool vendors and developers will in future incorporate the Soft-Computing paradigm
into it. The advantage that Soft-Computing adds to symbolic Artificial Intelligence is that it makes software
applications (systems) adaptive. This means that a Soft-Computing program can deal with and adapt to unforeseen
input that was not anticipated when it was built. This is in contrast to pure symbolic Artificial Intelligence,
which is non-adaptive and cannot deal with or adapt to unforeseen input (stimuli).
There are a number of JSRs (Java Specification Requests) related to Machine Intelligence in the JCP (Java Community Process),
two of which are currently in public review. These JSRs are listed below:
- Java Rule Engine API: JSR-94 (Public Review: 11th September, 2002)
- Java Agent Services API: JSR-87 (Public Review: 19th May, 2002)
- Java Data Mining API: JSR-73 (Community Draft Ballot: 24th June, 2002)
As the list above shows, only a small part of Machine Intelligence is currently covered by JSRs in the JCP. The list
is expected to grow as new JSRs related to Machine Intelligence are proposed to the JCP.
The various disciplines of Machine Intelligence, which have existed for over fifty years, have been applied successfully in
many areas of software, and it is only now that they are being applied to the Internet through extensions such as the
Semantic Web. New branches of Machine Intelligence are constantly being developed.
Knowledge Acquisitions and Representations
Knowledge Acquisition is defined as the extraction of knowledge from different sources of information. Examples
include extracting knowledge from a human expert in a specific domain (such as a doctor, lawyer, or financial advisor),
extracting trading rules from a stock exchange database, or extracting linguistic rules from a linguistic database.
Knowledge Representation is defined as the expression of knowledge in computer-tractable form, so that it can be used
to help software agents perform well. A software agent (also called a softbot, as opposed to a robot, which is a
mechanical device) is a program that perceives its environment through sensors and acts upon that environment through
effectors. A Knowledge Representation language is defined by two aspects:
- Syntax: This describes the possible configurations that can constitute sentences.
- Semantics: This determines the facts in the world to which the sentences refer. Without semantics, a sentence is just an arrangement of words, or a sequence of marks on a printed page, that has no meaning. For example, the sentence "David is going to town" has a meaning: it states a fact. Without semantics, the sentence could equally well be written as "town to going is David" or "going to David is town", neither of which has any meaning in English. With semantics, therefore, each sentence makes a claim about the world.
For the Semantic Web to function, computers must have access to structured collections of information and sets of
inference rules that they can use to conduct automated reasoning. Traditional knowledge-representation systems typically
have been centralized, requiring everyone to share exactly the same definition of common concepts such as "fruit" or "vehicle."
But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable. These systems
usually carefully limit the questions that can be asked so that the computer can answer (reasonably) reliably, or answer at
all. To avoid such problems, traditional knowledge-representation systems have generally each had their own narrow and
idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database
of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one
system to another, the rules, existing in a completely different form, usually could not.
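To make the genealogy rule concrete, here is a minimal Python sketch of how such a rule might be applied to a set of facts. The names (Alice, Bob, Carol and so on), the relation labels and the infer_aunts function are all invented for illustration; they are not taken from any particular knowledge-representation system.

```python
# Facts are (subject, relation, object) tuples. All names are made up.
facts = {
    ("Alice", "wife_of", "Bob"),
    ("Bob", "uncle_of", "Carol"),
    ("Dave", "uncle_of", "Erin"),  # Dave has no wife on record
}


def infer_aunts(facts):
    """Rule: wife_of(W, U) and uncle_of(U, N) implies aunt_of(W, N)."""
    derived = set()
    for wife, rel1, uncle in facts:
        if rel1 != "wife_of":
            continue
        for person, rel2, niece_or_nephew in facts:
            if rel2 == "uncle_of" and person == uncle:
                derived.add((wife, "aunt_of", niece_or_nephew))
    return derived


new_facts = infer_aunts(facts)
print(new_facts)
```

Here the rule is written directly in Python, which is exactly the kind of system-specific form that is hard to move to another system along with the data.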
The Semantic Web provides a language that expresses both data and rules for reasoning about the data, and that allows
rules from any existing knowledge-representation system to be exported onto the Web. eXtensible Markup Language (XML)
and the Resource Description Framework (RDF) are important technologies for developing the Semantic Web. XML lets
everyone create their own tags, which are hidden labels attached to Web pages or sections of text on a page. Scripts
and programs can make use of these tags in sophisticated ways, but the script writer has to know what the page writer
uses each tag for. XML allows users to add arbitrary structure to their documents but says nothing about what the
structures mean. The meaning is expressed by RDF, which encodes it in sets of triples, each one being rather like the
subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes
assertions that particular things (fruit, Web pages, movies and so on) have properties (such as "is a type of"
or "is a kind of") with certain values (Fruit, Action Movie). For example: "Orange is a type of fruit" or
"Die Hard is a kind of action movie". This sort of structure turns out to be a natural way to describe
the vast majority of the data processed by machines. Subject and object are each identified by a Uniform
Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are
the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a
new concept, a new verb, just by defining a URI for it somewhere on the Web. The triples of RDF form webs
of information about related things. Because RDF uses URIs to encode this information in a document, the
URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone
can find on the Web.
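The triple structure can be sketched in a few lines of plain Python. This is only an illustration of the subject, verb and object idea, not an RDF implementation: the Triple type, the match function and the http://example.org/terms# namespace are assumptions made up for this example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/terms#"  # hypothetical namespace, not a real vocabulary

# Subject, verb and object are each identified by a URI.
triples = [
    Triple(EX + "Orange", EX + "isATypeOf", EX + "Fruit"),
    Triple(EX + "DieHard", EX + "isAKindOf", EX + "ActionMovie"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is asserted to be a fruit.
fruit_facts = match(triples, obj=EX + "Fruit")
```

Because every term is a full URI rather than a bare word, two documents that use the same URI are talking about the same concept.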
General Ontology
Knowledge Engineering is defined as: "Selecting of a Logic for building Knowledge-Base Systems, with the
implementation of the Proof Theory and also the Inference of new Facts". Thus Knowledge Engineering specifies
what is true, and the inference procedure figures out how to turn the facts into a solution. Since a fact is true
regardless of what task one is trying to solve, knowledge bases can, in principle, be reused for a variety of different
tasks without modification. This is in sharp contrast to procedural programming, where even a slight modification
requires a recompilation of the program. Knowledge Engineering thus takes a declarative approach.
Agent-based software engineering makes all sorts of systems and resources interoperable by providing an interface
based on first-order logic. An important step in the development of a knowledge base is to decide on a vocabulary
of predicates, functions and constants. For example, should Size be a function or a predicate? Would Bigness be a better
name than Size? Should Small be a constant or a predicate? Once the choices have been made, the resulting vocabulary
is known as an ontology. An ontology is thus a document or file that formally defines the relations among
terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
The taxonomic hierarchy defines classes of objects and relations among them. Taxonomies have been used explicitly for
centuries in technical fields. For example, systematic biology aims to provide a taxonomy of all living and extinct
species; library science has developed a taxonomy of all fields of knowledge, encoded as the Dewey Decimal system; tax
authorities and other government departments have developed extensive taxonomies of occupations and commercial products.
First-order logic (FOL) is defined as a general-purpose representation language that is based on an ontological
commitment to the existence of objects and relations in the world. FOL makes it easy to state facts about categories,
either by relating objects to the categories or by quantifying over their members, for example:
- An object is a member of a category: Tomato2 is an element of the category Tomatoes.
- A category is a subclass of another category: Tomatoes is a subset of the category Fruit.
- All members of a category have some properties: for all x, if x is an element of the category Tomatoes, then x is Red and x is Round.
- A category as a whole has some properties: Tomatoes is an element of DomesticatedSpecies.
Since Tomatoes is a category, and is a member of DomesticatedSpecies, DomesticatedSpecies must be a category
of categories. One can have categories of categories of categories, but they are not of much use. Although subclass and
instance relations are the most important ones for categories, there is a need to be able to state relations between
categories that are not subclasses of each other. For example, if you say that Males and Females are subclasses of Animals,
then you have not said that a male cannot be a female. If two or more categories have no members in common, they are
called disjoint.
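These category relations map naturally onto set operations. The following Python sketch models categories as sets, which is an assumption made for illustration; the individual names (Tomato1, Apple1, Rex, Bella and so on) are invented.

```python
# Categories modelled as sets of individuals. frozenset is used so that a
# category can itself be a member of another set (a category of categories).
tomatoes = frozenset({"Tomato1", "Tomato2"})
fruit = frozenset({"Tomato1", "Tomato2", "Apple1"})
red_things = {"Tomato1", "Tomato2", "Apple1"}
round_things = {"Tomato1", "Tomato2"}
domesticated_species = {tomatoes}

# An object is a member of a category.
is_member = "Tomato2" in tomatoes

# A category is a subclass (subset) of another category.
is_subclass = tomatoes <= fruit

# All members of a category have some properties.
all_red_and_round = all(
    x in red_things and x in round_things for x in tomatoes
)

# A category as a whole is a member of a category of categories.
category_is_member = tomatoes in domesticated_species

# Subclass statements alone do not imply disjointness; it is a separate fact.
animals = frozenset({"Rex", "Bella", "Sam"})
males = frozenset({"Rex"})
females = frozenset({"Bella"})
both_subclasses = males <= animals and females <= animals
are_disjoint = males.isdisjoint(females)
```

Note that both_subclasses would still hold if one individual were placed in both males and females, which is why disjointness has to be stated separately.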
The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations
of products for sale and their features (such as on Amazon.com). For example, an address (a category) may be defined as
a type of location (a category), and city codes (a property) may be defined to apply only to locations, and so on. Classes,
subclasses and relations among entities are a very powerful tool for Web use. A large number of relations can be
expressed among entities by assigning properties to classes and allowing subclasses to inherit such properties.
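The address and location example can be sketched as follows. This is a simplified illustration that assumes a single-parent hierarchy with no cycles; the dictionaries and the ancestors and applies_to functions are hypothetical names, not part of any ontology language.

```python
# child class -> parent class (single inheritance, assumed acyclic)
subclass_of = {"Address": "Location"}

# property -> the class it is defined to apply to
property_domain = {"cityCode": "Location"}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in subclass_of:
        chain.append(subclass_of[chain[-1]])
    return chain


def applies_to(prop, cls):
    """A property applies to a class if it is defined on it or inherited."""
    return property_domain[prop] in ancestors(cls)


address_has_city_code = applies_to("cityCode", "Address")
person_has_city_code = applies_to("cityCode", "Person")
```

Here cityCode is declared once, on Location, and Address picks it up through the subclass relation, while an unrelated class such as Person does not.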
Applying inference rules in ontologies provides powerful logical deductions. With ontology pages on the Web,
solutions to terminology problems begin to emerge. The meaning of the terms, vocabularies or XML codes used on a Web
page can be defined by pointers from the page to an ontology. Different ontologies need to provide equivalence
relations (stating which of their terms have the same meaning); otherwise there would be conflict and confusion.
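One simple way to picture equivalence relations between ontologies is a mapping from each ontology's term to a shared canonical term. The sketch below assumes this approach; the URIs and the shopA, shopB and common vocabularies are invented for illustration.

```python
# Each ontology-specific term points to a canonical term with the same meaning.
# All URIs here are hypothetical.
equivalent_to = {
    "http://example.org/shopA#zipCode": "http://example.org/common#postalCode",
    "http://example.org/shopB#postcode": "http://example.org/common#postalCode",
}


def canonical(term):
    """Map a term to its canonical form; unmapped terms stand for themselves."""
    return equivalent_to.get(term, term)


def same_meaning(a, b):
    """Two terms mean the same thing if they share a canonical form."""
    return canonical(a) == canonical(b)


zip_matches_postcode = same_meaning(
    "http://example.org/shopA#zipCode",
    "http://example.org/shopB#postcode",
)
```

Without such a mapping, a program would have no basis for treating the two terms as the same concept.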
Ontologies can enhance the functioning of the Web in different ways. They can be used in a simple
fashion to improve the accuracy of Web searches: the search program can look only for those pages that refer to a precise
vocabulary and concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies
to relate the information on a page to the associated taxonomy hierarchies, knowledge structures and inference rules.
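The difference between keyword search and concept-based search can be sketched as follows. The pages, their text, the concept URIs and both search functions are invented for illustration; real pages would point to a published ontology rather than to these made-up identifiers.

```python
# Each page carries its text plus pointers to ontology concepts (hypothetical URIs).
pages = [
    {
        "url": "http://example.org/page1",
        "text": "The jaguar is a big cat found in the Americas.",
        "concepts": {"http://example.org/onto#JaguarAnimal"},
    },
    {
        "url": "http://example.org/page2",
        "text": "The new Jaguar has a powerful engine.",
        "concepts": {"http://example.org/onto#JaguarCar"},
    },
]


def keyword_search(pages, keyword):
    """Match on the raw word, regardless of what it refers to."""
    return [p["url"] for p in pages if keyword.lower() in p["text"].lower()]


def concept_search(pages, concept):
    """Match only pages that point to the precise concept."""
    return [p["url"] for p in pages if concept in p["concepts"]]


by_keyword = keyword_search(pages, "jaguar")
by_concept = concept_search(pages, "http://example.org/onto#JaguarAnimal")
```

The keyword search returns both pages because the word is ambiguous, while the concept search returns only the page about the animal.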