Role of Java in the Semantic Web
The definition of the Semantic Web according to Tim Berners-Lee, the inventor of the World Wide Web, is: "The extension of the current web in which information is given well-defined meaning, better enabling computers and humans to work in cooperation."
The Semantic Web is the abstract representation of data on the World Wide Web, based on the RDF (Resource Description Framework) standard and other standards still to be defined. It is being developed by the W3C (World Wide Web Consortium), with participation from academic researchers and industrial partners. Data can be defined and linked in such a way that discovery, automation, integration, and reuse become more effective across different applications.
The Semantic Web is an extension of the World Wide Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps toward incorporating the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will roll out significant new functionality as machines become much better able to process and understand the data that the Web merely displays at present. To date, the Web has developed most rapidly as a medium of documents for humans rather than of data and information that can be processed automatically. If you want something from the Web today, you have to do it manually: to find specific information or a product to buy, such as a book, you must sit at your computer and search the most popular online bookstores, browsing categories of titles that match what you want. The Semantic Web aims to remove this manual dependency (users should be able to rely on software to do the task autonomously), and it will be decentralized as much as possible, just like the Internet.
The key to the development of the Semantic Web is Machine Intelligence. Other terms frequently used interchangeably with Machine Intelligence are Machine Learning, Computational Intelligence, Soft-Computing, and Artificial Intelligence. Although the five terms are used interchangeably in industry and academia, they are distinct branches to the researchers who work in these fields. Artificial Intelligence involves symbolic computation, while Soft-Computing involves intensive numeric computation.
The following sub-branches of Machine Intelligence (mainly symbolic Artificial Intelligence) are being applied to the Semantic Web:
- Knowledge Acquisition and Representation
- Agent Systems (or MAS - Multi-Agent Systems)
Although symbolic Artificial Intelligence is what is currently built into Semantic Web data representation, there is little doubt that software tool vendors and developers will eventually incorporate the Soft-Computing paradigm as well. The advantage Soft-Computing adds to symbolic Artificial Intelligence is that it makes software systems adaptive: a Soft-Computing program can deal with and adapt to unforeseen input that was not built into it. This contrasts with the non-adaptive nature of pure symbolic Artificial Intelligence, which cannot deal with or adapt to unforeseen input (stimuli).
There are a number of related Machine Intelligence JSRs (Java Specification Requests) in the JCP (Java Community Process), two of which are currently in public review. These JSRs are listed below:
- Java Rule Engine API: JSR-94 (Public Review: 11 September 2002)
- Java Agent Services API: JSR-87 (Public Review: 19 May 2002)
- Java Data Mining API: JSR-73 (Community Draft Ballot: 24 June 2002)
As can be seen from the list above, only a small part of the Machine Intelligence domain is currently represented as JSRs in the JCP. The list is expected to grow as new Machine Intelligence-related JSRs are proposed to the JCP.
The different disciplines of Machine Intelligence have existed for over fifty years and have been successfully applied in many areas of software, but it is only now that they are being applied to the Internet at the scale of the Semantic Web. New branches of Machine Intelligence are constantly being developed.
Knowledge Acquisition and Representation
Knowledge Acquisition is defined as the extraction of knowledge from different sources of information. Examples include extracting knowledge from a human expert in a specific domain (such as a doctor, lawyer, or financial advisor), extracting trading rules from a stock-exchange database, or extracting linguistic rules from a linguistic database.
Knowledge Representation is defined as the expression of knowledge in computer-tractable form, so that it can be used to help software agents perform well. A software agent (also called a softbot, as opposed to a robot, which is a mechanical device) is a program that perceives its environment through sensors and acts upon that environment through effectors. A Knowledge Representation language is defined by two aspects:
- Language Syntax: This describes the possible configurations that can constitute sentences.
- Semantics: This determines the facts in the world to which the sentences refer. Without semantics, a sentence is just an arrangement of words, or a sequence of marks on a printed page, that has no meaning. For example, the sentence "David is going to town" has a meaning; it states a fact. Without semantics, the sentence could equivalently be stated as "town to going is David" or "going to David is town", which have no meaning at all in English. With semantics, therefore, each sentence makes a claim about the world.
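The perceive/act cycle of a software agent described above can be sketched as a minimal Java interface. All type and method names here (Agent, Percept, Action, ThermostatAgent) are illustrative, not part of any standard API:

```java
// Minimal sketch of a software agent: it perceives its environment
// through sensors and acts on it through effectors.
interface Percept {}
interface Action {}

interface Agent {
    // Map the latest percept from the sensors to the next action
    // to be carried out by the effectors.
    Action decide(Percept percept);
}

// A trivial reflex agent: reacts to a temperature reading by
// switching a heater on or off.
class ThermostatAgent implements Agent {
    record TemperaturePercept(double celsius) implements Percept {}
    enum HeaterAction implements Action { HEAT_ON, HEAT_OFF }

    public Action decide(Percept percept) {
        TemperaturePercept t = (TemperaturePercept) percept;
        return t.celsius() < 18.0 ? HeaterAction.HEAT_ON : HeaterAction.HEAT_OFF;
    }
}
```

Real agents, of course, maintain internal state and reason over a knowledge base rather than reacting reflexively, but the sensor-to-effector mapping is the common core.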
For the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Traditional knowledge-representation systems have typically been centralized, requiring everyone to share exactly the same definition of common concepts such as "fruit" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable. These systems usually carefully limit the questions that can be asked so that the computer can answer reasonably reliably, or answer at all. To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.
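To make the genealogy example concrete, the "a wife of an uncle is an aunt" rule can be sketched as a hand-coded inference step over a small fact base. This is an illustrative toy, not how a production rule engine is written; all names (Genealogy, Fact, the relation strings) are made up for the example:

```java
import java.util.*;

// A tiny fact base of (subject, relation, object) assertions with one
// hard-wired inference rule: "a wife of an uncle is an aunt".
class Genealogy {
    record Fact(String subject, String relation, String object) {}

    private final Set<Fact> facts = new HashSet<>();

    void assertFact(String s, String r, String o) { facts.add(new Fact(s, r, o)); }

    boolean holds(String s, String r, String o) { return facts.contains(new Fact(s, r, o)); }

    // Rule: if X is an uncle of Z and Y is the wife of X,
    // then Y is an aunt of Z. Derived facts join the fact base.
    void applyAuntRule() {
        List<Fact> derived = new ArrayList<>();
        for (Fact uncle : facts) {
            if (!uncle.relation().equals("uncleOf")) continue;
            for (Fact wife : facts) {
                if (wife.relation().equals("wifeOf") && wife.object().equals(uncle.subject())) {
                    derived.add(new Fact(wife.subject(), "auntOf", uncle.object()));
                }
            }
        }
        facts.addAll(derived);
    }
}
```

The point of the Semantic Web (and of general rule engines such as the one JSR-94 standardizes access to) is that such rules are expressed declaratively as data, so they can travel between systems instead of being buried in code like this.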
The Semantic Web provides a language that expresses both data and rules for reasoning about the data, and that allows rules from any existing knowledge-representation system to be exported onto the Web. eXtensible Markup Language (XML) and the Resource Description Framework (RDF) are important technologies for developing the Semantic Web. XML lets everyone create their own tags: hidden labels that annotate Web pages or sections of text on a page. Scripts and programs can make use of these tags in sophisticated ways, but the script writer has to know what the page writer uses each tag for. XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean. The meaning is expressed by RDF, which encodes it in sets of triples, each one rather like the subject, verb, and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (fruit, Web pages, movies, and so on) have properties such as "is a type of" (Orange is a type of fruit) or "is a kind of" (Die Hard is a kind of action movie), with certain values (Fruit, Action Movie). This sort of structure turns out to be a natural way to describe the vast majority of the data processed by machines. Subject and object are each identified by a Uniform Resource Identifier (URI), just as used in a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.) The verbs are also identified by URIs, which enables anyone to define a new concept, a new verb, just by defining a URI for it somewhere on the Web. The triples of RDF form webs of information about related things. Because RDF uses URIs to encode this information in a document, the URIs ensure that concepts are not just words in a document but are tied to a unique definition that everyone can find on the Web.
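The triple model above can be sketched in plain Java: each statement is a (subject, predicate, object) triple whose terms are URIs. This mirrors the RDF data model itself rather than any particular RDF library (such as Apache Jena), and the example.org URIs are placeholders:

```java
import java.net.URI;
import java.util.*;

// Minimal sketch of an RDF-style triple store: each statement is a
// (subject, predicate, object) triple whose terms are all URIs.
class TripleStore {
    record Triple(URI subject, URI predicate, URI object) {}

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(URI.create(s), URI.create(p), URI.create(o)));
    }

    // All objects related to the given subject by the given predicate.
    List<URI> objectsOf(String subject, String predicate) {
        URI s = URI.create(subject), p = URI.create(predicate);
        return triples.stream()
                .filter(t -> t.subject().equals(s) && t.predicate().equals(p))
                .map(Triple::object)
                .toList();
    }
}
```

Because every term is a URI rather than a bare word, two documents that use the same URI are provably talking about the same concept, which is exactly the property RDF relies on.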
Knowledge Engineering is defined as "the selection of a logic for building knowledge-based systems, together with the implementation of the proof theory and the inference of new facts." Thus Knowledge Engineering specifies what is true, and the inference procedure figures out how to turn the facts into a solution. Since a fact is true regardless of what task one is trying to solve, knowledge bases can in principle be reused for a variety of different tasks without modification. This is in sharp contrast to procedural programming, where even a slight modification requires recompiling the program. Knowledge Engineering is thus a declarative approach.
Agent-based software engineering makes all sorts of systems and resources interoperable by providing an interface based on first-order logic. An important step in the development of a knowledge base is to decide on a vocabulary of predicates, functions, and constants; for example, should Size be a function or a predicate? Would Bigness be a better name than Size? Should Small be a constant or a predicate? Once the choices have been made, the result is a vocabulary known as an ontology. Thus an ontology is a document or file of vocabularies that formally defines the relations among terms. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
The taxonomic hierarchy defines classes of objects and relations among them. Taxonomies have been used explicitly for centuries in technical fields. For example, systematic biology aims to provide a taxonomy of all living and extinct species; library science has developed a taxonomy of all fields of knowledge, encoded as the Dewey Decimal system; tax authorities and other government departments have developed extensive taxonomies of occupations and commercial products.
First-order logic (FOL) is defined as a general-purpose representation language that is based on an ontological commitment to the existence of objects and relations in the world. FOL makes it easy to state facts about categories, either by relating objects to the categories or by quantifying over their members, for example:
- An object is a member of a category: Tomato2 is an element of the category Tomatoes.
- A category is a subclass of another category: Tomatoes is a subset of the category Fruit.
- All members of a category have some properties: for all x, if x is an element of the category Tomatoes, then x is red and round.
- A category as a whole has some properties: Tomatoes is an element of DomesticatedSpecies.
Since Tomatoes is a category and is a member of DomesticatedSpecies, DomesticatedSpecies must be a category of categories. One can have categories of categories of categories, but they are not of much use. Although subclass and instance relations are the most important ones for categories, there is also a need to state relations between categories that are not subclasses of each other. For example, if you say that Males and Females are subclasses of Animals, you have not said that a male cannot be a female. If two or more categories have no members in common, they are called disjoint.
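The three relations just discussed (membership, subclass, and disjointness) can be sketched as a small taxonomy structure. This is an illustrative toy under the assumption that categories and objects are identified by plain strings; a real system would use URIs and a description-logic reasoner:

```java
import java.util.*;

// Sketch of a taxonomy supporting membership, subclass, and
// disjointness queries over string-named categories.
class Taxonomy {
    private final Map<String, Set<String>> superclassOf = new HashMap<>(); // category -> direct superclasses
    private final Map<String, Set<String>> membersOf = new HashMap<>();    // category -> direct members
    private final Set<Set<String>> disjointPairs = new HashSet<>();

    void subclass(String child, String parent) {
        superclassOf.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }
    void member(String object, String category) {
        membersOf.computeIfAbsent(category, k -> new HashSet<>()).add(object);
    }
    void disjoint(String a, String b) { disjointPairs.add(Set.of(a, b)); }

    // Reflexive, transitive subclass check: Tomatoes is a subclass of Fruit.
    boolean isSubclass(String child, String parent) {
        if (child.equals(parent)) return true;
        for (String p : superclassOf.getOrDefault(child, Set.of()))
            if (isSubclass(p, parent)) return true;
        return false;
    }

    // Membership is inherited upward: a member of Tomatoes is a member of Fruit.
    boolean isMember(String object, String category) {
        for (var entry : membersOf.entrySet())
            if (entry.getValue().contains(object) && isSubclass(entry.getKey(), category))
                return true;
        return false;
    }

    boolean areDisjoint(String a, String b) { return disjointPairs.contains(Set.of(a, b)); }
}
```

Declaring `disjoint("Males", "Females")` is exactly the extra statement the text says is missing when you only declare the two subclasses of Animals.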
The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations of products for sale and their features (such as on Amazon.com). For example, an address (a category) may be defined as a type of location (a category), and city codes (a property) may be defined to apply only to locations, and so on. Classes, subclasses, and relations among entities are a very powerful tool for Web use. A large number of relations can be expressed among entities by assigning properties to classes and allowing subclasses to inherit those properties.
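Java's own class system gives a compact analogy for this kind of property inheritance. In the sketch below (class and field names are illustrative), the city-code property is declared only on Location, and Address inherits it simply by being declared a subclass:

```java
// An Address is a type of Location, so a property defined on Location
// (here, a city code) automatically applies to every Address as well.
class Location {
    private String cityCode;
    void setCityCode(String code) { this.cityCode = code; }
    String getCityCode() { return cityCode; }
}

class Address extends Location {
    private final String street;
    Address(String street) { this.street = street; }
    String getStreet() { return street; }
}
```

Web ontology languages express the same subclass-inherits-property idea declaratively, as data about classes, rather than in compiled code.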
The application of inference rules in ontologies provides powerful logical deductions. With ontology pages on the Web, solutions to terminology problems begin to emerge: the definitions of terms and vocabularies or XML codes used on a Web page can be defined by pointers from the page to an ontology. Different ontologies need to provide equivalence relations (defining the same meaning for shared vocabularies); otherwise there will be conflict and confusion.
Ontologies can enhance the functioning of the Web in several ways. They can be used in a simple fashion to improve the accuracy of Web searches: the search program can look at only those pages that refer to precise vocabularies and concepts instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated taxonomic hierarchies, knowledge structures, and inference rules.