Java, RDF, and the "Virtual Web" Part one: An introduction
Today, exponential expansion of the amount and heterogeneity of information is compounded by the growing variety of end-user devices and browsers. In addition, increasingly popular personalization and syndication applications present sophisticated dynamic customization requirements. All this makes it practically impossible for application developers to keep up by constantly rewriting custom server-side and browser-side code. At the core of the problem is the inability of Web applications to "understand" content. Learning to understand information about content, or metadata, is a major step toward developing a solution.
The RDF specification
The Resource Description Framework (RDF) is a standard designed to enable Web applications that depend on machine-understandable metadata, and to support interoperability between such applications. It targets a number of important areas, including resource discovery, intelligent software agents, content rating, intellectual property rights, and privacy preferences. RDF is used to create models of metadata that may be understood by processing agents. It is complementary to XML, which is used to encode and transport RDF models. XML is not the exclusive means of representing RDF models; other mechanisms may be used to serve the same purpose in the future.
The World Wide Web Consortium (W3C) has released RDF as an official recommendation and is distributing a simple RDF compiler that is, of course, implemented in Java. The compiler reads XML documents encoding RDF models and converts them to so-called triples, an internal representation used by RDF. Triples may be accessed and modified by RDF applications, which may later encode all or some of them back in XML for transfer to other applications.
A simple example
Consider a very simple XML-encoded RDF model describing type and subject of a Web resource:
RDF model describing type and subject of a Web resource
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:DC="http://purl.org/dc/elements/1.0/">
  <rdf:Description about="http://www.cs.rutgers.edu/~shklar/">
    <DC:Type>Homepage</DC:Type>
    <DC:Subject>Rutgers, Internet, Leon</DC:Subject>
  </rdf:Description>
</rdf:RDF>
The rdf and DC prefixes are bound to URIs (Uniform Resource Identifiers) for the "RDF Syntax" and "Dublin Core" namespaces, respectively. Prefixes are used throughout the RDF specification to disambiguate elements and attributes. The first element inside rdf:RDF is rdf:Description, whose about attribute identifies http://www.cs.rutgers.edu/~shklar/ as the subject of the description. Next, the DC:Type element identifies the subject resource as a homepage, and DC:Subject lists keywords that best describe the resource.
Provided with the specification above, the RDF compiler generates triples of the form:
triple('http://purl.org/dc/elements/1.0/Type', 'http://www.cs.rutgers.edu/~shklar/', 'Homepage').
triple('http://purl.org/dc/elements/1.0/Subject', 'http://www.cs.rutgers.edu/~shklar/', 'Rutgers, Internet, Leon').
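To make the conversion from XML to triples more concrete, here is a minimal Java sketch. It is not the W3C compiler: the class name RdfTripleSketch and the method toTriples are hypothetical, and the code assumes a flat model in which every rdf:Description contains only literal-valued property elements. It relies solely on the XML parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; it is not the W3C RDF compiler.
final class RdfTripleSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns each triple as {predicate, subject, object}, the order used in the article. */
    static List<String[]> toTriples(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // resolve prefixes such as DC: to namespace URIs
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));

            List<String[]> triples = new ArrayList<>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // The article's example uses an unprefixed "about" attribute;
                // fall back to a namespace-qualified one if it is absent.
                String subject = description.getAttribute("about");
                if (subject.isEmpty()) {
                    subject = description.getAttributeNS(RDF_NS, "about");
                }
                NodeList children = description.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // skip whitespace and comments
                    }
                    // Predicate = namespace URI + local name, e.g. ".../1.0/" + "Type"
                    String predicate = child.getNamespaceURI() + child.getLocalName();
                    String object = child.getTextContent().trim();
                    triples.add(new String[] {predicate, subject, object});
                }
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RDF/XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
                + " xmlns:DC=\"http://purl.org/dc/elements/1.0/\">"
                + "<rdf:Description about=\"http://www.cs.rutgers.edu/~shklar/\">"
                + "<DC:Type>Homepage</DC:Type>"
                + "<DC:Subject>Rutgers, Internet, Leon</DC:Subject>"
                + "</rdf:Description></rdf:RDF>";
        for (String[] t : toTriples(xml)) {
            System.out.println("triple('" + t[0] + "', '" + t[1] + "', '" + t[2] + "').");
        }
    }
}
```

A real compiler would also have to handle nested descriptions, resource-valued properties, and containers, all of which this sketch ignores.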
Generated triples may be visualized as follows:
Applications that can benefit even from this very simplistic model include search engines, applications aggregating content from different sources (aggregation servers), dynamic syndication servers, personalization proxies, and so forth. For example, suppose an aggregation server had to create a catalog of homepages at Rutgers of people who have something to do with the Internet. The simplest way to compile such a catalog is to consider only resources of Type "Homepage" and select those that list "Internet" as one of the Subject keywords.
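The catalog selection just described could be sketched in Java as follows. The class HomepageCatalogSketch and its select method are hypothetical names, and the code assumes that triples are held as {predicate, subject, object} string arrays and that Subject keywords are comma-separated, as in the example model. The second resource in main is an invented example.org URL used only for contrast.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical aggregation-server helper; a sketch, not a production query engine.
final class HomepageCatalogSketch {
    static final String DC = "http://purl.org/dc/elements/1.0/";

    /** Returns subjects whose Type equals the given type and whose Subject lists the keyword. */
    static List<String> select(List<String[]> triples, String type, String keyword) {
        Set<String> ofType = new LinkedHashSet<>();
        Set<String> withKeyword = new LinkedHashSet<>();
        for (String[] t : triples) {
            String predicate = t[0];
            String subject = t[1];
            String object = t[2];
            if (predicate.equals(DC + "Type") && object.equals(type)) {
                ofType.add(subject);
            } else if (predicate.equals(DC + "Subject")) {
                // Assumption: keywords are separated by commas, as in the example model.
                for (String candidate : object.split(",")) {
                    if (candidate.trim().equalsIgnoreCase(keyword)) {
                        withKeyword.add(subject);
                    }
                }
            }
        }
        ofType.retainAll(withKeyword); // keep only resources that satisfy both conditions
        return new ArrayList<>(ofType);
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {DC + "Type", "http://www.cs.rutgers.edu/~shklar/", "Homepage"});
        triples.add(new String[] {DC + "Subject", "http://www.cs.rutgers.edu/~shklar/",
                "Rutgers, Internet, Leon"});
        // Invented resource for contrast: a homepage that does not mention the Internet.
        triples.add(new String[] {DC + "Type", "http://example.org/other/", "Homepage"});
        triples.add(new String[] {DC + "Subject", "http://example.org/other/", "Rutgers, Databases"});

        System.out.println(select(triples, "Homepage", "Internet"));
    }
}
```

Matching whole keywords rather than substrings avoids accidental hits such as "Internetworking", though a real server might well prefer a looser match.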
Extending the model is the best way to enable new functionality. For example, consider a Style property with values ranging over names bound to both style specifications and Java classes. Style specifications describe the content and layout of HTML (or XML) pages, and Java classes are responsible for splitting these pages into components. Such a property would enable the aggregation server not only to list hyperlinks but also to follow them automatically, extract photographs or brief bios, and include them in custom catalogs. More generally, we can declaratively (without writing code) refer to components of reachable pages and dynamically combine them into new aggregates (catalogs, summaries, personalized newspapers, etc.).
RDF applications are designed to understand declarative specifications (see the example) that may be created either by hand or automatically. Software modules can be designed to generate RDF specifications using input from other modules, application administrators, or even directly from end-users (e.g., through simple fill-out fields and check-boxes in HTML forms, as in http://my.netscape.com). New technologies that are currently under development will provide generic RDF generators and make it even easier to develop RDF applications.
An RDF application may either be implemented as a module that reads triples generated by the compiler, or include its own classes for parsing XML-encoded RDF models and converting them to an internal representation. For interoperability, RDF applications need classes for converting the internal representation back to triples and/or XML-encoded RDF models. A model may be missing required properties or may violate consistency constraints defined in its schema. Schema-imposed constraints may be verified either by the compiler or by the application, based on the internal representation of the model. The RDF Schema Specification is currently a proposed recommendation by W3C and is open for comments.
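As a rough illustration of application-side verification, the Java sketch below checks a set of triples for missing required properties. It does not implement the RDF Schema Specification: the class RequiredPropertySketch, the method missing, and the idea of passing the required properties as a plain set of predicate URIs are all simplifying assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker; the "schema" here is just a set of required predicate URIs.
final class RequiredPropertySketch {

    /** Maps each subject to the required predicates it lacks; complete subjects are omitted. */
    static Map<String, Set<String>> missing(List<String[]> triples, Set<String> required) {
        // Collect the predicates that actually occur for each subject.
        Map<String, Set<String>> present = new LinkedHashMap<>();
        for (String[] t : triples) { // t = {predicate, subject, object}
            present.computeIfAbsent(t[1], s -> new HashSet<>()).add(t[0]);
        }
        // Report, per subject, the required predicates that never occurred.
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : present.entrySet()) {
            Set<String> gaps = new TreeSet<>(required);
            gaps.removeAll(entry.getValue());
            if (!gaps.isEmpty()) {
                result.put(entry.getKey(), gaps);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dc = "http://purl.org/dc/elements/1.0/";
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {dc + "Type", "http://www.cs.rutgers.edu/~shklar/", "Homepage"});

        Set<String> required = new HashSet<>();
        required.add(dc + "Type");
        required.add(dc + "Subject");

        // The model above has no Subject triple, so that predicate is reported as missing.
        System.out.println(missing(triples, required));
    }
}
```

The same check could equally run inside the compiler; placing it in the application simply keeps the example independent of any particular parser.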
RDF provides a foundation for next-generation technologies that hold great promise. To coin a buzzword, it's our path into the new and exciting world of the Virtual Web. Think of a Virtual Web site as a network of metadata objects implementing an RDF model, and of a Virtual Web Server as an open Java framework that includes both generic and application-specific classes, which are used to generate custom responses. Every request is processed in the context of a metadata node that may reference local or remote content and contain processing clues (e.g., which presentation templates to use for which requests). As RDF-based technologies mature, it will become increasingly simple to create new applications by generating RDF models and editing templates. That's when the Virtual Web will really take off.
About the author
Leon Shklar holds a Ph.D. in computer science from Rutgers University, New Brunswick, N.J. He is the director of R&D at Information Architects Corp. (IA), Hoboken, N.J. IA's Metaphoria Virtual Web Server is the first commercial product that employs RDF models to construct sophisticated content aggregation and syndication solutions for the Internet.