Use XML Even As It Changes

  • February 1, 2000
  • By James Bean

Here's how you can tackle application-to-application integration needs while building a migration path to XML Schema

These days you can't predict what other applications an application will have to integrate with. Yet application-to-application (A2A) interfaces often focus on localized data, without regard for cross-function and cross-domain reuse of common data. The focus is now shifting to recognize architectural practices, standards, and the importance of enterprise metadata. But while these techniques help, you need more to really enable exchanging data between architected and legacy applications. If the applications are engineered like jigsaw puzzles, each app mates easily only with adjacent apps that are cut to fit. But if you can build them with pieces engineered like Tinker Toys, they become more versatile (see Figure 1). Such apps exchange information through common interfaces, yet retain their internal functional autonomy. The secret lies in building common messages using standards such as XML, rather than the syntactic or positional formats used by traditional EDI.

Figure 1. From jigsaw puzzle to Tinker Toy.

XML (Extensible Markup Language) addresses the content and metadata gaps in its infamous sibling HTML. Discrete data components are described in an XML document as elements and attributes. These are expressed via element tags, attribute name-value pairs inside those tags, and the structure of the document. Data is compartmentalized into these elements and attributes according to a set of rules.

The current XML specification uses Document Type Definitions (DTDs) to describe the content and structure of an XML document. However, there's an innovation waiting in the wings: XML Schema. Schema will most likely present the best solution for describing metadata with XML. But current implementations are often based on DTDs. Schema should be adopted rather rapidly, but a number of industry-based XML vocabularies and numerous custom-developed XML DTDs will require a reasonable period of migration.

This article will show you how to address your short-term A2A integration needs with XML DTDs and how to build in a handy migration path to Schema, then preview how Schema will most likely work.

For A2A integration, you can use XML to define the contents of a message used by an interface. The sending and receiving apps can interrogate, extract, and interpret message contents by the tag, rather than by position or placement. From a broader enterprise architecture view, this lets applications leverage a common interface message framework. Functional message sets can be constructed to support exchange between many applications, rather than use the point-to-point paradigm. You design message content around enterprise domains and functions rather than specific applications. Reusable sets of common data components are clustered and standardized for use by broad sets of applications. Receiving applications extract and interpret the discrete message components they need, ignoring the balance.
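As a minimal sketch of extraction by tag rather than by position, the Python below parses a small message and pulls out only the elements one receiver cares about. The element names (ProductMessage, Product, ProductName, ProductFamily, ListPrice) and the helper extract_fields are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical A2A message; the element names are illustrative assumptions.
MESSAGE = """<ProductMessage>
  <Product>
    <ProductName>Widget</ProductName>
    <ProductFamily>Hardware</ProductFamily>
    <ListPrice>19.95</ListPrice>
  </Product>
</ProductMessage>"""


def extract_fields(xml_text, wanted):
    """Return the wanted child elements of each Product, looked up by tag."""
    root = ET.fromstring(xml_text)
    records = []
    for product in root.iter("Product"):
        record = {}
        for tag in wanted:
            node = product.find(tag)
            if node is not None:      # elements the message lacks are skipped
                record[tag] = node.text
        records.append(record)
    return records


# A receiver that needs only the product name ignores the balance.
names_only = extract_fields(MESSAGE, ["ProductName"])
print(names_only)
```

Because the lookup is by tag, the sender can reorder or add elements without breaking this receiver.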

True message brokers extend this scenario. They act as traffic cops and navigators between applications. Broker objects could interpret functional content from messages, then route them to desired targets. But message brokers conceal some gotchas.

Use metadata to bridge data islands
Legacy applications often lack common characteristics surrounding the data that are intrinsic to a functional domain. So you get a proliferation of data islands that aren't easily shared or reused. Often the delta between these disparate data sources is some combination of identifying name, length, and data type.

Applications that support, say, Sales Order Management, might manage product information in a relational database, with a Product Name element defined by a PROD_NM column. You could also describe this element by the metadata characteristics of data type "character" and length "30". Similarly, applications that support Marketing might also manage product information, only they're using a different relational database with a PRODUCT_GRP column. This column might be described by a data type "char" and length "15". Further complicating things might be an additional data element in the Marketing database for Product Family. The resulting database column is PROD_FAM_NM, of data type "character" and length "15".

It would surely help to be able to exchange product information between the functional domains of Sales Order Management and Marketing. But if the data were extracted from the two different databases and used for the exchange between the apps, you may well get anomalies related to data type, data value truncation, and semantic description.
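The truncation anomaly can be illustrated with a short Python sketch. It assumes the column lengths given above (30 for PROD_NM, 15 for PRODUCT_GRP); the sample product name and the helper fit_to_column are hypothetical.

```python
# Column lengths taken from the example above.
SALES_PROD_NM_LENGTH = 30          # Sales Order Management: PROD_NM, character(30)
MARKETING_PRODUCT_GRP_LENGTH = 15  # Marketing: PRODUCT_GRP, char(15)


def fit_to_column(value, length):
    """Return (stored_value, was_truncated) for a fixed-length character column."""
    stored = value[:length]
    return stored, stored != value


# Hypothetical product name that exactly fills the 30-character Sales column.
name = "Industrial Fastener Assortment"
stored, truncated = fit_to_column(name, MARKETING_PRODUCT_GRP_LENGTH)
print(repr(stored), truncated)
```

Without shared metadata, neither application learns that half of the value was lost in the exchange.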

In a perfect world you'd identify the disparate data sources, standardize, then re-engineer both data and applications to be fully built and made reusable. (Try selling such a costly project to your management, though.) Or you could identify the valuable data that's common to both domains, then create metadata characteristics that define a minimum set of rules for describing the data elements in a message context. For your legacy apps this might result in some level of re-engineering or utility wrapper development.

Rife with complexities
Of course, using metadata provides a solution rife with its own complexities. Many characteristics can be used to represent metadata, but this discussion will cover just a few of the most obvious data characteristics: name, data type, length, and decimal scale. Of these, the intended meaning of length and decimal scale would probably be consistent, anyway. But data type depends on the target application and underlying database. And name is fairly subjective. It's usually based on business or technology taxonomy standards.

As a somewhat simplified and abstract group, these characteristics comprise strong data typing. They'll generally let you identify a data element by its name, the constraints of its use by data type, and the limitations of its value by the length and decimal scale. And if the marketing and sales order management applications were aware of these characteristics, message exchanges would deliver better data quality.

In native form, an XML DTD can provide some of these important characteristics. With the ability to describe data by element tags and attributes, XML and DTDs become a great candidate solution for describing the content of a message. However, DTDs are not a universal remedy. As you venture into the XML world, you will quickly learn that data type, length, and decimal scale are not intrinsic to DTD specifications. In fact, most XML data content is simply defined as "string" or "character". XML document content is defined as character data regardless of whether the origin was actually numeric.
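A small Python sketch shows what "everything is character data" means for a receiving application. The element name ListPrice is hypothetical, and the conversion to Decimal is one possible choice, not something the XML specification prescribes.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical element; the value started life as a numeric column.
element = ET.fromstring("<ListPrice>19.95</ListPrice>")

raw = element.text       # the parser hands back character data
price = Decimal(raw)     # the receiving application must convert it itself

print(type(raw).__name__, price + Decimal("0.05"))
```

Nothing in the document or a DTD tells the receiver that this conversion is appropriate, which is the gap the metadata characteristics are meant to fill.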

DTDs today, Schema tomorrow
This obvious gap in XML version 1.0 DTDs should not keep you from using XML. In fact, the World Wide Web Consortium (W3C) has been working on a draft for XML Schema that should address these shortcomings. Meanwhile, we have XML version 1.0 and DTDs, along with some techniques you can use to get around the lack of strong data typing. Tim Bray proposes an elegant solution to the data typing problem created by using DTDs. He uses attributes defined for each element in the DTD and instantiated in the XML document. These attributes span several types (including #FIXED), describing metadata characteristics in support of data typing such as data type, length, scale, minimum value, and maximum value.

Bray's model needs some tweaking to scale up to high-volume production environments. First, including #FIXED attributes in the DTD for the XML document of the message can add some dependencies and overhead. Data content for #FIXED attributes is defined by default values in the DTD. You can't give these fixed attributes a different value within the content of an XML document; if one appears there at all, its value must match the value declared in the DTD.

So even though fixed attributes are defined in the DTD and describe data values for elements of the XML document, they normally aren't written into it. Only when the parser reads the DTD are the attribute values describing metadata characteristics made available, and then only through the parsed document and the instantiated Document Object Model (DOM). For our purposes, think of a DOM as a set of nodes defined within a hierarchical structure. The nodes are populated based on the content and structure of the XML document. An application can then navigate the DOM to extract data, using a set of APIs.

When a high volume of messaging and element content is passing between applications, you might lack the headroom to validate against a DTD as part of the parsing process. If the parser skips an externally defined DTD, the #FIXED attributes used to describe the data type, length, and decimal scale are instantiated in neither the XML document nor the resulting DOM.

As for document content overhead, even if validation weren't an issue, you still get repetitive data values for every metadata attribute applied to every element occurrence of the XML document (and the instantiated DOM). Your application can navigate the DOM and extract the metadata attribute values from each corresponding node, but the values will be the same for each element instance. If an XML document contains 1,000 instances of "Product", then the data type, length, and decimal scale values are repeated for every instance.
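The repetition can be seen in a short Python sketch. It makes two loud assumptions: the DTD is placed in the document's internal subset so that the standard library's non-validating expat parser reads the declarations, and the attribute names datatype and length are hypothetical rather than taken from Bray's proposal.

```python
from xml.parsers import expat

# Hypothetical DTD in the internal subset; attribute names are illustrative.
DOCUMENT = """<!DOCTYPE Products [
  <!ELEMENT Products (Product*)>
  <!ELEMENT Product (#PCDATA)>
  <!ATTLIST Product
      datatype CDATA #FIXED "character"
      length   CDATA #FIXED "30">
]>
<Products>
  <Product>Widget</Product>
  <Product>Gadget</Product>
  <Product>Sprocket</Product>
</Products>"""


def collect_product_attributes(xml_text):
    """Return the attributes the parser reports for each Product element."""
    seen = []

    def start(name, attrs):
        if name == "Product":
            seen.append(dict(attrs))

    parser = expat.ParserCreate()
    parser.StartElementHandler = start   # also receives declared defaults
    parser.Parse(xml_text, True)
    return seen


# No Product tag in the document carries an attribute, yet the parser
# supplies the same declared values for every instance.
for attributes in collect_product_attributes(DOCUMENT):
    print(attributes)
```

Three instances yield three identical copies of the metadata; a message with 1,000 instances would yield 1,000.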

My variation on Bray's model tackles these complexities. It assumes that some object, application, or process is developed to interpret the XML metadata characteristics and is triggered to apply the corresponding rules. If nothing triggers it, the metadata attributes are simply ignored.

Scale with metadata templates
I define the metadata characteristics in Bray-like fashion, but just once in a separate XML document—a metadata template. This template becomes the metadata description for each element of the XML document that contains the interface content used for messaging. But be careful about reusing attribute tags. Though the parser I used judges the XML document to be both well-formed and valid, you should test the metadata templates thoroughly to avoid potential anomalies.

My alternative also requires providing an interface management process (or object) to map the content of the metadata template to the elements of the same tag name in the XML message document, and to apply any necessary rules or editing. You could do this by instantiating the DOM using the XML message document. Based on a processing instruction, the interface management application would map the XML document's elements to a corresponding XML metadata template.

Of course, nothing's free. Mapping, validation, and anomaly reporting/resolution need to be accomplished by the interface management process. Mapping occurs between elements of the same name as defined in the template and XML message document. So you need to weigh the overhead of this added processing against the resulting improved integration and high-quality data exchange.
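The mapping and validation step might look something like the Python sketch below. Everything named here is an assumption for illustration: the template layout (one empty element per message element name, carrying datatype, length, and scale attributes), the message vocabulary, and the helper functions. Length is checked as a simple character count, which is a simplification.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical metadata template: each rule is defined once, by element name.
TEMPLATE = """<MetadataTemplate>
  <ProductName datatype="character" length="30"/>
  <ListPrice datatype="decimal" length="7" scale="2"/>
</MetadataTemplate>"""

# Hypothetical interface message; the second Product breaks both rules.
MESSAGE = """<ProductMessage>
  <Product><ProductName>Widget</ProductName><ListPrice>19.95</ListPrice></Product>
  <Product><ProductName>Heavy-Duty Industrial Fastener Assortment</ProductName><ListPrice>19.955</ListPrice></Product>
</ProductMessage>"""


def load_template(xml_text):
    """Map each element name to its metadata attributes."""
    return {rule.tag: dict(rule.attrib) for rule in ET.fromstring(xml_text)}


def check_value(text, meta):
    """Return a list of anomaly descriptions for one element value."""
    problems = []
    text = text or ""
    if "length" in meta and len(text) > int(meta["length"]):
        problems.append("longer than %s characters" % meta["length"])
    if meta.get("datatype") == "decimal":
        try:
            value = Decimal(text)
        except InvalidOperation:
            value = None
        scale = meta.get("scale", "0")
        if value is None or not value.is_finite():
            problems.append("not a decimal number")
        elif -value.as_tuple().exponent > int(scale):
            problems.append("more than %s decimal places" % scale)
    return problems


def validate_message(message_text, template_text):
    """Map message elements to template rules by tag name; report anomalies."""
    template = load_template(template_text)
    anomalies = []
    for element in ET.fromstring(message_text).iter():
        meta = template.get(element.tag)
        if meta is None:
            continue                      # no rule for this element name
        for problem in check_value(element.text, meta):
            anomalies.append((element.tag, element.text, problem))
    return anomalies


for anomaly in validate_message(MESSAGE, TEMPLATE):
    print(anomaly)
```

The template is parsed once per run here; a real interface management process would presumably cache it across messages, since the template is the more static of the two documents.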

As for triggering the process, you might try an XML processing instruction to invoke this supplemental metadata validation process. The process will need to build the DOM, and it will be up to an interrogating application (such as the interface management process) to identify the appropriate instruction, invoke the metadata validation process when necessary, address anomalies, and route the interface message accordingly.
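One way to detect such a trigger is sketched below in Python. The processing instruction target metadata-validate, its template="..." data, the file name, and the function find_validation_request are all hypothetical; no established convention is implied.

```python
from xml.dom import Node, minidom

# Hypothetical trigger: a processing instruction ahead of the root element.
MESSAGE = """<?xml version="1.0"?>
<?metadata-validate template="product-template.xml"?>
<ProductMessage>
  <Product><ProductName>Widget</ProductName></Product>
</ProductMessage>"""


def find_validation_request(xml_text, target="metadata-validate"):
    """Return the instruction's data string, or None if the trigger is absent."""
    document = minidom.parseString(xml_text)   # builds the DOM
    for node in document.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == target):
            return node.data
    return None


request = find_validation_request(MESSAGE)
if request is None:
    print("no metadata validation requested; route the message as-is")
else:
    print("validate using:", request)
```

A message without the instruction passes straight through, so applications that don't need the extra checks pay no validation cost.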

My method delivers a lot of reusability. The XML metadata template document is separate from the actual data of the interface message, and is more static. So you can define the template once, reuse it, and apply it to multiple instances of A2A messages as needed.

And this separation of the interface message document from the metadata template helps you migrate to XML Schema. By separating out the metadata characteristics, the migration process doesn't have to deal with non-message content. You translate the native XML message document (and a DTD, if you use one) to a simple baseline XML Schema. You can then address the subtler needs of the metadata as an enhancement of the conversion script.
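As a rough sketch of the enhancement step, the Python below turns one template entry into a schema element declaration. It assumes the element and facet names of XML Schema as eventually published by the W3C (xs:element, xs:simpleType, xs:restriction, xs:maxLength, xs:fractionDigits), which differ from the working drafts current at the time of writing. The template attribute names and the TYPE_MAP table are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical mapping from template data types to schema built-in types.
TYPE_MAP = {"character": "xs:string", "decimal": "xs:decimal"}


def element_declaration(name, meta):
    """Build an xs:element whose simple type carries the template's limits."""
    element = ET.Element("{%s}element" % XS, {"name": name})
    simple = ET.SubElement(element, "{%s}simpleType" % XS)
    restriction = ET.SubElement(
        simple, "{%s}restriction" % XS, {"base": TYPE_MAP[meta["datatype"]]}
    )
    if meta["datatype"] == "character" and "length" in meta:
        ET.SubElement(restriction, "{%s}maxLength" % XS, {"value": meta["length"]})
    if meta["datatype"] == "decimal" and "scale" in meta:
        ET.SubElement(restriction, "{%s}fractionDigits" % XS, {"value": meta["scale"]})
    return element


declaration = element_declaration(
    "ProductName", {"datatype": "character", "length": "30"}
)
print(ET.tostring(declaration, encoding="unicode"))
```

Because the metadata lives in its own template, this step can be bolted onto a baseline conversion later without touching the message documents themselves.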

Here, as before, when the metadata validation process isn't enabled, the template document and validation process are ignored. This use of a separate metadata XML template document tackles the problems of high-volume message exchanges and XML message documents with excessive element content. It also addresses some of the more obvious metadata gaps of XML version 1.0 DTDs, though of course XML DTDs may be migrated or replaced at some point after XML Schema is ratified.

XML Schema as a solution
Odds are good that a World Wide Web Consortium recommendation for XML Schema will be accepted this year. It's currently defined by a W3C draft, "XML Schema Part 1: Structures" (W3C Working Draft, 5 November 1999). XML Schema delivers the strong data typing DTDs lack, as described in "XML Schema Part 2: Datatypes" (W3C Working Draft, 5 November 1999). It describes key metadata characteristics, including base datatypes and corresponding facets.

Base datatypes include: string, boolean, real, timeinstant, timeduration, binary, uri, language, decimal, integer, and date. Another submission to the W3C, "XML-Data", extends the notion of strong datatyping into a more richly defined datatype. This came from Microsoft, ArborText, DataChannel, Inso, and the University of Edinburgh. It proposes using many discrete datatypes, including float (real number, with no limit on digits) and fixed.14.4 (number with up to 14 digits to the left of the decimal point, and up to 4 to the right). Facets are single defining aspects of concepts or objects. I think of facets as attributive characteristics of other characteristics. For example, fundamental facets described by the W3C draft include: Order, Bounds, Cardinality, Exact and Approximate, and Numeric. Non-fundamental facets include length, maximum length, pattern, and enumeration.

Easier migration
At a more practical level, XML Schema uses XML syntax, which avoids the separate language format of a DTD and potentially eases the process of migration from XML version 1.0. Knowing that XML Schema is probably not far into the future, you need to consider some reasonable period of transition between DTDs and Schema, and a tactical process for migration. It's hard to gauge when this will happen, though. Everyone really wants Schema, but we can't just chuck all those XML vocabularies that use DTDs as their defining structure. Fortunately, the DTD-based metadata solution I've described here can help allay the need to jump into Schema before the XML vocabularies used in your industry are migrated.

A Tool For Migrating From XML DTDs To Schema

There's not a lot of maturity in the XML tool and utility market so far. But there are a few tools that can help you migrate from DTDs to Schema—and help recoup your investment in using XML and DTDs for A2A messaging. My favorite tool for conversions thus far is XML Authority, from Extensibility. It can import an XML document or a DTD, then export XML Schema (based on the current draft recommendations). It also supports some other variations, such as XML-Data and XML frameworks such as BizTalk.

Regardless of the XML recommendation currently in play, there are tremendous advantages in the A2A space for XML. Enterprise integration can go a long way in the area of metadata characteristics for messages, but you need to be smart about which solutions will benefit your organization the most. In general, it's time to start using XML to resolve some of your A2A integration and data disparity issues.

A common messaging utility, XML message, or interface broker can really help police your application integration and interface dilemmas. Just be sure to validate your concepts for feasibility, impact, performance, and cost/benefit.

And remember to capture volume and performance metrics. This will help to document the effectiveness of your solution.

James Bean sees XML as a technology that requires developers to keep their skills current. Bean is CEO of the Relational Logistics Group, which consults on reuse, globalization, e-commerce, database design, data modeling, and data standards. Reach him at RDBMS@aol.com.

© 1999 FAWCETTE TECHNICAL PUBLICATIONS, all rights reserved.
