
The Role of Open Source in Data Integration

December 4, 2009

Emerging mechanisms for transporting generalized blocks of data, rather than just documents (which can be thought of as one form of data), are becoming collectively known as either Data as a Service (DaaS) or Data Integration (DI). DI systems emerged primarily as an outgrowth of open source tools and toolsets, because the DI approach is a logical successor to the syndication architecture used by RSS. The key is a data set that combines metadata about a collection of resources with individual entries. Each entry carries its own publishing metadata, may contain the actual data, and includes one or more links to the producer of that "blob" of data (whether a static file, a converted database entry, or an XML resource).
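As a rough illustration of that structure, the sketch below models a feed in Python. The class and field names (`Feed`, `Entry`, `Link`, `linked_only`) and the example.org URLs are hypothetical, chosen only for this example; they do not correspond to any particular DI product or feed specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    rel: str   # relationship of the link, e.g. "self" (assumed vocabulary)
    href: str  # where the producer exposes the resource


@dataclass
class Entry:
    entry_id: str
    title: str
    updated: str                   # publishing metadata for this entry
    links: List[Link]              # one or more links back to the producer
    content: Optional[str] = None  # the data may be inlined or left out


@dataclass
class Feed:
    title: str                     # metadata about the collection as a whole
    updated: str
    entries: List[Entry] = field(default_factory=list)

    def linked_only(self) -> List[Entry]:
        """Entries whose data must be fetched from the producer via a link."""
        return [e for e in self.entries if e.content is None]


feed = Feed(
    title="Example inventory",
    updated="2009-12-04T00:00:00Z",
    entries=[
        Entry("item-1", "Inline record", "2009-12-03T12:00:00Z",
              [Link("self", "http://example.org/items/1")],
              content="<item><sku>A-100</sku></item>"),
        Entry("item-2", "Linked record", "2009-12-04T08:30:00Z",
              [Link("self", "http://example.org/items/2")]),
    ],
)

print([e.entry_id for e in feed.linked_only()])  # ['item-2']
```

The first entry carries its data inline, while the second only points back to the producer, which is the distinction between containment and linking described above.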

For a number of reasons, this completely vendorless solution spread quickly through the Internet as a preferred mechanism for data architecture. Open source solutions in general tend to thrive best in environments where comparatively little differentiation is possible. In such environments, there is comparatively little cost benefit to transitioning from one service provider to another.

Yet the advantages of a Data Integration approach over Enterprise Application Integration (EAI) are considerable. Most EAI systems assume some kind of underlying transactional operation, which in turn affects how the data is presented and passed. In many cases, the operation treats the messaging format (SOAP with potential enclosures) as simply a mediator between distributed method calls, to be converted back into binary objects at the other end.

The DI/DaaS approach, on the other hand, assumes a CRUD (create, read, update, delete) orientation on collections of resources, and lets the consumer of the passed information perform the relevant processing. This means that DI/DaaS systems do not need to physically retain the information in local data stores, which significantly reduces the "siloization" that is such a hallmark of complex EAI (especially SOA) systems.
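The CRUD orientation can be sketched as follows. This is a minimal in-memory stand-in, not a real DI/DaaS service: the `ResourceCollection` class, its method names, and the order records are all assumptions made for illustration. The point is that the collection only stores and returns resources, while the consumer does the processing.

```python
import itertools
from typing import Dict, List, Optional


class ResourceCollection:
    """Hypothetical in-memory collection exposing only CRUD operations."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def create(self, resource: dict) -> str:
        resource_id = str(next(self._ids))
        self._items[resource_id] = dict(resource)  # store a copy
        return resource_id

    def read(self, resource_id: str) -> Optional[dict]:
        item = self._items.get(resource_id)
        return dict(item) if item is not None else None

    def read_all(self) -> List[dict]:
        return [dict(item) for item in self._items.values()]

    def update(self, resource_id: str, resource: dict) -> bool:
        if resource_id not in self._items:
            return False
        self._items[resource_id] = dict(resource)
        return True

    def delete(self, resource_id: str) -> bool:
        return self._items.pop(resource_id, None) is not None


orders = ResourceCollection()
first = orders.create({"customer": "acme", "amount": 120})
second = orders.create({"customer": "globex", "amount": 80})
orders.update(second, {"customer": "globex", "amount": 95})
orders.delete(first)

# The consumer, not the service, decides what to compute from the data.
total = sum(order["amount"] for order in orders.read_all())
print(total)  # 95
```

Contrast this with the method-call style described earlier, where the provider would expose an operation such as a total-computing call and the message would merely carry its arguments and result.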

In the long term, the competitive advantage that open software has in this space, combined with the underlying difficulty of establishing the differentiating barriers and transactional gradients that are characteristic of large-scale "integrated" systems, makes it unlikely that commercial proprietary solutions will significantly challenge the existing open software market in the Data Integration space. That doesn't mean there is no potential for such a challenge: static CRUD solutions often have significant limitations of their own, which suggests that dynamic CRUD solutions, RESTful services around non-traditional data abstractors (such as the XQuery language), and data repositories will likely provide a thin but sufficient layer to make DI/DaaS both powerful and cost effective.
