Service Oriented Architecture and Mass Data Movement
How do you deal with a SOA-style service that must accept large request payloads and return large response payloads? What if, given the size of the payload, moving the data via SOAP/HTTP is not an efficient option? Could an alternative be to exchange service-enabled request/response notifications while the payload data itself is exchanged via the data tier?
SOA is being touted as a mechanism to help an enterprise become more agile. SOA promotes loose coupling of the service provider from the service consumer; this insulates the service consumer from service provider changes as long as the interaction interface and the message format remain unchanged.
However, an enterprise may not be able to avoid trading large amounts of data in the request payload or the response payload, especially where the service provider performs a "specialized" or "utility" data processing or data gathering function. The assumption is that in these types of services the provider may have neither the ability nor the need to host or store business data. In these scenarios, transporting large payloads via SOAP/HTTP becomes extremely resource intensive from both a network performance perspective and a security perspective.
Overview of Architectural Components
This article discusses an architecture option for this problem domain in which request/response notifications are exchanged in a SOA-style interaction while the payload data is exchanged using runtime data tier utilities that manage writing the request/response data to, and reading it from, a Staging repository. The request/response notifications exchanged via the integration tier hold only the unique request and response identifier keys.
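As a minimal sketch of what such a key-only notification might look like, the following Python fragment builds a request notification and serializes it. The class name, the field names other than the request key, and the JSON wire format are assumptions made for illustration; the article does not prescribe a message schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequestNotification:
    """Hypothetical message sent via the integration tier: keys only, no payload."""
    request_id: str   # unique key under which the payload was staged
    consumer_id: str  # assumed field identifying the caller
    operation: str    # assumed field naming the requested service function


def new_request_notification(consumer_id: str, operation: str) -> RequestNotification:
    # A random UUID stands in for whatever key scheme the enterprise uses.
    return RequestNotification(str(uuid.uuid4()), consumer_id, operation)


def to_wire(notification: RequestNotification) -> bytes:
    # Serialized form that would travel through the integration tier.
    return json.dumps(asdict(notification)).encode("utf-8")


# The bulk data stays in the data tier; only the notification is sent.
payload = [{"row": i, "value": "x" * 100} for i in range(10_000)]
note = new_request_notification("consumer-42", "enrich-records")
wire = to_wire(note)
```

The notification stays a few hundred bytes at most regardless of how many rows the staged payload contains, which is the property the pattern relies on.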
This architecture pattern encourages the reuse of robust, SOA-enabling integration tier infrastructure such as the Enterprise Service Bus (ESB). The ESB offers features such as policy-based service management, service monitoring, and comprehensive reporting of service SLA metrics. In addition, the integration tier provides a view of the end-to-end service invocation flow, which allows operations personnel to base their service deployment policies on service consumer usage patterns and QoS needs. Finally, the ESB infrastructure components provision enterprise-worthy information to authorized consumers only, without having to embed these "access rules" in the service provider.
Staging Repository—Utilization Profile and Access Mechanisms
The common architecture components include data access object (DAO pattern) utilities for writing the request and response data to, and reading it from, the common Staging data structures. The DAO utilities could be deployed in both the service consumer and the service provider address space to access the Staging repository. The data structures are built in a manner conducive to the data processing needs of the service tier and are not necessarily in third normal form as entity relationship modeling techniques would produce. The idea is not only to avoid trading large payloads via the integration tier but also to ensure that access to the Staging repository is allowed only via the code base encapsulated in the DAO.
Given that the service provider "owns" the DAO utility and makes it available for deployment in the service consumer address space, the service provider can alter the Staging data structures and the DAO utilities without breaking the consumer. It is important to note that decoupling the service consumer from the service provider layer includes the Staging Area data structures as well. Without this, the service provider and the service consumer are hard wired at the data tier, which invalidates the whole premise of SOA.
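The DAO idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's implementation: the class name StagingDao, its method names, the single-table layout, and the use of an in-memory SQLite database with JSON-encoded rows are all hypothetical stand-ins for whatever Staging repository an enterprise actually runs.

```python
import json
import sqlite3


class StagingDao:
    """Hypothetical provider-owned DAO; the only code that touches staging tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # The table layout is private to the DAO, so the provider can change it
        # without callers noticing, as long as the methods below keep their shape.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS staging ("
            "ref_id TEXT NOT NULL, kind TEXT NOT NULL, body TEXT NOT NULL, "
            "PRIMARY KEY (ref_id, kind))"
        )

    def write_request(self, request_id: str, rows: list) -> None:
        self._write(request_id, "request", rows)

    def read_request(self, request_id: str) -> list:
        return self._read(request_id, "request")

    def write_response(self, response_id: str, rows: list) -> None:
        self._write(response_id, "response", rows)

    def read_response(self, response_id: str) -> list:
        return self._read(response_id, "response")

    def _write(self, ref_id: str, kind: str, rows: list) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO staging (ref_id, kind, body) VALUES (?, ?, ?)",
                (ref_id, kind, json.dumps(rows)),
            )

    def _read(self, ref_id: str, kind: str) -> list:
        row = self._conn.execute(
            "SELECT body FROM staging WHERE ref_id = ? AND kind = ?",
            (ref_id, kind),
        ).fetchone()
        if row is None:
            raise KeyError(f"no {kind} staged under {ref_id!r}")
        return json.loads(row[0])


dao = StagingDao(sqlite3.connect(":memory:"))
dao.write_request("req-1", [{"customer": "C-7", "amount": 120}])
staged = dao.read_request("req-1")
```

Because callers see only the four public methods, the storage layout behind them stays the provider's concern, which is the decoupling the paragraph above argues for.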
Intermediate Area: Utilization Profile and Value—Service Provider Perspective
Another concept discussed in the architecture pattern is the use of an Intermediate Area, akin to a response cache, that stores responses as they are processed. The presence of the Intermediate Area allows the provider and the consumer to have different availability and uptime requirements. The service consumer and the service provider can also work somewhat independently, without the Staging Area data structures affecting the persistence of the service responses. The assumption here is that the Intermediate Area could be a database cache or an XML cache.
Primarily, the Intermediate Area protects the service provider by allowing response persistence to proceed without being impacted by a service consumer that may have tied up the Staging Area data structures while writing out its requests. This facility also allows the service provider to create a service response repository that caches responses for enterprise-wide consumption, not just for a particular service request made by a specific service consumer. The Intermediate Area could be used to short-circuit the processing of a service request if the response is already available in the cache. Another advantage of the Intermediate Area cache is that the service provider can process the request in a generic fashion while employing ETL-based filter rules to permit authorized access to only a subset of the response information based on the service consumer type.
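The short-circuit and filtering behavior can be sketched as below. Everything here is an assumption for illustration: an in-memory dictionary stands in for the database or XML cache, a field whitelist per consumer type stands in for ETL-based filter rules, and the names IntermediateArea, VISIBLE_FIELDS, and serve are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical filter rules: response fields visible to each consumer type.
VISIBLE_FIELDS = {
    "internal": {"account", "balance", "risk_score"},
    "partner": {"account", "balance"},
}


class IntermediateArea:
    """In-memory stand-in for the response cache."""

    def __init__(self) -> None:
        self._responses: Dict[str, List[Record]] = {}

    def __contains__(self, request_key: str) -> bool:
        return request_key in self._responses

    def put(self, request_key: str, records: List[Record]) -> None:
        self._responses[request_key] = records

    def get_filtered(self, request_key: str, consumer_type: str) -> List[Record]:
        # Unknown consumer types see no fields rather than all of them.
        allowed = VISIBLE_FIELDS.get(consumer_type, set())
        return [
            {k: v for k, v in record.items() if k in allowed}
            for record in self._responses[request_key]
        ]


def serve(area: IntermediateArea, request_key: str, consumer_type: str,
          process: Callable[[], List[Record]]) -> List[Record]:
    # Short-circuit: run the generic processing only on a cache miss.
    if request_key not in area:
        area.put(request_key, process())
    return area.get_filtered(request_key, consumer_type)


calls = []


def expensive_processing() -> List[Record]:
    calls.append(1)  # count invocations to show the short-circuit
    return [{"account": "A-1", "balance": 250, "risk_score": 0.7}]


area = IntermediateArea()
internal_view = serve(area, "accounts-by-region", "internal", expensive_processing)
partner_view = serve(area, "accounts-by-region", "partner", expensive_processing)
```

In this sketch the response is computed once in full and each consumer type receives its own filtered view, so the provider's processing logic never needs to know who is asking.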
Intermediate Area: Utilization Profile and Value—Service Consumer Perspective
The Intermediate Area allows the service consumer to submit a service request in one step and to delay consumption of the service response until later. The service consumer can execute an ETL process on demand to pull the service response into the Staging Area at the appropriate time, for instance when its end user is ready to review the results of a request submitted in a prior session. Additionally, the Intermediate Area cache insulates the service consumer by allowing its end users to continue submitting their online request transactions without being impacted by the response persistence activity of the service provider.
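A minimal sketch of such an on-demand pull follows, assuming both areas are tables reachable from one SQLite connection. The table names, column names, and the functions create_schema, pull_response, and read_response are hypothetical; a real deployment would more likely use a dedicated ETL tool across separate stores.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Two hypothetical tables standing in for the two areas.
    conn.executescript(
        """
        CREATE TABLE intermediate_response (response_id TEXT, seq INTEGER, body TEXT);
        CREATE TABLE staging_response (response_id TEXT, seq INTEGER, body TEXT,
                                       PRIMARY KEY (response_id, seq));
        """
    )


def pull_response(conn: sqlite3.Connection, response_id: str) -> int:
    """Copy one response into the Staging Area; return how many rows are staged."""
    with conn:  # one transaction per pull
        # OR IGNORE makes a repeated pull of the same response harmless.
        conn.execute(
            "INSERT OR IGNORE INTO staging_response (response_id, seq, body) "
            "SELECT response_id, seq, body FROM intermediate_response "
            "WHERE response_id = ?",
            (response_id,),
        )
    count = conn.execute(
        "SELECT COUNT(*) FROM staging_response WHERE response_id = ?",
        (response_id,),
    ).fetchone()[0]
    return count


def read_response(conn: sqlite3.Connection, response_id: str) -> list:
    rows = conn.execute(
        "SELECT body FROM staging_response WHERE response_id = ? ORDER BY seq",
        (response_id,),
    ).fetchall()
    return [body for (body,) in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Provider side, earlier session: response rows land in the Intermediate Area.
conn.executemany(
    "INSERT INTO intermediate_response VALUES (?, ?, ?)",
    [("resp-1", 0, "alpha"), ("resp-1", 1, "beta"), ("resp-2", 0, "other")],
)
conn.commit()
# Consumer side, later session: pull only when the end user asks for results.
staged_count = pull_response(conn, "resp-1")
results = read_response(conn, "resp-1")
```

Until the consumer calls pull_response, nothing is written to the staging table, so the provider's persistence work and the consumer's online transactions do not contend for the same structures.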
Execution Path of the Architectural Components
Here is how the architecture components coordinate and execute the call path.
A. Service Consumer
- Call DAO Utility to Write Request Data to Staging Area
- Send Request Notification to Integration Layer
B. Integration Layer
- Perform base level validation and transformation, if needed, and forward the Request Notification to the final Provider
C. Service Provider
- Call DAO Utility to Read Request from the Staging Area
- Process Request and call DAO Utility to Write Response Data to the Staging Area OR
- Call DAO Utility to Write Response Data to the Intermediate Area
- Send Response Notification to Integration Layer
- Integration Layer forwards the Response Notification to the Service Consumer
D. Service Consumer
- Call DAO Utility to Read Response Data from Staging Area OR
- Execute the ETL process to pull the response from the Intermediate Area to the Staging Area
- Process Response
Figure 1: Service Consumer: Service Provider Interaction Sequence
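The interaction sequence can be walked through in a few lines of Python. This is a deliberately simplified sketch: a dictionary stands in for the Staging repository, plain function calls stand in for the integration layer's messaging, the provider's processing is a placeholder, and only the branch that writes the response directly to the Staging Area is shown. All function and key names are hypothetical.

```python
import uuid

staging = {}  # stand-in for the Staging repository, keyed by (kind, id)


def dao_write(kind: str, ref_id: str, rows: list) -> None:
    staging[(kind, ref_id)] = list(rows)


def dao_read(kind: str, ref_id: str) -> list:
    return staging[(kind, ref_id)]


def provider_handle(notification: dict) -> dict:
    # Service Provider: read the request, process it, write the response.
    rows = dao_read("request", notification["request_id"])
    result = [row.upper() for row in rows]  # placeholder for the real processing
    response_id = str(uuid.uuid4())
    dao_write("response", response_id, result)
    # The response notification again carries keys only.
    return {"request_id": notification["request_id"], "response_id": response_id}


def integration_layer(notification: dict) -> dict:
    # Integration Layer: base-level validation, then forward to the provider
    # and pass the response notification back to the consumer.
    if "request_id" not in notification:
        raise ValueError("notification must carry a request_id")
    return provider_handle(notification)


def consumer_submit(rows: list) -> list:
    # Service Consumer: stage the request, send the notification,
    # then read the response using the key in the response notification.
    request_id = str(uuid.uuid4())
    dao_write("request", request_id, rows)
    response_notification = integration_layer({"request_id": request_id})
    return dao_read("response", response_notification["response_id"])


processed = consumer_submit(["alpha", "beta"])
```

Note that the rows themselves never pass through integration_layer; only the two small notification dictionaries do.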