With the advent of Web Services, the Web is being populated by service providers who wish to take advantage of this large B2B space. However, there are still important security and fault-tolerance considerations that must be addressed. One of these is the fact that the Web frequently suffers from failures, which can affect both the performance and the consistency of applications run over it.
Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. The ACID properties of atomic transactions (Atomicity, Consistency, Isolation, Durability) ensure that even in complex business applications consistency of state is preserved, despite concurrent accesses and failures.
The structuring mechanisms available within atomic transaction systems are sequential and concurrent composition of transactions. These mechanisms are sufficient if an application function can be represented as a single atomic transaction. Transactions are most suitably viewed as “short-lived” entities; they are less well suited for structuring “long-lived” application functions (e.g., running for hours, days, and so on). Long-lived atomic transactions may reduce the concurrency in the system to an unacceptable level by holding on to resources (e.g., locks) for a long time; further, if such a transaction rolls back, much valuable work could be undone.
The OASIS Business Transactions Protocol (BTP), specified by a collaboration of several companies, has tried to address this issue. In this paper we shall first consider why traditional atomic transactions are insufficient for long-running activities and then describe how BTP has attempted to solve these problems.
Why ACID Transactions Are Too Strong
It has long been realised that ACID transactions by themselves are not adequate for structuring long-lived applications. To ensure ACID-ity between multiple participants, a two-phase consensus mechanism is typically required, as illustrated in Figure 1. During the first (preparation) phase, a participant must make durable any state changes that occurred within the scope of the transaction, such that these changes can either be rolled back or committed later, once consensus has been reached amongst all participants; that is, the original state must not be lost at this point, as the transaction could still roll back. Assuming no failures occurred during the first phase, in the second (commitment) phase participants may “overwrite” the original state with the state made durable during the first phase.
Figure 1: Two-phase commit protocol.
However, two-phase commit is a blocking protocol: after returning its phase 1 response, each participant that voted to commit must remain blocked until it has received the coordinator’s phase 2 message. Until it receives this message, the resources used by the participant are unavailable to other transactions. If the coordinator fails before delivering the phase 2 message, these resources remain blocked until it recovers: all participants must see both phases of the commit protocol in order to guarantee ACID semantics.
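The two phases and the blocking window described above can be sketched in a few lines. This is a minimal in-memory illustration, not a real transaction manager: the `Participant` class, its `locked` flag, and the `two_phase_commit` function are hypothetical names introduced here, and durability and failure recovery are deliberately left out.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class Participant:
    """Hypothetical participant holding an original value and a pending one."""

    def __init__(self, name, value, can_commit=True):
        self.name = name
        self.value = value            # original (committed) state
        self.pending = None           # state recorded during phase 1
        self.can_commit = can_commit  # assumption: lets us simulate a "no" vote
        self.locked = False           # True while blocked between the phases

    def prepare(self, new_value):
        # Phase 1: record the new state without losing the original.
        if not self.can_commit:
            return Vote.ROLLBACK
        self.pending = new_value
        self.locked = True
        return Vote.COMMIT

    def commit(self):
        # Phase 2: overwrite the original state with the phase 1 state.
        if self.locked:
            self.value = self.pending
        self.pending = None
        self.locked = False

    def rollback(self):
        # Phase 2: discard the phase 1 state; the original is untouched.
        self.pending = None
        self.locked = False


def two_phase_commit(updates):
    """Run both phases over (participant, new_value) pairs.

    Returns True if every participant voted to commit, False otherwise.
    """
    votes = [participant.prepare(value) for participant, value in updates]
    decision = all(vote is Vote.COMMIT for vote in votes)
    for participant, _ in updates:
        if decision:
            participant.commit()
        else:
            participant.rollback()
    return decision
```

A participant that has voted to commit stays `locked` until `commit` or `rollback` arrives. If the coordinator stopped between the two loops in `two_phase_commit`, every such participant would remain locked, which is the blocking behaviour discussed above.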
Therefore, structuring certain activities from long-running transactions can reduce the amount of concurrency within an application or require work to be performed again. For example, there are certain classes of application where it is known that resources acquired within a transaction can be released “early”, rather than having to wait until the transaction terminates; in the event of the transaction rolling back certain compensation activities may be necessary to restore the system to a consistent state. Such compensations will typically be application specific.
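One common way to picture such compensation is a log of undo actions that is replayed in reverse order if the activity is abandoned. The sketch below assumes that approach; `CompensatingActivity` and its methods are hypothetical names, and the compensations themselves would be application specific, as noted above.

```python
class CompensatingActivity:
    """Hypothetical helper: applies work immediately, remembers how to undo it."""

    def __init__(self):
        self._compensations = []

    def do(self, action, compensation):
        # The work takes effect (and its resources are released) straight away;
        # only the compensating step is retained.
        action()
        self._compensations.append(compensation)

    def complete(self):
        # The activity succeeded, so no compensation will be needed.
        self._compensations.clear()

    def compensate(self):
        # The activity was abandoned: undo the work, most recent step first.
        while self._compensations:
            undo = self._compensations.pop()
            undo()
```

For example, an action that decrements a seat count could be paired with a compensation that increments it again; calling `compensate()` would then restore the original count without any lock having been held in between.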
The Business Transaction Protocol
Business-to-business (B2B) interactions may be complex, involving many parties, spanning many different organisations, and potentially lasting for hours or days, e.g., the process of ordering and delivering parts for a computer which may involve different suppliers, and may only be considered to have completed once the parts are delivered to their final destination. Unfortunately, for a number of reasons B2B participants simply cannot afford to lock their resources exclusively on behalf of an individual indefinitely, thus ruling out the use of atomic transactions. Therefore, the BTP effort has attempted to provide a solution to this by defining a new transactional model specifically for web services. It is important to realise that the term “transaction” in this sense does not mean atomic transaction, though ACID semantics can be obtained if required.
Consensus of Opinion
In general a business transaction requires the capability for certain participants to be structured into a consensus group, such that all of the members have the same result. Importantly, different participants within the same business transaction may belong to different consensus groups. The business logic then controls how each group completes. In this way, a business transaction may cause a subset of the groups it naturally creates to perform the work it asks, while asking the other groups to undo the work.
For example, consider the situation shown in Figure 2, where a user is booking a holiday, has provisionally reserved a flight ticket and taxi to the airport and is now looking for travel insurance. The first consensus group holds Flights and Taxi, since neither of these can occur independently. The user may then decide to visit multiple insurance sites (called A and B, in this example), and as he goes may reserve the quotes he likes. So, for example, A may quote $50, which is just within budget, but the user may want to try B just in case he can find a cheaper price, and without losing the initial quote. If the quote from B is less than that from A, the user may cancel A, while confirming both the flights and the insurance from B. Each insurance site may therefore occur within its own consensus group. This is not something that is possible when using ACID transactions.
Figure 2: Flight-booking.
The BTP consortium began by examining the experiences gained by using transactions in closely coupled environments such as CORBA and comparing and contrasting with the loosely coupled world of Web Services. They concluded that:
- Transaction information must be communicated in XML documents. How these documents are propagated is left up to the application, e.g., email or HTTP. All that BTP mandates is the format of the transaction payload, leaving it to users to define how to get it from end-point to end-point.
- Consensus between participants is extremely useful. Unfortunately, atomic transaction systems typically tie the two-phase commit protocol, which is intended purely for consensus, to the requirement to retain locks and persistence for resources.
- Typically applications possess an initial (preparatory) phase, where resources are acquired on behalf of a specific user or business transaction, and then either a cancellation or a confirmation stage, which may come at an arbitrary time after the initial phase. Although this two-phase approach may appear similar to that of atomic transactions, there are important differences: (i) the cancellation stage does not imply backward compensation, as it does in atomic transactions: a participant may use forward compensation; (ii) the reservation stage does not guarantee that resources acquired will be available for confirmation later, and there may be an explicit or implicit “confirm-by” period indicated when making the reservation.
- Multiple participants may find themselves in their preparatory phases and only a subset of these may eventually be confirmed, while the others are cancelled. For example, a user may contact multiple bookshops, reserving the same book at each before deciding which shop to finally purchase from (e.g., the one which offers the best price and delivery guarantees).
With these points in mind, we shall now present an overview of BTP and show how it copes with the world of Web Services.
Time to Choose
In a traditional transaction system, the application or user has very few verbs with which to control transactions. Typically these are “begin”, “commit” and “rollback”. When an application asks for a transaction to commit, the coordinator will execute the two-phase protocol before returning an outcome. The elapsed time between the execution of the first phase and the second phase is typically milliseconds to seconds.
However, the two-phase protocol itself does not impose any restriction on the time between executing the first and second phases. Obviously, the longer this period, the more chance there is for a failure to occur and the longer resources remain locked. BTP, on the other hand, took the approach of allowing the time between these phases to be set by the application, simply by expanding the verbs available to include explicit control over both phases, i.e., “prepare”, “confirm” and “cancel”. The application has complete control over when it tells a transaction (or transactions) to prepare and, using whatever business logic is required, can later determine which to confirm and which to cancel. This ability to control the termination protocol explicitly is a powerful tool.
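The expanded verb set can be pictured as a small state machine that the application drives directly. The following is an illustrative sketch only: the `State` names and the `BusinessTransaction` class are assumptions made for this example and are not taken from the BTP specification.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARED = auto()
    CONFIRMED = auto()
    CANCELLED = auto()


class BusinessTransaction:
    """Hypothetical transaction whose two phases are driven by the application."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE

    def prepare(self):
        # First phase, run whenever the application chooses.
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"cannot prepare from {self.state.name}")
        self.state = State.PREPARED

    def confirm(self):
        # Second phase. Confirming an active transaction runs both phases
        # back to back, much like a traditional "commit".
        if self.state is State.ACTIVE:
            self.prepare()
        if self.state is not State.PREPARED:
            raise RuntimeError(f"cannot confirm from {self.state.name}")
        self.state = State.CONFIRMED

    def cancel(self):
        # Alternative second phase; allowed until an outcome has been reached.
        if self.state in (State.CONFIRMED, State.CANCELLED):
            raise RuntimeError(f"cannot cancel from {self.state.name}")
        self.state = State.CANCELLED
```

The point of the sketch is that nothing forces `confirm` or `cancel` to follow `prepare` immediately: the application may hold a transaction in the `PREPARED` state for as long as its business logic needs before deciding.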
Architecture of the Business Transaction Protocol
The BTP architecture can best be described with reference to Figure 3. Web Services do work within the scope of atoms (similar to atomic transactions), which are created by the initiator of the business transaction; multiple atoms are composed into a business transaction (e.g., arranging a holiday) by a cohesion composer such that different atoms may possess different outcomes, as directed by the business logic, e.g., cancel one insurance quote and confirm another.
Figure 3: BTP actors.
The actors involved in a BTP business transaction are:
- Coordinator (atom): used to scope work performed on Web Services. As with an atomic transaction, the coordinator is responsible for informing enlisted participants about whether they should accept (confirm) or reject (cancel) the work done within the scope of that atom.
- Initiator of the atom: communicates with an atom manager (factory) and asks it to start a new atom. Once created, information about the atom (the context) can be propagated to Web Services in order for the work to be conducted within the scope of an atom.
- Terminator of the atom: This will typically be the same entity as the initiator, but need not be. Although an atom can be instructed to confirm all participants immediately, as mentioned in Section 3.2, it is more typically instructed to prepare them first, and later (hours, days, etc.) to either confirm or cancel them.
- Web service, e.g., the taxi booking service: Whenever the initiator contacts a service whose work it wishes to be under the control of an atom, it flows the context to that service. The service can then use this information to enlist a participant with the atom. The service is responsible for ensuring that concurrent accesses by different applications are managed in a way that guarantees some internal consistency criteria for that service.
- Each participant supports a two-phase termination protocol via the prepare, confirm and cancel operations. What the participant does when asked to prepare is implementation dependent (e.g., reserve the theatre ticket); it then returns an indication of whether or not it succeeded. However, unlike in an atomic transaction, the participant does not have to guarantee that it can remain in this prepared state; it may indicate that it can only do so for a specified period of time, and also indicate what action it will take (confirm or cancel) if it has not been told how to finish before this period elapses. In addition, no indication of how prepare is implemented is implied in the protocol, such that resource reservation need not occur.
- Cohesion composer: the business logic for gluing together the flow of the application into one or more atoms. Although Web Services do work within the scope of atoms, it is the cohesion composer (cohesion) that ultimately determines which atoms to confirm, and which to cancel. The composer may prepare and cancel atoms at arbitrary points during the lifetime of the business transaction. The main difference between an atom and a cohesion is that whereas all participants enrolled with an atom will either confirm or cancel, the participants enrolled with a cohesion (atoms) may have different outcomes. However, once the composer has arrived at its confirm-set (the participants that will confirm) it essentially collapses down to become an atom and guarantees an all-or-nothing effect.
Through the cohesion composer, BTP gives the business logic the flexibility to structure interactions with services into multiple consensus groups. The fact that atoms may be prepared at any point in the normal flow of business, and later confirmed or cancelled, gives greater flexibility to the application.
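The relationship between an atom coordinator and its participants, including a participant that will only stay prepared for a limited time, might be sketched as follows. All names here (`Participant`, `Atom`, `hold_for`, `default_outcome`) are hypothetical, and time is passed in explicitly as a number of seconds so that the example stays deterministic; how a coordinator should react when a participant has already taken its default action is not covered by this sketch.

```python
class Participant:
    """Hypothetical participant that stays prepared only for `hold_for` seconds."""

    def __init__(self, name, hold_for, default_outcome="cancel"):
        self.name = name
        self.hold_for = hold_for                # how long it promises to stay prepared
        self.default_outcome = default_outcome  # what it does if not told in time
        self.prepared_at = None
        self.outcome = None

    def prepare(self, now):
        # What "prepare" does is up to the service; here we only note the time.
        self.prepared_at = now
        return True

    def _apply_timeout(self, now):
        expired = (self.prepared_at is not None
                   and now - self.prepared_at > self.hold_for)
        if self.outcome is None and expired:
            self.outcome = self.default_outcome  # autonomous decision

    def confirm(self, now):
        self._apply_timeout(now)
        if self.outcome is None:
            self.outcome = "confirm"
        return self.outcome

    def cancel(self, now):
        self._apply_timeout(now)
        if self.outcome is None:
            self.outcome = "cancel"
        return self.outcome


class Atom:
    """Hypothetical atom coordinator: every enlisted participant gets the same instruction."""

    def __init__(self):
        self.participants = []

    def enlist(self, participant):
        self.participants.append(participant)

    def prepare(self, now):
        return all([p.prepare(now) for p in self.participants])

    def confirm(self, now):
        return {p.name: p.confirm(now) for p in self.participants}

    def cancel(self, now):
        return {p.name: p.cancel(now) for p in self.participants}
```

If the terminator confirms within every participant's `hold_for` period, all participants report `"confirm"`. If it waits too long, a participant may already have applied its `default_outcome`, which is why the period indicated at prepare time matters to the business logic.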
So How Would I Use This BTP Thing?
Consider the flight-booking example presented earlier. How could we use BTP in order to coordinate this application in a reliable manner? The problem is that we wish to obtain the cheapest insurance quote as we go along and without losing prior quotes until we know that they are no longer the cheapest; at that point we will be able to release those quotes while maintaining the others. In a traditional transaction system, all of the work performed within a transaction must either be accepted (committed) or declined (rolled back); the required loosening of atomicity is not supported.
In BTP, however, we can use atoms and cohesions. A cohesion is first created to manage the overall business interactions. The business logic (application) creates an atom (ReserveAtom, say) and enrols it with the cohesion before invoking the airline and taxi reservation services within its scope, such that their work is then ultimately controlled by the outcome of the atom. When a suitable flight and taxi can be obtained, ReserveAtom is prepared to reserve the bookings for some service specific time.
Then the insurance quotes are obtained by invoking their respective services within the scope of separate atoms (AtomQuote1 and AtomQuote2, for example), which are first enrolled with the controlling cohesion. When the quote from the first insurance site is obtained, it is obviously not known whether it is the best quote, so the business logic can prepare AtomQuote1 to hold the quote while it communicates with the second insurance site. If that site does not offer a better quote, the application can cancel AtomQuote2; it now has its final confirm-set of atoms (ReserveAtom and AtomQuote1), which it can confirm.
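The walk-through above can be condensed into a short sketch. The atom names come from the example; everything else (the `Atom` and `Cohesion` classes, the `book_holiday` function, the string states, and the quote values passed in) is a hypothetical stand-in for whatever API a BTP implementation actually provides.

```python
class Atom:
    """Hypothetical atom: a named unit of work with a single outcome."""

    def __init__(self, name):
        self.name = name
        self.work = []
        self.state = "active"

    def add_work(self, description):
        self.work.append(description)

    def prepare(self):
        self.state = "prepared"

    def confirm(self):
        self.state = "confirmed"

    def cancel(self):
        self.state = "cancelled"


class Cohesion:
    """Hypothetical cohesion composer: enrolled atoms may end differently."""

    def __init__(self):
        self.atoms = {}

    def enrol(self, atom):
        self.atoms[atom.name] = atom
        return atom

    def cancel(self, name):
        self.atoms[name].cancel()

    def confirm(self, confirm_set):
        # Collapse to the confirm-set: its members confirm, the rest cancel.
        for name, atom in self.atoms.items():
            if name in confirm_set:
                atom.confirm()
            elif atom.state != "cancelled":
                atom.cancel()


def book_holiday(quote_a, quote_b):
    cohesion = Cohesion()

    # Flight and taxi share one atom, so they succeed or fail together.
    reserve = cohesion.enrol(Atom("ReserveAtom"))
    reserve.add_work("flight")
    reserve.add_work("taxi")
    reserve.prepare()

    # Hold the first quote while asking the second site.
    quote1 = cohesion.enrol(Atom("AtomQuote1"))
    quote1.add_work(("insurance A", quote_a))
    quote1.prepare()

    quote2 = cohesion.enrol(Atom("AtomQuote2"))
    quote2.add_work(("insurance B", quote_b))

    if quote_b < quote_a:
        quote2.prepare()
        cohesion.cancel("AtomQuote1")
        winner = "AtomQuote2"
    else:
        cohesion.cancel("AtomQuote2")
        winner = "AtomQuote1"

    cohesion.confirm({"ReserveAtom", winner})
    return {name: atom.state for name, atom in cohesion.atoms.items()}
```

Calling `book_holiday(50, 60)` follows the path described in the text: the second quote is not better, so AtomQuote2 is cancelled and the confirm-set is ReserveAtom plus AtomQuote1. With a cheaper second quote the roles of the two quote atoms are swapped, while ReserveAtom is confirmed either way.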
ACID transactions have proven invaluable over the years in the construction of enterprise applications. However, they are only really suited to short duration activities executing on closely coupled applications and environments. When used in a loosely coupled environment, they prove too inflexible and restricting for many applications. The BTP has been developed to solve this problem whilst at the same time maintaining those aspects of the atomic transaction model that have proven useful. At the time of writing, there is only a single BTP implementation available, from Hewlett-Packard. However, several companies have stated that they are working on their own implementations.
- http://www.oasis-open.org/committees/business-transactions/, June 2001.
- “CORBAservices: Common Object Services Specification”, OMG Document Number 95-3-31, March 1995.
Dr. Mark Little is a Distinguished Engineer/Architect within HP Arjuna Labs., Newcastle upon Tyne, England, where he leads the HP-TS and HP-WST teams. He is one of the primary authors of the OMG Activity Service specification, is on the expert group for the work in J2EE (JSR 95), and leads the JSR 156 activity on an XML API for Java Transactions. He is HP's representative on the OTS Revision Task Force and the OASIS Business Transactions Protocol specification. Before joining HP he was for over 10 years a member of the Arjuna team within the University of Newcastle upon Tyne (where he continues to have a Visiting Fellowship). His research within the Arjuna team included replication and transactions support (he is on the expert group for JSR 117), which included the construction of an OTS/JTS compliant transaction processing system.