Case Study: NC DOT Database Synchronization
Challenges of Distributed Data
Distributed databases that are not in constant contact with one another cannot offer any mechanism for system-wide locking. In particular, there's nothing to stop you from inserting two rows with the same primary key on two different remote databases, thus causing a primary key conflict when those rows are uploaded to the consolidated database.
One solution is to use a composite primary key consisting of an ordinary autoincrement column plus a "database identifier" to solve the problem of autoincrement values overlapping for different databases. Multi-column primary keys irritate application developers by making foreign key definitions overly complex, so globally unique identifiers (GUIDs) are sometimes considered as a single-column alternative. GUIDs are 128-bit integers with values like '6F9619FF-8B86-D011-B42D-00C04FC964FF'; they're unique, all right, but they make for fat database indexes, and the values are awkward to deal with when debugging. Yet another common solution is to use key pools; that's where groups of unique key values are generated by the central server and downloaded for later use by individual remote databases. Key pools do work, but they're difficult to implement and administer properly, especially in a multi-level hierarchy. Plus, they generate extra network traffic.
This is very important to the Mobile Inspector application because new rows are frequently inserted in many tables on each remote database, and they all have automatically assigned or "artificial" primary keys. The following example shows the BRI_SKTCH table that is used to hold freehand sketches in a BLOB column. The partition size is set to 2,000,000, which means each inspector can create that many sketches before the partition is exhausted. The 32-bit signed integer primary key can be as large as 2,147,483,647; divide that by 2,000,000 and you get 1,073, the maximum number of databases that can use this mechanism without any primary key conflicts. That works for the Mobile Inspector; for other, larger applications, unsigned 64-bit integers are available.
CREATE TABLE BRI_SKTCH ( SKTCH_ID INTEGER NOT NULL DEFAULT GLOBAL AUTOINCREMENT ( 2000000 ), STRCTR_ID INTEGER NOT NULL, SKTCH_TTL VARCHAR ( 45 ) NOT NULL, SKTCH_DES VARCHAR ( 130 ) NULL, FILE_NM VARCHAR ( 32 ) NULL, DEL_FLG_IND BIT NOT NULL DEFAULT 0, LST_MDFD_TS TIMESTAMP NOT NULL DEFAULT TIMESTAMP, SKTCH_LOB LONG BINARY NULL, SKTCH_NBR INTEGER NOT NULL, SKTCH_SZ_AMT INTEGER NULL, PRIMARY KEY ( SKTCH_ID ) );
Another challenge of distributed data is to reduce the amount of network traffic required to keep the databases in synch. Before the advent of specialized synchronization software, application code had to be written to determine what data needed to be shipped back and forth. Sometimes this code transmitted too much data in an effort to make sure nothing was missed. Other times, the transmission time was just as long as it searched for differences that needed to be reconciled.
The MobiLink client reduces network traffic by reading the remote database transaction log to determine what must be uploaded. Only those rows that have been inserted, updated, or deleted since the last synchronization are sent. Also, if a row is modified more than once between synchronizations, only the final version is uploaded.
The MobiLink server doesn't have the same advantage when building the download stream. Because it works with different databases such as Oracle, SQL Server, and DB2, MobiLink doesn't have access to the transaction log. If you don't want to download an entire table every time, you must code an appropriate WHERE clause in the download_cursor script.
That's where another ASA feature comes into play: a timestamp column defined with the special DEFAULT TIMESTAMP attribute will be updated automatically every time the row is updated, as well as when the row is inserted. The earlier CREATE TABLE shows how each table can have a LST_MDFD_TS column that is kept up to date without having to code a trigger. The SELECT statement shown before that contains the predicate LST_MDFD_TS >= ? to limit the download result set to only those rows that have changed since the last synch. The MobiLink server automatically substitutes the "last synchronization timestamp" value for the "?" placeholder, making it unnecessary to write any custom code to keep track of when each remote database is synchronized.
My involvement with the Mobile Inspector project began in late March 2004 and continued full-time through April, with shorter visits in May and June and again in September.
During April, the SQL Anywhere schema changes were implemented, including the introduction of the GLOBAL AUTOINCREMENT primary keys, the addition of the "LST_MDFD_TS" columns, and the new tables required to support the business rules affecting synchronization. The MobiLink scripts were developed and tested during this period, using ASA as the consolidated database, and then the Oracle database was created.
In May and June, the Oracle PL/SQL version of the MobiLink scripts was created, and full three-level Oracle-ASA-ASA testing was performed.
In the meantime, development proceeded on the application, requiring another visit in September to revise the MobiLink scripts and perform extensive testing prior to rollout.
Ever since leaving North Carolin,a I've started noticing roadside crews, wondering whether they're doing an inspection. Maybe someday, I'll be able to tell by whether or not they're using mobile computers. The final shipment of tablet PCs has arrived at NC DOT, a real indication of success in a cost-conscious jurisdiction. If an application like Mobile Inspector works there, it can work anywhere.
My experience on the Mobile Inspector application reinforced two lessons learned over the years: First, it's easy to underestimate the complexity of a synchronization setup, especially when it is an integral part of a large business application. Second, synchronization at the level of relational database data simplifies the setup because it separates the synchronization logic from the application code, reduces complex data structures to simple rows and columns, and exploits basic database features like transaction commit and column default values.
About the Author
Breck Carter is principal consultant at RisingRoad Professional Services, providing consulting and support for SQL Anywhere databases and MobiLink synchronization with Oracle, DB2, SQL Server, ASE, and ASA. Breck is author of SQL Anywhere Studio 9 Developer's Guide from Wordware Publishing. He can be reached at firstname.lastname@example.org.
Page 3 of 3