Data Synchronization

Making your smart client application work in a disconnected environment: considering data loading and return synchronization.

Let’s say you’ve decided to build a smart client application. You’ve determined that it will run against Microsoft SQL Server and employ a local data store that the client will use in disconnected mode. Your next step is deciding how the two data stores will interact with one another.

This article explains the design components of a smart client application, and focuses on the data synchronization controller, which is responsible for communication between the client-side disconnected data store and the server-side connected data store.

Architecture: key components

The key design components of your smart client application are the data access and data synchronization controllers.

The data access controller is based on, and works much like, the class factory pattern. The requestor first posts a request for an object. The data access controller retrieves the data, instantiates the object with it, and returns the object to the requestor. When the data is modified, the object is posted back to the data access controller, which processes it against the data store and notifies the caller that the data has been saved or deleted. Where the data comes from is irrelevant to the caller.

The data access controller must also know whether the smart client application is running in connected or disconnected mode. With this intelligence, it knows which data store to use when retrieving and updating data.
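
To make the pattern concrete, here is a minimal sketch of such a controller in Python. The Customer class, the two store objects, and their fetch_customer/save_customer methods are hypothetical illustrations, not part of any real framework.

from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int
    name: str

class DataAccessController:
    """Hands out populated objects and accepts them back for persistence;
    the caller never knows which data store the data came from."""

    def __init__(self, server_store, local_store):
        self.server_store = server_store
        self.local_store = local_store
        self.connected = True  # toggled by whatever detects network state

    def _store(self):
        # Route every call to the appropriate store for the current mode.
        return self.server_store if self.connected else self.local_store

    def get_customer(self, customer_id: int) -> Customer:
        # Class-factory style: retrieve the data, instantiate the object,
        # hand it back to the requestor.
        row = self._store().fetch_customer(customer_id)   # hypothetical call
        return Customer(customer_id=row["id"], name=row["name"])

    def save_customer(self, customer: Customer) -> None:
        # The modified object is posted back and processed against the store.
        self._store().save_customer(customer)             # hypothetical call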

The second component is the data synchronization controller, which is responsible for communication between the client-side disconnected data store and the server-side connected data store. This controller (see the sketch after this list):

  • ensures that the required data is available in the disconnected data store.
  • ensures that lookup data is synchronized.
  • locks records if required.
  • synchronizes data from the client data store to the server data store when the connection is restored.
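
One way to express those responsibilities is as an abstract interface that a concrete controller implements. This is only a sketch; the method names are invented for illustration.

from abc import ABC, abstractmethod

class DataSynchronizationController(ABC):
    @abstractmethod
    def load_required_data(self, user_id: str) -> None:
        """Ensure the data the user needs is in the disconnected store."""

    @abstractmethod
    def sync_lookup_data(self) -> None:
        """Ensure client-side lookup tables match the server."""

    @abstractmethod
    def lock_records(self, record_keys: list) -> None:
        """Flag records on the server as locked, if locking is required."""

    @abstractmethod
    def sync_changes_to_server(self) -> None:
        """Push client-side inserts, updates and deletes to the server
        when the connection is restored."""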

Loading data

You must first decide how to get the necessary data from the server data store to the client data store so the application can run entirely off a local database. Unless you are locking records so they can’t be edited, you simply copy data from one data store to the other. If you are using SQL Server, you could also create a DTS package to do the job instead of coding the copy yourself in the data synchronization controller.
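
Here is a minimal sketch of the straight-copy approach, using two SQLite databases to stand in for the server and client data stores. The copy_table helper and the assumption of identical schemas on both sides are illustrative.

import sqlite3

def copy_table(server: sqlite3.Connection, client: sqlite3.Connection,
               table: str) -> None:
    # Replace the client's copy of a table with the server's current rows;
    # assumes the same schema exists on both sides.
    rows = server.execute(f"SELECT * FROM {table}").fetchall()
    client.execute(f"DELETE FROM {table}")
    if rows:
        marks = ", ".join("?" * len(rows[0]))
        client.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    client.commit()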

If the data is too large to copy over and won’t fit on the local hard drive, determine whether you really need all of it to run the application in disconnected mode. Here is an example of how you can deal with very large data sets by building intelligence into your data loading:

Let’s say we are writing a sales application for a multinational pharmaceutical company that makes a wide range of products, from over-the-counter pain relievers to surgical kits to medical imaging products. The company’s client base includes retailers, doctor’s offices, and medical institutions such as hospitals, surgical centers and hospices.

A single pain reliever from the company has more than three dozen SKUs, or stock-keeping units: alphanumeric identifiers that enable a product to be tracked for inventory purposes. Some SKUs are used for doctor’s offices, some for medical institutions, and some for retailers. The company’s sales representatives specialize in selling to only one of these vertical markets, so the data synchronization controller can be configured to copy over only the product data for the vertical a rep sells to, along with the client data and potential leads for the rep’s region or territory.
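
A sketch of what that filtered loading might look like, again using SQLite stand-ins; the products, clients and leads tables and their vertical and territory columns are invented for the example.

import sqlite3

def load_rep_subset(server: sqlite3.Connection, client: sqlite3.Connection,
                    vertical: str, territory: str) -> None:
    # Copy only the SKUs for the rep's vertical market and the clients and
    # leads in the rep's territory.
    for table, where, params in [
        ("products", "vertical = ?", (vertical,)),
        ("clients",  "territory = ?", (territory,)),
        ("leads",    "territory = ?", (territory,)),
    ]:
        rows = server.execute(
            f"SELECT * FROM {table} WHERE {where}", params).fetchall()
        client.execute(f"DELETE FROM {table}")
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            client.executemany(
                f"INSERT INTO {table} VALUES ({marks})", rows)
    client.commit()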

When copying over a subset of data to the client, you must ensure that all related data is copied over as well. This may include a cascade of data from tables that are four or five levels removed from the primary table in your schema. In our pharmaceutical sales example, you may be able to view sales history for a client, but that history might contain SKUs that have since become historical in the system. These records need to be copied to the client as well so that historical data can be viewed offline.

You should also consider limiting functionality when the application is disconnected. Perhaps you only allow a view of a client’s order history going back three years. Maybe there are reports that can only be run when the application is connected. While some of these decisions may be based on business requirements, others may be influenced by the technical and practical aspects of implementing them. If you limit functionality when the application is disconnected, you need a clear way to communicate this to the user.

Client-side data caching

You can build additional intelligence into the data access controller and alleviate some of the burden on the server-side data store by getting lookup data, such as a list of states or phone number types, from the client-side data store. The data synchronization controller makes sure that the lookup data is in sync with the server, but the application receives this data from the client data store even when it is fully connected.

Lookup or static data could also be cached in memory by the data access controller. You could cache the data on the first request or have a background thread pull the data sets into memory when the application starts.
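
A sketch of that caching strategy, assuming a hypothetical local_store object with a fetch_lookup method. get_lookup caches on first request; preload_lookups warms the cache from a background thread at startup.

import threading

class LookupCache:
    def __init__(self, local_store):
        self.local_store = local_store   # always the client-side store
        self._cache = {}
        self._lock = threading.Lock()

    def get_lookup(self, name: str) -> list:
        # Return a lookup list (e.g. states), loading it on first request.
        with self._lock:
            if name not in self._cache:
                self._cache[name] = self.local_store.fetch_lookup(name)
            return self._cache[name]

    def preload_lookups(self, names: list) -> None:
        # Warm the cache from a background thread at application start.
        threading.Thread(
            target=lambda: [self.get_lookup(n) for n in names],
            daemon=True).start()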

Let’s now say that the application is disconnected. The user starts creating new data and modifying or even deleting existing data. When the application connects back to the network, how do you get the new and modified data from the client data store to the server? How do you know what has changed? How do you deal with identity field assignments on the client that may not match those on the server?

First, be sure to synchronize only the data that has changed. Three approaches you can take are database delta, change logging, and data flagging.

In all three approaches you must consider identity fields, because the ID assigned to a record on the client will probably not match the one assigned on the server. In fact, in a busy system, the client-assigned ID will likely already have been used on the server before you synchronize. Another consideration is the order in which the tables are synced: if a hierarchy of tables exists, you must respect that structure and start at the appropriate place.

Synchronization concerns

In the database delta approach, the system must look at each database table that can be synchronized and evaluate whether a record has been changed, inserted or deleted. Changed records are the easiest to identify and update. New records must take identity fields into account, particularly if the ID field is referenced in other places. It then becomes the responsibility of the data synchronization controller to ensure that the new ID, assigned by the server, cascades to the referencing tables before they are synchronized.

Deleted records are more problematic. You can’t assume that if a record exists on the server and not on the client a delete should be performed. If someone added a record while you were disconnected, it would be wrongfully deleted.

One way to implement database delta synchronization is to use a timestamp field on each record and note the time of the last synchronization. Any record whose timestamp is greater than the last synchronization time is either updated or inserted. Any record that exists on the server with a timestamp less than the last synchronization time and does not exist on the client can be deleted. While this approach is workable, it isn’t very elegant and can be dangerous if not implemented carefully.
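
A sketch of the timestamp approach, assuming every syncable table carries a last_modified column and the client records the time of its last synchronization. The dangerous delete step is noted but deliberately left out.

import sqlite3

def delta_sync(client: sqlite3.Connection, server: sqlite3.Connection,
               table: str, last_sync: str) -> None:
    # Push client rows touched since the last sync; INSERT OR REPLACE lets
    # SQLite decide (by primary key) whether each row is an update or insert.
    changed = client.execute(
        f"SELECT * FROM {table} WHERE last_modified > ?",
        (last_sync,)).fetchall()
    for row in changed:
        marks = ", ".join("?" * len(row))
        server.execute(
            f"INSERT OR REPLACE INTO {table} VALUES ({marks})", row)
    # The delete step (server rows older than last_sync that are missing on
    # the client) is the fragile part the text warns about; it is omitted.
    server.commit()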

Change logging is a safer method. Think of it as implementing your own SQL Server transaction log. Every action you take against the local database is logged as a synchronization record. When you are connected to the server again, the synchronization records are played against the server and the data is synced.
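
A sketch of change logging with a hypothetical sync_log table: each local write appends a record, and reconnecting replays the log against the server in order.

import json
import sqlite3

def log_change(client: sqlite3.Connection, table: str, op: str,
               row: dict) -> None:
    # Record one local action ('insert', 'update' or 'delete') as a
    # synchronization record alongside the data change itself.
    client.execute(
        "INSERT INTO sync_log (tbl, op, payload) VALUES (?, ?, ?)",
        (table, op, json.dumps(row)))

def replay_log(client: sqlite3.Connection, server: sqlite3.Connection) -> None:
    # Play the logged actions against the server in the order they happened.
    for tbl, op, payload in client.execute(
            "SELECT tbl, op, payload FROM sync_log ORDER BY id"):
        row = json.loads(payload)
        if op == "insert":
            cols = ", ".join(row)
            marks = ", ".join("?" * len(row))
            server.execute(f"INSERT INTO {tbl} ({cols}) VALUES ({marks})",
                           list(row.values()))
        elif op == "update":
            sets = ", ".join(f"{c} = ?" for c in row if c != "id")
            vals = [v for c, v in row.items() if c != "id"]
            server.execute(f"UPDATE {tbl} SET {sets} WHERE id = ?",
                           vals + [row["id"]])
        elif op == "delete":
            server.execute(f"DELETE FROM {tbl} WHERE id = ?", (row["id"],))
    server.commit()
    client.execute("DELETE FROM sync_log")  # the log has been applied
    client.commit()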

Data flagging is another option. It is safer than the database delta approach, and while less elegant than change logging, it is simpler to implement. With data flagging, each table that can be updated is given an extra column: a flag that indicates the status of the record, whether it is unchanged, updated, inserted or deleted. In addition, any table with an identity field has a GUID field added to it.

When the application inserts, updates or deletes a row, the record’s status flag changes accordingly. When a record is inserted into a table with an identity field, however, a GUID is used instead. Until synchronization occurs, related records reference the GUID rather than the identity key. During synchronization, the identity field is assigned on the server and can then be matched to the related records via the GUID. Synchronization is simplified because each record that needs to participate is clearly marked with the operation that needs to be performed.
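
A sketch of data flagging for a hypothetical customers table with child orders rows. The row_status values used here ('' for unchanged, 'I' for inserted) are arbitrary; the point is the GUID-to-identity handoff during synchronization.

import sqlite3
import uuid

def flag_insert(client: sqlite3.Connection, name: str) -> str:
    # Insert locally with status 'I'; a GUID stands in for the server ID.
    guid = str(uuid.uuid4())
    client.execute(
        "INSERT INTO customers (guid, name, row_status) VALUES (?, ?, 'I')",
        (guid, name))
    client.commit()
    return guid

def sync_flagged(client: sqlite3.Connection,
                 server: sqlite3.Connection) -> None:
    # Push flagged inserts; the server assigns the real identity value.
    for guid, name in client.execute(
            "SELECT guid, name FROM customers "
            "WHERE row_status = 'I'").fetchall():
        cur = server.execute(
            "INSERT INTO customers (guid, name) VALUES (?, ?)", (guid, name))
        server_id = cur.lastrowid  # identity assigned by the server
        # Cascade the real ID to child rows that referenced the GUID.
        client.execute(
            "UPDATE orders SET customer_id = ? WHERE customer_guid = ?",
            (server_id, guid))
    client.execute(
        "UPDATE customers SET row_status = '' WHERE row_status = 'I'")
    server.commit()
    client.commit()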

Locking records and concurrency issues

Data locking is a different approach to data loading and synchronization. You create a flag alerting the server that a record or set of records is locked by a user; all other users have read-only access to the information. This approach is restrictive, but it enables you to identify the data sets that need to be loaded onto the client and eliminates concurrency issues.
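
A sketch of flag-based locking, assuming a locked_by column on the server-side table. The single UPDATE makes the check-out atomic: if another user already holds the lock, no row is modified.

import sqlite3

def check_out(server: sqlite3.Connection, table: str,
              record_id: int, user: str) -> bool:
    # Lock a record for one user; fail if someone else already holds it.
    cur = server.execute(
        f"UPDATE {table} SET locked_by = ? "
        f"WHERE id = ? AND locked_by IS NULL",
        (user, record_id))
    server.commit()
    return cur.rowcount == 1   # 0 rows updated means it was already locked

def check_in(server: sqlite3.Connection, table: str,
             record_id: int, user: str) -> None:
    # Release the lock after the user's changes have been synchronized.
    server.execute(
        f"UPDATE {table} SET locked_by = NULL "
        f"WHERE id = ? AND locked_by = ?",
        (record_id, user))
    server.commit()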

When synchronizing data back to the server, you must have a set of rules by which the data is transferred. Problems can occur when the record you are updating or deleting has already been altered. Do you overwrite the data? Discard new changes? Let the client decide?

The “last-in” method is the easiest to implement but also the best way to lose valuable changes. In this method, the last person to synchronize or save his or her data controls the values in the record. Your updates overwrite previous updates, and your deletes remove records, even those that others have modified.

“First-in” disallows changes to records that have been modified since the start of the disconnected work. The idea is to flag records that need to be reviewed and enable the user to view the new version of the record and re-apply his or her changes. You may also want to build in more intelligence and enable the user to do an on-the-spot, side-by-side comparison of the two records and select needed values from each record before updating.
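
A sketch of first-in conflict detection, assuming a last_modified column: the UPDATE succeeds only if the row is unchanged since the client read it; otherwise the record is flagged for the user to review.

import sqlite3

def try_update(server: sqlite3.Connection, record_id: int,
               new_name: str, read_stamp: str) -> bool:
    # Apply the change only if the server row is unchanged since the client
    # read it; the WHERE clause makes the check and the write atomic.
    cur = server.execute(
        "UPDATE customers SET name = ?, last_modified = datetime('now') "
        "WHERE id = ? AND last_modified = ?",
        (new_name, record_id, read_stamp))
    server.commit()
    if cur.rowcount == 1:
        return True   # no conflict; the change was applied
    # Someone changed the row first: flag it so the user can review the new
    # version and re-apply his or her changes.
    return False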

Conclusion

Making an application work in both connected and disconnected environments is no small task when you consider the complexities of data loading and return synchronization. You must take the time to design, implement and test the system carefully if you are going to be successful.