Best Practice and Judgement

Until the recent promise of generally-available broadband, the industry had to wrestle with the problem of company staff with portable computers who must view and update information in databases to which they are only occasionally connected. In the past, the technical press has described panaceas for two-way synchronisation; in reality, there has been complexity, manual resolution of data conflicts, and compromise.

With SQL Server 2008 and recent versions of ADO.NET, two-way synchronisation is a more realistic proposition. At the SQL Server end, we have Change Tracking, which records which rows changed, how, and when, so that we can merge only those rows with the occasionally-connected client. At the application end, we have the Microsoft Sync Framework (formerly Sync Services for ADO.NET), which offers a set of data providers for synchronising all SQL Server editions, including Compact. It supplies a complete set of components for synchronising data, based on synchronisation metadata. Changes on the client are batched up and sent to the central store; a “change applier” object in the provider detects any possible conflicts, and the developer can define specific conflict-resolution policies.
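
To give a flavour of the SQL Server side, here is a minimal Change Tracking sketch. The Sales database, Customers table and retention settings are hypothetical, purely for illustration:

    -- Enable Change Tracking at the database, then the table, level
    ALTER DATABASE Sales
    SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.Customers
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

    -- On each sync, the client supplies the version it last synchronised
    -- to, and pulls only the rows changed since then
    DECLARE @last_sync_version bigint = 0;  -- stored per client in practice

    SELECT c.CustomerID,
           ct.SYS_CHANGE_OPERATION,  -- I(nsert), U(pdate) or D(elete)
           ct.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES dbo.Customers, @last_sync_version) AS ct
    LEFT JOIN dbo.Customers AS c
        ON c.CustomerID = ct.CustomerID;  -- deleted rows have no base row

    -- The version the client should record for its next round trip
    SELECT CHANGE_TRACKING_CURRENT_VERSION();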

To make a success of tools like this, the DBA also has to adopt some of the mindset of the distributed-database designer. It is unusual, for example, to see Microsoft arguing against the use of IDENTITY columns as primary keys, but for distributed databases it is bad practice. The problem arises when a disconnected user adds a new record to their local database, and a new IDENTITY value is duly generated. On reconnecting to synchronise, and push the changes to the central database, there is a high probability of a conflict, because that IDENTITY value has likely already been assigned to a row in the central database. Using NEWID() to generate a random, 16-byte unique value for each row solves this problem, only to give you an arguably worse one: index fragmentation. NEWSEQUENTIALID() mitigates the fragmentation by generating ascending GUIDs, though it can only be used as a column default.
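
By way of illustration, here is a sketch of the trade-off, using a hypothetical Orders table:

    -- NEWID() as the default would yield random GUIDs: no cross-site
    -- collisions, but each insert lands at a random point in the
    -- clustered index, fragmenting it. NEWSEQUENTIALID(), used here,
    -- generates ascending GUIDs on a given machine, keeping inserts at
    -- the end of the index; it is permitted only in a DEFAULT constraint.
    CREATE TABLE dbo.Orders
    (
        OrderID uniqueidentifier NOT NULL
            CONSTRAINT DF_Orders_OrderID DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
        OrderDate datetime NOT NULL,
        CustomerID int NOT NULL
    );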

The “traditional” school of thought suggests that every table should have a natural key: one that uniquely identifies each row and means something to the user. It turns out that such a scheme is well suited to a system of partially-connected databases, since every site derives the same key for the same real-world entity. Once again, we must be careful to qualify ‘best practice’. For the DBA, rules can be a guide, but should not replace judgement and experience.
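
As a hedged illustration of the natural-key approach (the ISBN example below is hypothetical):

    -- Keying on a natural identifier means a disconnected client and the
    -- central database both arrive at the same key for the same book, so
    -- inserts made offline cannot collide on the key
    CREATE TABLE dbo.Books
    (
        ISBN char(13) NOT NULL
            CONSTRAINT PK_Books PRIMARY KEY,
        Title nvarchar(200) NOT NULL
    );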

Cheers,
Tony.