Archiving Hierarchical, Deleted Transactions Using XML

When you delete a business transaction from the database, there are times when you might want to keep a record of the data for posterity. This can become somewhat complicated if the transaction you need to delete is in a table that is the parent of a deeply nested hierarchy of dependent tables based on the foreign key relationships. In this article, Dwain Camps looks at a tidy means of doing just that.

Although I’m not generally a big proponent of storing XML in a SQL database, I feel that the archiving of deleted data that relates to a single business transaction is one of the rare cases where it is reasonable option.

Some Sample Data

Let’s start by using Data Definition Language (DDL) to create a couple of example tables, and use Data Manipulation Language (DMS) to populate those tables. These are borrowed from a previous article (I love reusing code). The data model is an Invoices table with child, detail records stored in an Invoice_Details table.

In this sample, we have a hierarchical structure of two tables with a parent-child relationship between them, implemented using the FOREIGN KEY constraint. This is a simplification; in the real world there could be many tables originating from that single parent. In the particular case where I used this, there were a total of nine tables hanging off that same parent and the hierarchy was four levels deep.

The DROPs are provided so you can clean up your sandbox later.

Deleting the Data with Archiving

The only reason you’d ever need to archive deleted data is on the off-chance that someone deletes a record that shouldn’t have been deleted, and then they want it to be restored. There are alternative, traditional ways to do this:

  • Create “shadow” archiving tables into which you simply move the deleted transactions. This can become rather cumbersome as the number of tables participating in the relationship grows large, cluttering up your data model.
  • Introduce flag column(s), for example the user that did the delete and a date/time stamp, in each table to represent a “soft-delete.” In my experience, this can be rather a nuisance because it requires programming all over the place to ensure that deleted transactions are excluded from query results. You could be a bit more clever and use indexed VIEWs to achieve excluding the deleted rows, but you may still run the risk of later developers not realizing why these VIEWs need to be used.
  • Conceivably you could dump the deleted transactions as some sort of text-based file (CSV or XML) or even into an Excel spreadsheet residing somewhere on your database server, but that sounds like an awful lot of complexity to me.

Let’s now look at how we can use XML to keep our deleted data around for future restoration or inspection.

Use One Table to Store the Entire Parent to Great-great-great-grandchild Relationships

Here is some more DDL to define a table we’ll use to archive deleted transactions.

As you can see there is one XML column for each of the tables participating in our transaction’s hierarchy. There are also a couple of leading columns to capture the:

  • PRIMARY KEY of the deleted transaction (InvoiceNo).
  • User ID and date/time that the transaction was deleted.

To Delete/Archive a Transaction

Within a transaction, we can first copy off the invoice and details to our archiving table, then do a cascade delete of the invoice from our tables. The code below is best put into a Stored Procedure rather than constructed within your application’s front end.

You can now run the following to archive/cascade delete the records.

We did not include the computed column (ExtendedPrice of our Invoice_Details table) into our XML tables because it will not be needed when we restore this archived transaction. You’ll need to sequence the DELETEs so that the tables that are the deepest in the hierarchy are deleted first.

Note that you should determine if there are any DML TRIGGERs that run on DELETE and examine what they do, then decide whether to ENABLE/DISABLE them prior to running the script above. For example, the Invoice_Details might have an audit trail being created when invoice details are inserted, updated or deleted.

At this time, you’ll want to rerun the initial code that sets up the Invoices and Invoices_Details tables (drop them first) to repopulate the tables, then rerun the first SP EXEC statement above.

Using a Trigger

This same thing can be done with an INSTEAD OF TRIGGER, although the only thing you cannot do is record the user that performed the DELETE. Here is that TRIGGER code:

Note how we’ve set the UserID column in the Deleted_Invoices table to blank because we don’t have access to that information in the TRIGGER (although you could substitute it with the SQL Server login account if that is applicable).

Assuming you’ve restored the original data to the two invoices tables, you can run the following code to delete and archive invoice IN20141000002 the same as before.

The XML that our Archiving Process Created

The XML that was created in the prior step is element-based. We could have just as well chosen attribute-based XML. Since element-based XML tends to be a little more verbose it will take up marginally more storage.

Restore a Transaction

When you need to restore a transaction, you can parse the XML back into the original tables. Once again we’ll put that code into a SP.

We can restore our archived transaction with the following:

Note that because the LineItem is an IDENTITY column, we must use the IDENTITY_INSERT setting to allow us to restore the value corresponding to the invoice detail rows that were originally deleted. You’ll find that you must run the INSERTs in exactly the reverse order for the cascade delete in the prior section. We also deleted the archived transaction, because you’d never want to attempt the restore a second time because you’d get PRIMARY KEY violations on the INSERTs as you’d be attempting to re-insert the same invoice number.

You will also need to be careful to DISABLE/ENABLE any TRIGGER on these tables that runs on INSERT.

The Drawback and How to Mitigate that Effect

Let’s think of our table hierarchy that has been archived as an “archiving object.” There may potentially be many of these objects you need to keep track of in your database. There are a number of operations that can affect your code that creates/restores a transaction from one of these objects:

  • Adding a column to any table that is a part of the hierarchy tree.
  • Dropping a column from any table that is a part of the hierarchy tree.
  • Adding a new child table anywhere within the hierarchy tree.
  • Dropping a table that is located somewhere (presumably one of the lowest level children) in the hierarchy tree.

If any of those actions occur (particularly in either of the two cases where something is being added), your code could be “silently” impacted, which is to say you may go on archiving transactions that are now incomplete without your code generating any error messages. That is not the case for all of the examples above (some will generate errors during the archiving process).

To guard against issues being caused by most of these, we can create a DDL trigger that generates a warning message on some specific event types. Assuming of course that you’ve properly defined your table hierarchy using FOREIGN KEY constraints to tie the lot together.

Let’s look at an interesting bit of SQL.

When we run this SQL, we get something like the following where the object IDs will be different in your database.

This shows us that Invoices (in the rightmost “name” column) is affected by anything modified within the @start_object_id (Invoice_Details). The TraverseTableHierarchy Recursive Common Table Expression (rCTE) actually walks through the table hierarchy (upwards) using FOREIGN KEYs until it reaches the topmost table in your data model. It returns only rows matching the @parent_object_id. You may wish to add an OPTION(MAXRECURSION 0) if for some reason your hierarchy could be more than 100 levels in depth. You may need to modify this code if you have tables with self-referencing FOREIGN KEYS.

We can then construct a DDL trigger that looks for Archived Object structures within our database and displays a warning for the various cases.

At the heart of the modifications made to the rCTE is the use of EVENTDATA() to query the SQL Server Event Data Schema and return the name of the object that was modified. The OBJECT_ID() function can then be used to return the ID of the object and use that as the starting point for the upward table hierarchy traversal. We’ve put the top-level object IDs into the @ArchivedObjects table variable to allow for cases where you may have several archiving objects you want to keep track of.

This DATABASE TRIGGER can be located in the Object Explorer window in SQL Server Management Studio (SSMS) by expanding the database > Programmability > Database Triggers.

You can test the DDL trigger with this DDL.

Comments indicate where a warning is generated. Note that dropping a TABLE from the hierarchy does not generate a warning, and will still not do so if you changed the definition of the DDL trigger to this:

Presumably this is because the FOREIGN KEY relationship has been deleted before the trigger is run. But in that case, your delete/archiving code will probably fail anyway.

The second to last case doesn’t generate a warning until a FOREIGN KEY relationship is established between Invoice_Details_Stuff and Invoice_Details (or any other TABLE within the hierarchy).

To DROP this TRIGGER from your sandbox:

Conclusion

In this article we explored an alternative method of archiving deleted business transactions. Which method you use probably depends on you and your preferences. Any method you choose will have some potential drawbacks.

For our method, we have at least provided a path to mitigating the drawback. As long as you remember to take action on the warning that will be generated by DDL changes to your archived table hierarchies, you’ll probably be fine.

We’d love to hear stories about any other methods readers have used to archive deleted transactions.