Removing sensitive data from a database
Not too long ago I was assisting a client with an issue they were experiencing while running some compliance software to check for sensitive data. Sensitive data was picked up in the SQL Server data file but when the DBAs checked, the data was nowhere to be found in the database.
It’s fairly widely known that just because you delete data from a database, it doesn’t mean it’s gone from the data file itself, but the question was, how should they go about removing it? They had already tried shrinking the file and executing sp_clean_db_free_space
but it had made no difference.
Deleting rows vs dropping columns
The procedure sp_clean_db_free_space
removes ghost records; rows that have been marked as deleted but not physically removed. This process runs periodically anyway but executing this stored procedure will manually set off the clean-up. You can check how many ghost records you have by running the query below. If it’s high, be careful as there was a bug in earlier versions of SQL Server that meant the automatic clean-up was not working.
1 |
SELECT * from sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,’DETAILED’) |
In this case, however, the ghost data was low (less than 100) and the amount of sensitive data was large compared to it. It transpired that the data was in a column that was dropped, and data removed by dropping a column behaves very differently than when a row is deleted. Because the data in a table is stored in rows, marking a whole row for removal is much easier than marking a column in every row. When you drop columns it can be quick, almost instant, because it is only marked for removal at the meta data level. As it’s done in this way, the data isn’t marked as a ghost record.
The options
If you have the Enterprise Edition of SQL Server, you can enable (and then disable if you wish) TDE, which is a good option to wipe the data out. Enabling TDE causes all the pages to be read and encrypted, which won’t account for data outside of this file but will do a pretty good job as far as the database is concerned.
If you don’t have Enterprise Edition, there are other options. The first would be to select all the data into a new table and then drop the old one. The new table could then be renamed back to the original one. This will again leave ghost records but these will be cleaned out.
A slightly less inconvenient option would be to rebuild the clustered index before running sp_clean_db_free_space
. This appears to remove the data in most situations and also applies to rows that have been deleted that appeared in a non-clustered index. During the rebuild, all the pages are read/written and there is plenty of data movement, so it makes sense this would work. In the case of this particular client, the column had been dropped from a heap table so a clustered index had to be added and then dropped, but with the same outcome. Unfortunately, if you don’t know the original table this would mean rebuilding all of them before running the clean (unless you’re a whiz at reading MDFs).
While this method did work in this particular case, it’s not guaranteed and there is always a remote possibility a page is not moved or rewritten and the data stays in place. It does however highlight the way the internals are handled when dropping data. This also does not cover the rest of the disk the database sits on; if you were to drop the database or remove a file, there may be residual data here too and the only way to do it would be to scrub the device securely.
Example
Below is the script if you’d like to try these out for yourself. To look for the data in the mdf you need to take the database offline then copy the file somewhere to open it. This example was run in SQL Server 2016 but it should work in earlier versions too.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 |
/***************************************************************** * Rollback *****************************************************************/ use master GO ALTER DATABASE [CleanDataTest] SET ONLINE GO ALTER DATABASE [CleanDataTest] SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE [CleanDataTest] GO /***************************************************************** * Initial Setup *****************************************************************/ use master GO CREATE DATABASE [CleanDataTest] GO USE [CleanDataTest] GO CREATE TABLE [CleanDataTestTableRow] (ID INT IDENTITY, SensitiveData VARCHAR(10)) GO CREATE TABLE [CleanDataTestTableColumn] (ID INT IDENTITY, SensitiveData VARCHAR(10)) GO INSERT INTO [CleanDataTestTableRow] SELECT 'ABC123' GO 10 INSERT INTO [CleanDataTestTableColumn] SELECT 'XYZ456' GO 10 DELETE FROM [CleanDataTestTableRow] GO ALTER TABLE [CleanDataTestTableColumn] DROP COLUMN SensitiveData GO --If you Open the mdf file in notepad (you may have to copy / paste it first) you should find both ABC123 and XYZ456 --ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE --GO /***************************************************************** * Remove rows *****************************************************************/ sp_clean_db_free_space 'CleanDataTest' GO --If you Open the mdf file in notepad (you may have to copy / paste it first) you should find just XYZ456 --ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE --GO /***************************************************************** * Remove columns using index rebuild or add/ drop on a heap *****************************************************************/ ALTER TABLE [CleanDataTestTableColumn] ADD CONSTRAINT PK_CleanDataTestTableColumn_ID PRIMARY KEY(ID) GO ALTER TABLE [CleanDataTestTableColumn] DROP CONSTRAINT PK_CleanDataTestTableColumn_ID GO sp_clean_db_free_space 'CleanDataTest' GO --If you Open the mdf file in notepad (you may have to copy / paste it first) you should not find the data --ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE --GO /***************************************************************** * Remove both using TDE *****************************************************************/ USE master GO --CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'abcxyz123456' IF NOT EXISTS(select * from sys.certificates where name = 'TestCert') CREATE CERTIFICATE TestCert WITH SUBJECT = 'Test Certificate' GO USE [CleanDataTest] GO CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_128 ENCRYPTION BY SERVER CERTIFICATE TestCert; GO ALTER DATABASE [CleanDataTest] SET ENCRYPTION ON GO --If you Open the mdf file in notepad (you may have to copy / paste it first) you should not find the data --ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE --GO |
Coeo is a Microsoft Gold Partner for Data Platform, Data Analytics and Cloud Platform. We work collaboratively to improve project outcomes and develop deeper knowledge and appreciation of Microsoft technologies, improving our customers’ experience. Find out more about us. |