This is the fifth and final part of this series on SQL Server Dynamic Data Masking. The first part in the series was a brief introduction to dynamic data masking, competing solutions, and use cases. The second part covered setting up masking and some examples. The third and fourth parts explored side channel attacks against dynamic data masking.
This final part covers mitigations to side channel attacks, additional architectural considerations and an analysis of the overall solution.
Side Channel Attack Mitigations
Now that the reality of side channel attacks is clear, mitigations for those attacks need to be explored. The foundation of any mitigation is that data can't be attacked if it is not accessible. As shown in previous examples, the attack vector for masked data starts with authorization. The functional nature of masked data, its support for joins, functions, and clauses, is exactly what makes it vulnerable to these side channel attacks. All of the strengths and uses of masked data rely on the data being accessible at some level. If a user is not authorized for the data, they don't have a foothold to attack it. A smaller footprint also helps mitigate damage from compromised accounts, as a bad actor will have access to less data and, hopefully, fewer systems. This is all a pillar of the principle of least privilege.
Rely on your regular user audits and authorization mechanisms to provide access to the data. Only trusted users should be allowed access to data, even if that data is masked. Most users won't be savvy enough to unmask the data, but anyone with a foothold in the database will be able to peek under the covers. Data security happens in many layers, and it starts with trust. If you don't trust your users, they shouldn't have direct access to the data. If they don't have direct access to the data, it is not possible for them to unmask it.
The primary defense against side channel attacks is limiting access to your data to users that are trusted. This is clear and straightforward, and part of standard security practices. Business users and even business owners don't always understand the implications of bad security practices; they just want the project to work. You have to be a technical ambassador for the project and for security.
It is very easy to apply the db_datareader built-in role to all users, but it may not be the best strategy. Security design should be at the forefront of database design, from schema creation and group access to advanced features like dynamic data masking and RLS. Carefully consider how objects will be accessed during the design phase of the database and project. If you use staging tables, or there are pre-processed report tables, remember to secure those objects too. Pre-processing and staging data is often best done in a separate database. Regular users shouldn't have access to that database, which simplifies the security model.
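As a sketch of this idea, read access can be granted at the schema level instead of through db_datareader, so staging and pre-processed tables are never exposed to regular users. The schema, role, and login names below are illustrative, not from the series:

```sql
-- Create a role that can read only the reporting schema,
-- rather than adding users to db_datareader for the whole database.
CREATE ROLE ReportReaders;
GRANT SELECT ON SCHEMA::Reporting TO ReportReaders;

-- Staging objects get no grant at all; an explicit DENY makes the
-- intent visible even if a user later picks up a broader role.
DENY SELECT ON SCHEMA::Staging TO ReportReaders;

ALTER ROLE ReportReaders ADD MEMBER [CONTOSO\ReportUser];
```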
Another important reminder is that data isn't masked from system administrators (the sysadmin role). You can audit those users and you can lock them down in other ways, such as Row-Level Security (RLS), but the nature of super-powered roles means there is always a workaround. The workaround can be as simple as assigning security, or granting access to additional objects in RLS by performing an insert statement. There is no substitute for trusting your administrators. If an administrator is a bad actor, it is much more difficult to limit the damage.
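For non-administrators, the ability to see real values is controlled by the database-level UNMASK permission, which is one concrete thing to audit. A minimal sketch, with an illustrative user name:

```sql
-- Members of sysadmin (and db_owner) always see real values.
-- For everyone else, the UNMASK permission controls it explicitly:
GRANT UNMASK TO TrustedAnalyst;    -- TrustedAnalyst now sees unmasked data
REVOKE UNMASK FROM TrustedAnalyst; -- back to masked output
```

Note that SQL Server 2022 also allows UNMASK to be granted at the schema, table, or column level for finer-grained control.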
No matter how data is locked down, administrators can always restore a copy to an unmanaged instance and make security changes in that instance with no logging. There is also the option of using DBCC PAGE to directly examine data in data files and log files, which could be done directly against a production database.
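As a sketch of the DBCC PAGE technique (the database name, file id, and page id below are placeholders; real page ids would come from a tool such as DBCC IND):

```sql
-- Route DBCC output to the client instead of the error log.
DBCC TRACEON (3604);

-- Dump a single data page: database, file id, page id, print option 3
-- (row data interpreted per column). Masked columns appear here as the
-- real stored values, because masking only applies at query time.
DBCC PAGE ('SalesDb', 1, 168, 3);
```

DBCC PAGE is undocumented, so its output format can change between versions; it is shown here only to illustrate why administrative access bypasses masking entirely.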
Locking Down Users
The mitigations discussed assume that the rest of your network is hardened, and users already have to pass through multiple layers to reach your databases. Establishing a good working relationship with the networking and firewall teams is critical to ensuring your database is safe. As previously mentioned, you must also work with the business to determine how the data should be protected. Security is a compromise.
The most secure data is locked in a room and is not accessible by anyone over the network, but that isn't very usable. Security depends on layers: procedural, physical, network, Active Directory, server, database, row level, schema level, and so on. A single layer of security, including dynamic data masking, isn't sufficient. Each layer works best when it is as transparent as possible and non-intrusive to end users. When security breaks the flow of work, it is less likely to be used. Dynamic data masking is non-intrusive and easy to implement, so it meets that criterion. Each layer of security in your solution should meet the same criterion.
Fundamental design decisions impact the data that is potentially exposed via side channel attacks. In addition to user selection and security design, the table architecture and the columns configured for masking are a foundational part of limiting exposure. As seen in the proof-of-concept scripts in the earlier parts of this series, it is possible for data to be unmasked.
If exposing sensitive financial data, or any numeric data, it is more secure to roll it up into categories rather than giving access to individual rows. This is a good practice for multiple reasons beyond security: performance, data storage, and repeatability are all improved by pre-aggregating the data. If drilldowns to the transaction-level data are necessary, consider leaving the masked column data out of those details.
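One way to implement this is a pre-aggregation job that runs under a service account holding UNMASK and writes category rollups to a summary table that report users can read. The table and column names here are illustrative:

```sql
-- Nightly rollup: individual transaction amounts never reach report
-- users, only category-level totals do.
INSERT INTO Reporting.SalesByCategory (CategoryId, TransactionCount, TotalAmount)
SELECT CategoryId,
       COUNT(*),
       SUM(Amount)
FROM   dbo.SalesTransactions
GROUP  BY CategoryId;
```

Because only aggregates leave the source table, the individual masked values never have to be exposed to end users at all.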
Masking isn’t a good solution for transactional databases and should be used with caution on non-aggregated data. Masking should be used on data that doesn’t expose proprietary business attributes or personally identifiable information. Systems that must comply with a regulatory body, such as SOX, HIPAA, or GDPR, should use masking carefully and only as part of an architectural design that ensures the security of the data. These systems are typically audited, but that should be verified, and auditing implemented if it isn’t currently in place. Also work with the network and security teams to be sure they are aware of the sensitive nature of the data so they can monitor for suspicious traffic.
Another layer of security to consider is encryption. Encryption is also available in layers, but doesn’t have to be implemented at each layer: TLS / SSL (on-the-wire encryption), encryption at rest (protecting detached database files), backup file encryption, column-level encryption (with many options, including database key level encryption), client-implemented encryption, and so on.
Given the nature of most types of encryption in SQL Server, it doesn’t protect against unauthorized unmasking of data. If data is encrypted at a per-client or per-user level, it would protect between clients, but not data within their scope. If data is directly accessible to end users in an unencrypted state, no matter how it is masked, it is potentially vulnerable. It is a matter of priority for a bad actor.
Data that is only unmasked in the client application can still be captured via screenshots. If that is somehow disabled by the administration team, it could still be captured by taking a picture of the screen. There is always a work-around. The principle of least privilege should always be used. And it is always harder to remove access than to grant access. This is true for service accounts as well as user accounts. Removing access after a project / database has been deployed requires regression testing and a thorough understanding of the application. In larger or complex projects, this isn’t a simple matter.
Monitoring and Audits
Data access should be monitored and regularly audited. Most organizations of any size have hundreds if not thousands of databases. It is not possible to constantly monitor these databases manually; custom scripts and audits must be used to make this feasible. Extended events focusing on excessive access to masked columns can be used to find potential unmasking activity. Auditing which users have access to unmasked data should also be a common practice. It can be difficult to monitor security configurations on large teams, especially when approvers are not extremely technical.
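A minimal sketch of such an extended events session, watching for statements that touch a hypothetical table with masked columns (the session, file, and table names are illustrative):

```sql
CREATE EVENT SESSION MaskedColumnAccess ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.username, sqlserver.client_hostname, sqlserver.sql_text)
    WHERE sqlserver.like_i_sql_unicode_string(sqlserver.sql_text,
                                              N'%SalesTransactions%'))
ADD TARGET package0.event_file (SET filename = N'MaskedColumnAccess.xel');

ALTER EVENT SESSION MaskedColumnAccess ON SERVER STATE = START;
```

A scheduled job can then scan the .xel file for users issuing an unusually high volume of statements against the masked table, which is the signature of the brute-force unmasking techniques shown earlier in the series.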
Verifying that security hasn’t changed and is implemented as expected is an important audit activity. DBA and monitoring tasks are often handled by a different team than the team creating the masking solution. As described above, dynamic data masking requires deep understanding of the project and business rules, which is not possible for DBA teams managing hundreds of servers. It is essential for the teams to work together, especially on systems with sensitive data.
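Two catalog queries make this verification straightforward: one lists every masked column and its masking function, and the other lists who has been granted UNMASK:

```sql
-- Which columns are masked, and with which masking function?
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name                   AS table_name,
       mc.name                  AS column_name,
       mc.masking_function
FROM sys.masked_columns AS mc
JOIN sys.tables         AS t ON t.object_id = mc.object_id;

-- Who can see unmasked data?
SELECT pr.name AS principal_name,
       pe.state_desc,
       pe.permission_name
FROM sys.database_permissions AS pe
JOIN sys.database_principals  AS pr
  ON pr.principal_id = pe.grantee_principal_id
WHERE pe.permission_name = 'UNMASK';
```

Comparing the output of these queries against the documented design is a quick way to confirm that the security configuration hasn't drifted.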
Built-in Protections and Mitigations
In this section, I will outline a couple of the built-in protections against invalid data use.
Copying the data out to another table has no impact
If masked data is copied to a new table or extracted, the masking is maintained. This alleviates some concerns about masking and supports its basic functionality. Columns that were unmasked are copied as their real values, while masked columns contain the same mask values seen in SSMS. The copied data loses the functionality of native masking but preserves the security. If data is extracted via a report, BCP, SSMS, or any other query tool, the masking status is maintained.
If data is put into a new table with a SELECT INTO, masking is maintained. This leaves the primary concern as setting up security correctly and monitoring for any possible breaches. It also raises a question about the basic premise of masking data: if users shouldn’t be allowed to see data in an unmasked state, why give them access in any form? Because it is easy, it does not impact existing functionality, and it can be used in any future functionality.
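A quick way to see this behavior, run as a user without the UNMASK permission (the table and column names are illustrative):

```sql
-- Copy masked data into a temp table.
SELECT CustomerName, Email
INTO   #CustomerCopy
FROM   dbo.Customers;

-- The temp table physically contains the mask values (for the email()
-- masking function, only the first letter survives), not the real data.
SELECT TOP (5) * FROM #CustomerCopy;
```

Granting the copying user UNMASK afterward does not help: the real values were never written to #CustomerCopy in the first place.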
Copying to a memory-optimized table has no impact. Extractions do not unmask data.
Memory-optimized tables and other advanced features have no impact on masking. If data is moved from the masked table to another table or an extract file, the mask becomes the stored data. The same is true if the data is moved to a temp table, table variable, or a memory-optimized table. If a masked column is exported, the export will contain the mask. For example, if the default mask was used for a date, all exported rows will contain the value 1900-01-01 00:00:00.0000000. This means that mitigations aren’t needed for extracts, inserts, updates, or any other standard data movements and manipulations.
System-maintained temporal tables (history tables) are another way to potentially access unmasked data. The history table associated with a base table has a copy of each row as it changes in the base table. When a column in the base table is masked, the associated column in the history table is also masked, so the history table does not expose unmasked data directly, but it is subject to the same types of side channel attacks as the base table.
Summary and Analysis
It might work best to finalize the discussion of dynamic data masking by describing both what it is and what it is not. Starting with the positives, data masking implements a form of obfuscation. The data is not actually changed; rather, it is obfuscated as it is presented by the data engine. This allows the engine to perform all of the normal operations on the data with no modifications to queries, including functions and matching in ON clauses. Because the data is not actually modified, masking is transparent to existing queries. Data masking is a customizable masking solution built into the engine, which allows the data and the type of obfuscation to be determined by the development team. Data masking is also part of a layered approach to security. It is not a replacement for other security mechanisms, but it is a complementary layer to your current security.
There are potential gaps in the security, and it can’t be the only method of making data safe, but it helps. It is also safe at the presentation layer; if used in reports, it can be extremely secure. The flip side is the database engine level: it is a potential risk if users are allowed direct access to data in the engine. Be sure you know the weaknesses and monitor your database carefully if users are allowed to access databases directly.
The positive list for dynamic data masking makes some of the negatives obvious. It is not encryption; don’t expect the data to be 100% secure if users have direct access. The obfuscation has weaknesses that could be exploited by a savvy user. It is one possible layer of security, not a replacement for other security mechanisms; all other security should be implemented as in any other project. And even though it is a layer in a security solution, it is not a traditional security mechanism. Users will still be able to query masked data and join on it, and there are multiple ways to take advantage of these functional parts of the solution. A final reminder: dynamic masking is not column-level security. Users are still able to query the column; they just get obfuscated data.
Dynamic data masking can be a useful part of a business intelligence solution. Understanding the limitations, strengths, and mitigations to dynamic data masking is an important part of implementing a solution. It’s also important to remember that this is a single part of a security strategy. Security works in layers and this is no exception.
This analysis isn’t meant to castigate Microsoft and their implementation of data masking. To fit existing queries and business requirements, it is a good blend of security and compromise. It is also a good reminder that direct query access and data extracts should only be available to trusted users and partners. There is often a work-around for security protocols if direct access is granted to data. Data security is very similar to physical security and hardware: if you can touch the machine, there is usually a way into it. If you have direct access to the data and enough time, there is usually a way to the base data. Be careful, use multiple layers of defense, and monitor your database activity. Dynamic data masking isn’t a magic bullet for security, but it is a potential layer to help secure data. Each layer should be monitored, and each layer should be audited regularly.