SQL Server Row Level Security Deep Dive. Part 6 – RLS Attack Mitigations

Simple Talk
Comments 0

Share to social media

This is part of a series on Row Level Security by Ben Johnston. For the rest of the series, click here

As seen in the previous section, there are several ways bad actors can attempt to bypass RLS. Attacks range from removing RLS, getting data from other systems or straight brute-force methods using side-channel attacks. Mechanisms exist for each potential attack that allow you to avoid the attack or monitor for the attack when avoidance isn’t possible. This section covers those mitigations.

RLS can be bypassed or attacked using several broad categories. These include direct attacks, where RLS is modified in a malicious fashion or disabled, indirect attacks where information can be gathered without modifying the underlying RLS, and side-channel attacks that use specially crafted queries to derive data from RLS protected tables. Refer to the previous section of this series, RLS Attacks, for a full explanation of each attack type.

Excessive errors

Side channel attacks, with the current vulnerability, rely on forcing runtime errors. The exact error is 8134, divide by zero. This is an error that can be captured, allowing administrators to monitor for possible attacks. There are several ways to capture these errors and the method you employ will depend on your environment, standards, and preferences. When taking advantage of the divide by zero attack, thousands or even hundreds of thousands of errors are generated, leaving a large attack footprint. This makes it easier to see the attack – it really sticks out in the logs if you are checking for it.

SQL Trace

The error raised by divide-by-zero attacks is 8134. If a trace is created, the preferred method is using sp_trace_create. Using SQL Profiler is a good way to prototype a trace, but there are some disadvantages. Profiler sends all data over the network, increasing the performance cost. It also runs on a client machine, which must be on and connected all the time (or Profiler must be running on the server, which takes additional server resources) and can’t be configured to run automatically. Profiler is just a front-end for trace, so it’s a good idea to learn how to create a simple trace.

The following shows how this can be configured in profiler.

This example uses TSQL to create a trace capturing exceptions. As noted above, profiler also creates a trace, but it is more resource heavy.

Output using a query to pull the results is shown below.

Sample output from the trace when a brute force attack is running

SQL Trace Deprecation

Microsoft documentation warns that trace functionality will be removed in a future version of SQL Server. Take this into consideration when selecting your method for monitoring potential RLS attacks.

Extended events

In Azure databases, as well as all other SQL instances, extended events can be used to capture error 8134. I find extended events harder to work with and slower to query, but once configured they work well. They also only run on the server which helps performance. When they are setup in the GUI, they can be configured to show live results, but this wouldn’t be the method for active attack monitoring. The following shows how to setup an extended event to capture errors .

For the above extended event, this will pull the errors. I didn’t limit the query to error 8134 alone, but it would be reasonable to do so.

Results from the above query. This was generated by testing the brute-force side-channel attack script with 100 rows. As seen below, over 156 thousand errors are produced for this relatively small dataset. If you see numbers like this, you should dig deeper into what is happening in your system and rule out an RLS attack.

Server or Database Audits

SQL Server instances can use server and database audits to log queries against the server and databases on that server. Auditing for Azure SQL can be used to do the same thing against Azure databases and Azure Synapse. This can be useful for forensic analysis after an attack or even to determine that an attack has occurred. It can capture all queries against a database and it can also capture DBCC commands against a database if that vulnerability is a concern.

A benefit of using audits is that they aren’t limited to monitoring for RLS attacks. They can be used for a wide range of monitoring functions. Reports can be created for audit records making it a simple process to search for issues. If you are already using audits this is a good method to check for RLS related attacks.

SQL Advanced Threat Protection

SQL Advanced Threat Protection is an automated, relatively hands-off method to watch for attacks against SQL Server in Azure. The documentation lists SQL injections specifically, and general suspicious activity and vulnerabilities as target threats. It may catch RLS attacks in the future or certain RLS attacks, but in my testing, I didn’t get any warnings via this tool. It looks like a very useful method for capturing potential security issues, but you will want to supplement this with your own audits for now if RLS is part of your environment.

Performance Changes

Brute force attacks are not considerate of performance. This can potentially be used to detect an attack in action. The primary method to check is CPU usage. In lower-level Azure tiers, DTU usage can also be monitored.

Excessive CPU Usage

Brute force attacks cause CPU usage to spike. Depending on your configuration, it may be noticeable or it may blend into the background. In my test Azure SQL database environment, with the minimum DTUs, it was very noticeable. The DTU overview showed 100% utilization during an unregulated side-channel attack.

The CPU metric on the monitoring page also showed 100% utilization for this low tier SQL database.

My local workstation with SQL Server 2022 Developer Edition showed an increase in CPU usage, but not something I would probably notice or be concerned about. If RLS attacks are a concern on your system, you will want to see how your configuration performs when under attack.

This is still not a foolproof method for detecting these attacks. If an attacker is patient, they would just put a WAITFOR command in their attack to limit the impact of their script. It will still throw the 8134 (divide by zero) error, but the errors would be spread out over time. This is very similar to network port scans. An attacker will wait between scanning each port or even change IP addresses during the scan. Firewalls have evolved to detect these attacks. Your methods will need to evolve too and you will need to be aware of this method when protecting your systems.

This limits the CPU impact of the attack almost entirely, but in the low performance tier, it still spikes the CPU to 100% during the initial processing of the primary key. After that processing, the impact is negligible, but the process is limited to 1 character discovered every second (or whatever interval is selected). Given that there were 156,521 errors generated, it would take almost 2 full days to reveal the protected data in the first 100 rows. Making the delay 50 milliseconds provided the data in a more reasonable time and the CPU was still below 20%. The following graph shows the impact of a 1 second delay.

Attack monitoring

The above tactics can be used to capture attacks. To alert administrators, a few methods can be used. Each method should look for excessive numbers of error 8134. Depending on your system and needs, you may also look for excessive CPU usage, specific DBCC commands, disabling RLS, or even data changes to RLS configuration tables. You will need to incorporate the scripts appropriate for your environment and perceived threat surface and create alerts for each of these.

SQL Agent Job

If a SQL Trace is used, SQL Agent is a potential method to monitor for attacks. If the error count exceeds a defined threshold during a time period, administrators can be notified. It can also be used against extended events, provided the server is on premises or a managed instance.

Agent jobs can also run queries looking for changes to your RLS tables or anything else that can be queried. RLS tables are a good use case for temporal tables. If the RLS configuration is a temporal table, a simple query will show any changes during a given time period. As mentioned previously, RLS is not prescriptive, so you will need to create scripts based on your implementation.

Elastic Job

In a pure Azure environment, Elastic Jobs can be used in a similar manner to SQL Agent jobs. This could be used to query extended events or RLS changes. Since SQL Trace is not available in Azure database, you wouldn’t likely use it for that.

Elastic Job Warning

At the time of this writing, Elastic Jobs are still in development. They have been available for quite a while and appear to be part of the ongoing Azure offerings, but there is no guarantee they will be implemented officially.

External Scheduler / Custom Code

Any external scheduler or custom code that can run SQL queries and send email or other notifications to administrators can be used to monitor for attacks. This will be specific to your environment and the exact method doesn’t matter. My general recommendation is to follow your existing patterns, taking into account future needs. Don’t build a custom solution if you have an existing scheduler or if you have SQL Agent available.

Code repo with history

All modern code repositories maintain history and can be used to show when changes happened and who made the changes. This won’t stop attacks by itself, but it will help with a post-attack analysis. It is also a deterrent to direct code attacks via an automated pipeline. If security is misconfigured, a developer / attacker could remove history in the repo. This is another case in the enterprise for minimum viable security.

Code reviews

Code reviews are a standard practice and a good method to find errors in access predicates, both intentional and unintentional. Microsoft documentation mentions taking care with users configured as policy managers. They can change and bypass RLS configurations. The same is true for developers, especially for automated or non-monitored deployments. Carefully review access predicates and security policy changes. Malicious changes are a risk, but a more likely scenario is code misconfigurations. Either case results in the potential for unauthorized access.

Audit for changes to security objects

Existing access predicates and security policies should be regularly audited and validated. This is to ensure they are still enabled and that the code hasn’t changed. The more sensitive the system and data, the more frequently it should be confirmed. It is useful, and I would say required, to know the expected configuration. Audits can be managed in several ways and some options are detailed below.

Tables with RLS applied can be verified using SQL metadata queries. Tables with RLS and tables without RLS applied, predicate definitions, and columns used for RLS can all be checked very easily. The same types of queries can also examine security policies and access predicates.

Depending on your code repository, you can also compare the committed code to your deployed database objects. There should be a very limited set of accounts that can change objects in a production environment, but it can be worth auditing. This would also check for changes in addition to RLS objects making it even more useful.

SQL metadata can also be used to look for changes to RLS related objects. This is easier if you use standard naming conventions and a separate schema. The modify_date will show when RLS objects have changed. It will also show if an object is dropped and recreated with the same name. There are also trace and extended events that can check for changes to objects. The following example query shows all objects in the RLS schema changed within the last day.

Azure Alerts

Azure alerts can be used to check for performance issues and alert administrators. It usually won’t be an indication of an ongoing attack, but that is one possible reason for major changes in CPU or disk performance. During my testing I saw changes to CPU. This would be an area of interest for any administrator, even if an attack isn’t suspected.

RLS Support and Administration Scripts

 RLS is easy to validate using metadata scripts. The following samples show a few ways to check your RLS configuration.

Objects with RLS applied

The following shows all security policies and access predicate details. This is useful for reviewing security at a database level.

Tables without RLS

The following can be used to show all tables in a database without a security predicate assigned. This is useful for finding new tables that were not properly assigned RLS and also for finding tables where RLS has been removed.

Specific column without RLS

Sometimes you only want to look for a specific column without RLS applied. In larger systems, this can save time and also allows team members without a full understanding of RLS to help run pointed audits. It is useful to perform these checks in production environments with many team members or rapidly shifting teams. It is easy for tables to be inserted into production without RLS in these scenarios. This script is a modification of the previous with additional parameters added to limit the columns searched.

Summary and analysis

RLS is a great way to add a layer of protection to a SQL Server database. It is not the most secure option but it is relatively easy to implement, provides a method to greatly simplify reporting and query needs, and may be secure enough, depending on requirements. The ease of implementation makes is very enticing for internal reporting solutions. It can greatly decrease the number and complexity of reports required.

Layer of security

RLS is an additional layer of security that can be added to a solution. It is not a replacement for other authorization methods, but an enhancement that can be very useful in reporting and warehousing scenarios. Configured correctly, it can also work in transactional applications when used with SESSION_CONTEXT as a means of limiting data seen by particular users.

Hire trusted admins

System administrators and developers must be trusted to some degree. It is still possible, and sometimes required, to audit their activities, but the relationship with administrators must begin with trust. If users with elevated privileges are not trusted, they should not be in the system. Period. But that trust must be earned over time and reinforced with good decisions and successful projects. Administrators and developers have access to logon scripts, backups, monitoring, object code, change requests, and many other methods to directly impact databases and secure data. It is very difficult to prevent bad actors in these categories from gaining extra or unauthorized access. That is where audits can help fill the gap for security requirements.

Consider alternatives if the use case fits

There are several alternatives to RLS. These alternatives were the only way to segregate data by user before RLS was added to SQL Server. Some project use cases fit better with these custom solutions than with RLS. Alternatives include separate views or stored procedures for different users, separate reports, or other custom code.

Locking the data down excessively can potentially encourage data exports, decreasing overall security

One of the expectations of a database system is keeping the data secure. As mentioned, RLS is one of the methods that can enhance security on a system. Keep in mind that if users find a system too restrictive or cumbersome to perform their daily tasks, they will often find a way around those obstacles. Security in SQL Server is not different. Extracting data via service accounts, grabbing data in less secure systems, or using unexpected methods to grab data are possible side effects of unwieldy security structures. The only real solution to this is to work with the business every step of the design and ensure they agree with the security requirements. They are partners in the design and should be the decision makers regarding how secure to make the database. If the users (or their leaders) are requesting the security, they will be much less likely to circumvent those measures.

Current attacks that work may change with updates. It is important to keep up-to-date on patches and monitor systems.

The attack paths shown in this series work with the current versions and patches of SQL Server and SQL Database. There are likely additional attacks that I haven’t considered. Administrators and developers should watch for unusual activity on all databases, not only those secured with RLS. Especially sensitive data should be scrutinized closely and auditing features of SQL should be implemented. All software systems, databases included, are dynamic to some degree and changing. Regular security audits and system reviews can help ensure everything is still functioning as initially designed.

It is also possible that the current vulnerabilities will be addressed in a future version or patch. This would render the demonstrated attack method obsolete, but different attack vectors are possible.

Direct access to data always has the potential to inadvertently expose sensitive data

The security examples show that direct access to data presents more challenges to securing data. In addition to side-channel attacks, misconfigurations can potentially allow access to sensitive data. If access predicates are not applied to new tables in a system, if they are applied incorrectly, or if users are added incorrectly with elevated privileges, data security can be subverted. Be sure everyone on the team knows the proper methods to grant access to the database and the proper way to add new tables to the system.

Training and documentation

RLS can be confusing at first, even for administrators and developers very familiar with SQL Server. If you aren’t expecting RLS in the design, it can cause extra troubleshooting steps when diagnosing potential issues. Having zero records return when the table shows as full via administrative queries is a good indicator, but having RLS on a table shouldn’t be a surprise for the team supporting the database.

Developers, administrators and support teams should be given guidelines on the usage of RLS and the design patterns used in the local implementation. Even end users should have a basic understanding of the requirements if they are allowed to hit the database directly. After the architecture has been designed, the team should be trained and relevant diagrams and documents created.

Be sure to include administrators and service accounts in RLS, tables or session context, if they will be running queries. If service accounts are rotated, be sure to include all relevant accounts. If all relevant service accounts aren’t included, the best-case scenario is errors or missing data during connections. The worst-case scenario is extra deletes (during an anti-join) or duplicates when key columns don’t match. Consider allowing admins access via IS_MEMBER() so any organizational changes don’t impact your design.

It is useful to create a decision chart for support teams to understand what to check when diagnosing connectivity issues. The exact flow will differ by environment, but should start with the account they are using, making sure it is not locked out, that it is in the proper AD groups, the group or user has server access, it has database access, it is in the proper database roles, that RLS is applied correctly, and anything else for your environment. Decision charts like this are also useful for onboarding new developers, presenting to other teams, and during architectural reviews.

During the initial stages of development, templates should be created and shared with developers. This should include tables with the RLS column embedded and tables that require a join to apply RLS. The second category is especially important due to the performance impact it can have. It is also useful to have a set of queries used to manage RLS and check for tables without RLS applied. Development should also include performance tuning to ensure RLS performs as expected and changing the design doesn’t require too much refactoring.

Summary

RLS can be a useful addition to a database solution if it is implemented correctly and the limitations are understood. I didn’t realize the impact of side channel attacks until I dove in and tried to see how much data could be exposed. If users are allowed direct access to the data, errors should be monitored and especially-sensitive data and multi-tenant systems should limit the users allowed to directly query the data. RLS can greatly reduce the complexity of reporting and the number of queries required, just understand the security impact and include all the regular security layers you would in any other database solution.

Regular audits of the system, active monitoring and code reviews should be part of the process. Performance tuning and table design also needs to be considered carefully when adding RLS. Document how the system works and what is needed for RLS to work for an individual user.

If there is no room for error, consider encrypting data. For very sensitive data, double check that end users should be allowed to dynamically query the data. A stored procedure layer over tables with RLS will prevent the side-channel attacks shown in this series but allow most of the benefits. This limits how the data can be used, but it gives an extra layer of protection.

Be sure to discuss the benefits and potential pitfalls of implementing RLS with the business if you are considering it for your solution. As with most solutions, RLS involves compromises. It is usually up to the business to decide what risks they can tolerate. The advantages of RLS are likely to outweigh the downside in many cases, but the discussion is needed given that there are some vulnerabilities.

As seen in the previous section, there are several ways bad actors can attempt to bypass RLS. Attacks range from removing RLS, getting data from other systems or straight brute methods using side channel attacks. Each method has mechanisms to avoid the attack or monitor for the attack when avoidance isn’t possible.

RLS section Links:

Part 1 – Introduction and Use Cases

Part 2 – Setup and Examples

Part 3 – Performance and Troubleshooting

Part 4 – Integration, Antipatterns, and Alternatives to RLS

Part 5 – RLS Attacks

Part 6 – Mitigations to RLS Attacks