Principles of Data Protection

Protecting data in SQL Server is not as simple as setting a few properties. While there are great security features in SQL Server, such as Transparent Data Encryption, production data may end up in places throughout the organization. In this article, Brian Kelley talks about the best ways to secure data using the concept of least privilege.

Data Protection isn’t primarily a technical issue. It’s actually more of a business problem. As a result, this article reviews the scope of the problem and how to go about trying to address it. The reality is that it’s not going to be able to fully cover everything, at least, not from the role of a database professional. However, there are things that can be done to better protect an organization’s data. A lot of it is around collecting information and planning. So where do we start?

If we are serious about protecting our organization’s data, let’s begin with a fundamental principle. It drives what we do. That is the principle of least privilege.

Principle of Least Privilege

The principle of least privilege is the concept of giving only the permissions needed for the job or role, no more and no less. The “no less” isn’t typically an issue. If someone doesn’t have enough permissions, that usually gets addressed quickly. The typical issue is where folks have more permissions than they should. Here are some examples:

  • Everyone, even the mail room staff, has access to a file share supposed for HR use only.

  • An analyst has write access to a database when he or she only needs to be able to read.

  • The system administrators are members of the sysadmin role on every production SQL Server.

We tend to err on the side of giving too many permissions in order to ensure folks can do their jobs. After all, the folks we hire are trustworthy, right? At least, we hope they are. Let’s assume that they are. Let’s assume that no employee would do anything to intentionally harm the company. That’s the best-case scenario.

The problem, though, is that adversaries have determined it’s easier to get inside an organization by compromising valid accounts rather than trying to punch holes in the layers of network devices, firewalls, controls, etc. Phishing attacks, email campaigns with embedded malware, or getting physical access and looking for passwords on sticky notes underneath the keyboard lead to account compromises. While the employee wouldn’t harm the company on purpose, the adversary with the ability to use the employee’s account will. Let’s say it’s someone in the mailroom whose account is compromised. While that employee would never access the HR share, the adversary will.

If everyone didn’t have access and the share was properly restricted to HR, then the mailroom employee’s account couldn’t be used to steal the data. Keep in mind that there’s already risk. After all, it could be someone from HR who is compromised. However, by applying the principle of least privilege, we reduce the overall risk to the organization. Now the potential for compromise is limited.

Understanding the Data We Have

One of the reasons we often fail to adhere to the principle of least privilege is because we don’t fully understand which permissions are needed to do something. For instance, we decide to place a Windows user in a security group. What access does that security group grant – what file shares, what databases? Since we’re not sure, we give out a lot of permissions, maybe we add the user to multiple security groups. We hope that some of those permissions will grant the required access because people need to work!

From a data protection standpoint, this is a bad practice. We know that. Yet we still violate it. How do we improve?

What Accesses Our Data?

We have to be sure we know what accesses our data. There isn’t a technical solution that can automatically give us the answer. We can’t run a PowerShell script and know immediately everything that hits our key financial database. Over time we can collect that information, but the key word is “time.” If I look today, and today is not quarter end, then I don’t see the quarter end processes. If we’re looking at our HR related databases, then we really don’t know everything unless we also take into account the annual enrollment period.

The only way to be able to follow the principle of least privilege correctly is to know who and what access our data. This also includes ad hoc access, like folks running reports through SQL Server Reporting Services (SSRS) or doing analysis through Microsoft Excel. Therefore, in order to improve our data protection, we have to understand what accesses that data.

Obviously, documentation is required. When we have documentation there’s always the problem with keeping that documentation updated. While there are tools available, this task ultimately falls to people. Realistically, this is a battle we will always have to fight. Taking time to update documentation means we take time from other efforts. However, if we want to be serious about data protection, we have to know what accesses that data in order to be able to protect it.

Where Does the Data Go?

Data doesn’t stay at rest. We move it around all the time. For instance, every time someone runs a poll about whether or not folks restore production data to non-production, the numbers reveal that this is still a common practice. Once the data goes to non-production, what kind of security is present? What kind of monitoring? Does our documentation about what accesses the data take into account that production data ends up in non-production? After all, the environment may be non-production, but the data is still real. It’s still production data. And it still puts the organization at risk.

We aren’t just concerned about production data making its way into non-production. We’re also concerned about how data moves around in the “production” environment. For instance, if I’m using SQL Server Integration Services (SSIS) or a competitor product to move data around, where does it go? A classic example is where the data is pulled out of a transactional system via SSIS to a reporting database, which is then queried through an SSRS report, which an end user exports into Excel and saves to a laptop.

C:\Community\2018\Articles\Simple-Talk\Principles of Data Protection\data_flow_to_laptop.png

Proper data protection means we have to have security at each of those points. Again, this is easier said than done. For instance, some folks stop at the laptop with, “We have Bitlocker to encrypt the drive, we have antivirus, and we make sure it patches every time it hits our network.” However, this doesn’t address what a user could do with the data from the laptop. Can the user email that Excel file off using a web mail provider? What about printing to a printer connected via BlueTooth?

The reality is that unless you’re in a tightly controlled environment, there’s a limit to what you can do. However, we should not use the fact that we can’t stop everything as an excuse to do nothing. Given that, let’s look at what we can do with respect to SQL Server.

Protecting the File System

Oftentimes, when we think about data security, we focus on SQL Server itself. However, SQL Server sits upon an OS. That OS has a file system. Even storage appliances run a file system. Data is vulnerable outside of SQL Server. Therefore, we need to be concerned about the files, too. When I think about data around SQL Server, I’m concerned with four types of files.

  • Database files (at rest)

  • Backup files

  • Source files to be imported into SQL Server

  • Extract files exported out of SQL Server

Data Files

In most organizations, SQL Server comes to a halt at least once a month: during security patching. When SQL Server is stopped, the database files are at rest. It is at this point they can be copied off. Many organizations have automated processes for patching because it’s impractical to patch each system manually. That usually means there’s a window of time for systems to patch. And during this time, we expect SQL Servers to potentially go up and down as restarts are required. A potential attack could come from a rogue administrator or an adversary who has compromised administrator credentials. All it takes is stopping SQL Server and the files can be copied off.

This is the point of Transparent Data Encryption (TDE). The files at rest are encrypted. If someone did copy the files off, that doesn’t do them any good without the correct certificate. Unfortunately, there are some things to consider with using TDE. First, only Enterprise Edition has this feature. Second, recovery becomes more complex. We have to ensure we have the right certificate as well as the password to decrypt that certificate on the new SQL Server. Finally, remember that this is only going to protect the database files at rest. Here we are talking about the file system and it’s appropriately a solution. But it will not protect the data if a user legitimately has access to it through SQL Server. That has to be controlled within SQL Server, but more about that later.

Beyond that, SQL Server takes steps to restrict the folder(s) where the files are kept. Realize that anyone who has administrative rights to the server can get to the files, the key here is not to make things any worse. For instance, don’t create a file share to the folder containing the data files.

Backup Files

We can do a fantastic job of protecting our data files and locking down SQL Server. It’s all for naught, though, if we leave the backups vulnerable. For instance, if folks who don’t have access within SQL Server have access to the backups, that’s likely an issue. If the backups are encrypted, then we’ve likely mitigated the risk, unless of course we’ve stored the means of decrypting the databases/backups with the backups. That seems like an obvious thing, but when we’re single-mindedly focused on how to recover, we may just store everything in one place. After all, we’re regularly reminded of recovery time objective (RTO) and if you’re like me, you’ve been asked if you can reduce the RTO any further.

Even if our backups are encrypted, we still want to apply the principle of least privilege to where we’re writing/storing our backups. The SQL Server service or SQL Server Agent service needs access. Who else? The DBAs will likely need access. The system administrators probably have access by default and may need to have access to recover files (the backups) in the likely event that someone asks for a restore from days or weeks in the past. We’re looking to minimize access. If we can restrict even these folks, all the better.

Source and Extract Files

Since we’re talking about data protection, we should also consider any files which we store data in, even if it’s not in a traditional database like SQL Server. For instance, when transferring data between systems, we may use SSIS to read from a source file to get data into SQL Server. Or we could do this in reverse: we create extracts of data from SQL Server. This doesn’t have to be SSIS. For instance, the example we looked at above the extract from the reporting database was delivered via SSIS. It’s entirely possible for someone to use a script, such as with PowerShell. The fact of the matter is that in whatever form our data exists in, we need to protect it.

Again, we’ll look to apply the principle of least privilege. For instance, if there is a file share where we drop mainframe files to be imported into SQL Server via SSIS, we should ask what and who needs access. Certainly, Authenticated Users is not the correct answer. Likely it should be restricted to the processes which write the file, read the file, and those folks who are responsible for troubleshooting any issues.

There are limits to what we can do from this perspective, especially on the extract side. For instance, I don’t expect a database administrator to try and fix an issue with how the laptop is managed. Typically, we can touch one step away. For instance, we can touch permissions in SSRS. Or we can specify permissions on the file share for the source files. The key here is to do what we can.

Protecting the Database

Having addressed the file system, let’s turn to our natural first area of focus: the database. Let’s look at what typically is the best security model to implement. We’ll start at the top, in Active Directory, and then make our way down into the database.

Why Active Directory?

In most organizations, Active Directory is the security system we use to control access. For instance, if we have to forcibly terminate someone, usually the first step is disabling the account in Active Directory. Because Active Directory enables access across a multitude of systems and also because it’s the place where we first go to when we want to disable access, we want to utilize it when we can to enable / disable access into SQL Server. We could manage a bunch of SQL Server logins, but that’s more work for the DBAs that’s probably unnecessary and we incur additional risk for our organizations because we could have a SQL Server login still with permissions when it no longer should. This doesn’t just happen when someone leaves the organization. It can also happen when someone changes roles within the organization. We hope that any changes are reflected properly in Active Directory as soon as possible because of its importance.

Windows Security Groups

Since we’re relying on Active Directory when we can, we want to maximize its usefulness. We also want to avoid issues when possible. Therefore, Windows groups are preferred when granting access and managing permissions inside SQL Server. Why use groups over Windows users?

If we use Windows users and not groups, we run into a problem when a person switches roles within the organization which should result in a change in security. For instance, if someone switches from one development team to another, likely their database permissions need to change. If the DBAs aren’t made aware of the switch, not only does the developer not get the new access needed, but the old access isn’t revoked, either. If the developer’s group membership has been changed at the Active Directory level and we’ve based security on those Windows groups, then the developer has the access change as soon as the AD changes are completed, and the developer logs out and back in (to regenerate the user security token).

Are there cases when you have to use Windows users? Absolutely there are, just like there are edge cases where you’ll have to use a SQL Server login. Keep in mind I’m not speaking about application connectivity, but actual people connecting to SQL Server. A lot of time, if we are dealing with a service account, we’ll assign permissions directly to the Windows user for the service because it’s not going to change roles and there’s no point in creating a group. But when we’re focusing on the user accounts for actual people, we want to use the standard Windows groups. Yes, this means we should be prepared to grant permissions to Windows users, but this situation should truly be the exception.

Within the Database – Database Roles

However we configure access into SQL Server, unless it’s a database for a third party application, we should have complete control over the security within the database. Therefore, let’s make the security model as simple as possible while still meeting the principle of least privilege. We do this by using database roles.

For instance, consider that we’re building a database to track our Scrum team activity within our organization. For the purposes of understanding how to apply the roles, I’m making a couple of simplifying assumptions. First, there’s only one team being modeled. Second, no one has multiple roles. That’s the ideal with Scrum, so that’s what we’ll go with. Here the roles we have within Scrum that we’ll track:

  • Product Owner

  • Scrum Master

  • Team Member

  • Stakeholder

Each of these business roles has different responsibilities and expectations. Naturally, we’d expect that to get modeled into the database. For instance, we may have different stored procedures for Scrum Master versus Team Member functionality. There will be some overlap, but there are some things a Scrum Master does that a Team Member doesn’t and vice versa. Therefore, the simplest model would be to create different database roles matching the business roles.

Once we’ve created the roles, we should assign permissions accordingly. Again, we want to implement permissions based on the principle of least privilege. For instance, a team member shouldn’t have the ability to change the importance of a Product Backlog Item (PBI). That’s defined by the Product Owner. Therefore, we would not give the Team Member role permission to execute any stored procedures for raising or lowering PBI priority.

With the database roles created and permissions defined, the final step is to connect the Windows groups to the database roles. There isn’t always a one-to-one relationship. For instance, when we consider all the Stakeholders, likely there are multiple sets of folks who are impacted. Let’s take an HR application. Obviously, HR is a stakeholder, but in larger organizations you may have HR broken up to Payroll, Benefits, and Hiring/Retention. Each of these three functions may have their own Windows security groups. Then, of course, you’ll have IT. In this example, we may have 4 different security groups which will be placed in the Stakeholders role.

The power of using database roles can be seen in the following consideration. Imagine that at one point, Payroll and Benefits was a combined group with a single security group. However, the organization has gotten larger and there’s a need to split that group into the two separate groups. If we didn’t use Database Roles, at the very least we have to grant access to a new security group, considering that one of the security groups gets renamed and re-used. But what if the security group for all of HR is used? Then the individual groups aren’t necessary. That is, until something happens that means the entire HR group can’t be used. For instance, it’s coming to the end of the academic year. HR brings on two interns. These two should have limited access. They shouldn’t have access as a stakeholder in the Scrum tracking system. If there is no database role, that means you’re either now assigning the permissions against each security group or you’ll be setting up deny permissions specifically for the group containing just the HR interns. Neither is an optimal choice.

What Else?

Depending on the version and edition of SQL Server, there are further options to protect data. There are built-in encryption options to protect data which have been in SQL Server since SQL Server 2005. Now we have data masking to hide data while still retaining the actual data for cases where it’s needed. Always Encrypted puts the point of encryption on the application server, meaning the data arrives at and is stored in SQL Server encrypted.

Of course, there’s the whole realm of auditing. Can we track who accesses what? Are we analyzing that information? Can we detect improper access? SQL Server provides functionality to do this, but the reality is that we want to start with permissions. It’s better to restrict who has access because it takes less work overall than trying to comb through millions of audit records to determine if someone crossed the line. We should still have solid auditing and review. That’s a must. But we should start with the principle of least privilege because the audit system doesn’t prevent access of the data. It just tells us that the data was touched.

Concluding Thoughts

When it comes to data protection, we start with the principle of least privilege. It’s what we build our data protection strategy around. However, in order to protect our data, we must understand two key things: what and who accesses our data and how our data travels throughout our organization. This takes time. This takes effort to collect the data. Once we have a starting point for these areas, we can look at how to secure that data. We’ve looked briefly at the file system and the right way to build a model within SQL Server. Of course, there are other things to consider, other technologies or features. At the end of the day, though, we still must center on the principle of least privilege.