Where do data breaches come from?

I recently did some research on the sources of data breaches. In this article, I’m going to talk a bit about my current favorite source for breach information and what I learned.

Verizon publishes the Data Breach Investigations Report annually, and the latest report is the 11th edition, so they’ve had some practice. The free reports are extremely detailed and, shockingly, they’re even entertaining to read.

The reports don’t claim to discover all data breaches. After all, not all data breaches are discovered, and those that are discovered aren’t necessarily reported.

The 2018 report covers 53,000 incidents, defined as: A security event that compromises the integrity, confidentiality or availability of an information asset.

It also covers 2,216 breaches, which are defined as: An incident that results in the confirmed disclosure — not just potential exposure — of data to an unauthorized party.

These numbers (and the screenshots I’m sharing below) do NOT include breaches involving botnets. Instead, the 43,000 successful accesses via stolen credentials associated with botnets are handled in a special insights section of the report.

Are data breaches caused mainly by insiders or outsiders?

A colleague of mine mentioned that he’d recently seen some numbers suggesting that data breaches were mainly perpetrated by insiders to an organization, but he hadn’t been able to track down the source of those figures or any substantiating data. With the number of data breaches we see these days, that’s a pretty dark view of employee-employer relationships!

Here’s what the Verizon report shows in terms of who is behind the breaches:

A screenshot from the Verizon Data Breach Investigations Report 2018, showing 73% perpetrated by outsiders, 28% involving internal actors, 2% involving partners, 2% featuring multiple parties, 50% carried out by organized criminal groups, 12% involved actors identified as nation-state or state-affiliated

2018 Data Breach Investigations Report, 11th Edition, Verizon, page 5

These figures cover confirmed data breaches only, not all security incidents. While 28% involve internal actors, the bulk of data breaches come from people outside the organization, who find their way in by using malware or social attacks, or by exploiting vulnerabilities created by errors.

Who can a database administrator trust?

For those internal actors involved in data breaches, my first thought was, Well, so WHO WAS IT?

That’s answered a couple pages later. While the exact internal actors weren’t found for all of the reported data breaches, analysis was done for 277 data breaches:

A screenshot from the Verizon Data Breach Investigations Report 2018, showing internal actors: 72 system admin, 62 end user, 62 other, 32 doctor or nurse, 15 developer, 9 manager, 8 executives

2018 Data Breach Investigations Report, 11th Edition, Verizon, page 9

As much as database administrators like to focus on denying permissions to developers for production, developers were much less likely to be involved in data breaches than system admins. And who exactly are system admins? Well, I’m guessing that includes … the DBAs.

Awkward.

This is remarkable given that you don’t need production access to cause a data breach. It’s pretty normal practice in an enterprise to make copies of production data for use by analysts, developers, product managers, marketing professionals, and others.

Redgate’s 2018 State of Database DevOps Report, for example, found that 67% of respondents use production data in development, test, or QA environments, and that 58% of respondents reported that production data should be masked when in use in these environments:

Screenshot showing two questions and responses: "Do you use production data in your dev, test, or QA environments?" 67% yes, 28% no, 5% not sure. "Would your production data need to be modified or masked before use in dev, test, or QA environments?" 57% yes, 33% no, 10% not sure.

The 2018 State of Database DevOps, Redgate, page 12

There are good reasons that production data is spread around like this: performance is extremely difficult to predict when testing with data that doesn’t have a very similar distribution and similar size to production data, for example.

But after many years of working in IT, I know that most often this data is not modified or masked after being duplicated. These environments tend to be far less secure than production environments, and they are a very rich target for data breaches — even if it’s not the developers themselves intentionally causing the data breach.

That’s worrying, given that the rise of malware and social attacks means that any environment in an enterprise can be the source of a data breach. It is perhaps also a sign that more attention should be given to measures that prevent such breaches.
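One such measure is masking sensitive columns before a production copy reaches a dev, test, or QA environment. The following is a minimal sketch of the idea in Python, not a description of any particular masking product. The key, the function names (`mask_value`, `mask_rows`), and the sample rows are all hypothetical. It uses a keyed hash so that the same input always produces the same pseudonym, which keeps repeated values and joins intact while hiding the original data.

```python
import hashlib
import hmac

# Hypothetical secret used only for illustration. In practice the key would be
# kept out of the non-production environment that receives the masked copy.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_value(value: str, prefix: str = "user") -> str:
    """Return a deterministic pseudonym for value using keyed HMAC-SHA256."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_rows(rows, sensitive_columns):
    """Return copies of rows with the sensitive columns replaced by pseudonyms."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are left untouched
        for column in sensitive_columns:
            if new_row.get(column) is not None:
                new_row[column] = mask_value(str(new_row[column]), prefix=column)
        masked.append(new_row)
    return masked


# Made-up sample data standing in for a copy of a production table.
production_rows = [
    {"id": 1, "email": "alice@example.com", "order_total": 42.50},
    {"id": 2, "email": "bob@example.com", "order_total": 17.00},
    {"id": 3, "email": "alice@example.com", "order_total": 8.25},
]

masked_rows = mask_rows(production_rows, sensitive_columns=["email"])
for row in masked_rows:
    print(row)
```

Because the pseudonyms are deterministic, rows 1 and 3 still share a masked email, so the number of distinct values and the row counts per value match the original. That matters for the performance-testing concern raised earlier. A real masking process would also need to handle free-text fields, dates, and values that can be re-identified by combining columns, which this sketch does not attempt.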

Kendra Little is a Microsoft Certified Master, a Microsoft Data Platform MVP, and a Redgate DevOps Evangelist. You can find her online at littlekendra.com.