What is PHI and why is it important in HIPAA?

2018 saw a sharp increase in concern about data privacy and protection. This was driven partly by the GDPR coming into force across Europe in May, and partly by the number and scale of data breaches, which continue to occur even now.

The truth is, though, that long before the GDPR came into play, the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) had already been introduced in 2000 to protect what it calls Protected Health Information (PHI).

The Privacy Rule still applies and, with fines of $100 to $50,000 per violation (or per record), up to a maximum of $1.5 million per year for violations of the same provision, it’s worth knowing your way around PHI in case you ever have to work with it.

What is PHI?

PHI is any information that relates to someone’s past, present or future physical or mental health, or to the provision of healthcare to them. Importantly, it goes beyond healthcare records: it also includes health insurance details and any information relating to payment for healthcare, where that information could identify the individual concerned.

Under HIPAA there are 18 identifiers that make health information PHI:

  • Names
  • Geographic data
  • Dates, except year
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Social Security numbers
  • Medical record numbers
  • Account numbers
  • Health plan beneficiary numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers including license plates
  • Web URLs
  • Device identifiers and serial numbers
  • Internet protocol addresses
  • Full face photos and comparable images
  • Biometric identifiers (e.g. retinal scans, fingerprints)
  • Any unique identifying number or code
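Some of these identifiers have a recognizable textual shape, so a first pass at finding them in free text can be automated. The following is a minimal sketch, not part of HIPAA or any particular product: the `PATTERNS` table and the `find_identifiers` function are hypothetical, the patterns assume US-style SSNs and phone numbers and IPv4 addresses, and identifiers such as names or geographic data cannot be caught this way.

```python
import re

# Hypothetical patterns for a few of the 18 identifiers that follow a fixed
# format. Names, addresses and most other identifiers need other techniques.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return each identifier type found in the text, with its matches."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


sample = "Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789; host 10.0.0.12"
print(find_identifiers(sample))
```

Pattern matching like this produces false positives and misses, so it is better suited to flagging columns for review than to certifying that data is free of identifiers.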

Whether data counts as PHI comes down to whether information that might identify an individual is combined with health-related content. It’s the connection with the health data that’s the key here. A list of names, for example, would not be designated as PHI, but the same list with an additional column showing a health condition or health plan information would be.
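The rule of thumb in the previous paragraph can be expressed as a simple check on a table’s columns. This is a simplified sketch under stated assumptions: the column names in `IDENTIFIER_COLUMNS` and `HEALTH_COLUMNS` and the `is_phi` function are hypothetical, and a real classification would come from reviewing the actual schema and data.

```python
# Hypothetical column classifications; in practice these would come from a
# data catalogue or a manual review of the schema.
IDENTIFIER_COLUMNS = {"name", "email", "phone", "ssn", "medical_record_number"}
HEALTH_COLUMNS = {"diagnosis", "health_condition", "health_plan", "medication"}


def is_phi(columns: set[str]) -> bool:
    """Treat a dataset as PHI when it links an identifier to health data."""
    has_identifier = bool(columns & IDENTIFIER_COLUMNS)
    has_health_data = bool(columns & HEALTH_COLUMNS)
    return has_identifier and has_health_data


print(is_phi({"name"}))                      # False: a plain list of names
print(is_phi({"name", "health_condition"}))  # True: names linked to a condition
```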

PHI also applies only to data about patients or health plan members; it doesn’t include information held in education or employment records. So if, for example, a university or employer keeps records showing which students or employees have an allergy it needs to be aware of, those records aren’t regarded as PHI.

Who needs to protect PHI?

The Privacy Rule of HIPAA uses the term ‘covered entity’ to describe the individuals and organizations that need to protect PHI.

This can refer to a healthcare provider that submits HIPAA transactions electronically, a health plan, a long-term care insurer, or a healthcare clearinghouse that processes claims information from healthcare providers. Business associates, which handle PHI on behalf of a covered entity, are also required to protect it.

The term is deliberately broad because PHI is often processed for perfectly legitimate reasons, such as academic research. Fortunately, the Centers for Medicare & Medicaid Services has created a useful online tool to help individuals and organizations find out whether they are regarded as a covered entity.

How can PHI be protected?

Under the Privacy Rule, health information that has been de-identified, so that it cannot subsequently be used or manipulated to identify individuals, is no longer treated as PHI. The rule sets out two methods of de-identification.

The first, the Expert Determination Method, relies on the judgment of a suitably knowledgeable and qualified person who concludes, in his or her professional opinion, that the operations performed on the data have left only a very small risk that anybody viewing it could use it to identify an individual.
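The Privacy Rule leaves the choice of method to the expert. One widely used measure of re-identification risk, k-anonymity, is not named in the source and is shown here only as an illustration of the kind of analysis an expert might run. In this sketch the `k_anonymity` function, the column names and the sample rows are all hypothetical: k is the size of the smallest group of records that share the same combination of quasi-identifiers, so a k of 1 means at least one person is unique on those columns.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k, the size of the smallest group of rows sharing the same
    combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already partially generalized records.
rows = [
    {"zip3": "021", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "021", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]

print(k_anonymity(rows, ["zip3", "birth_year", "sex"]))  # 1: the third row is unique
```

A low k suggests that further generalization or suppression is needed, but a single number is not a substitute for the expert’s overall assessment.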

The second, the Safe Harbor Method, prescribes that de-identification operations be performed on the 18 identifiers listed earlier so that no residual information remains that could be used to identify individuals, either from the identifiers themselves or from relationships between them.
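A few of the Safe Harbor transformations can be sketched in code: dates directly related to an individual are reduced to the year, ZIP codes are cut to their first three digits (or `000` where that area has 20,000 or fewer people), and ages over 89 are grouped into a single “90 or older” category. This is a minimal, partial sketch: the function names and record fields are hypothetical, and `RESTRICTED_ZIP3` holds placeholder values, since the real list must be derived from current Census data.

```python
from datetime import date

# Placeholder values: the real set of restricted three-digit ZIP prefixes
# (areas with 20,000 or fewer people) must be derived from Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or '000' for sparsely populated areas."""
    prefix = zip_code[:3]
    return "000" if prefix in RESTRICTED_ZIP3 else prefix


def generalize_date(d: date) -> int:
    """Reduce a date related to an individual to its year."""
    return d.year


def generalize_age(age: int) -> str:
    """Aggregate ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


def safe_harbor(record: dict) -> dict:
    """Copy only allow-listed, transformed fields, so names, SSNs and other
    direct identifiers in the input are dropped."""
    return {
        "zip3": generalize_zip(record["zip"]),
        "admission_year": generalize_date(record["admission_date"]),
        "age": generalize_age(record["age"]),
        "diagnosis": record["diagnosis"],
    }


patient = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "02139",
           "admission_date": date(2017, 3, 14), "age": 93, "diagnosis": "asthma"}
print(safe_harbor(patient))
```

Using an allow-list rather than deleting known identifier columns means a newly added column is excluded by default instead of leaking into the de-identified copy.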

In modern database development, this can represent a real challenge. DevOps practices encourage releasing small changes to front-end applications frequently, and the database at the back end often needs to be updated as well. To do this safely, many developers prefer to use an accurate copy of the Production database, so that proposed changes and updates can be thoroughly tested against a truly representative environment. Yet these are the very databases that contain the identifiers listed by HIPAA.

To resolve the issue, some organizations provision development and test databases with a limited dataset of anonymous data instead. This rarely works well, however, because changes are then tested against a database that is neither realistic nor large enough for the impact on performance to be assessed.

A much better solution is to pseudonymize and mask data in order to provide database copies that retain the referential integrity and distribution characteristics of the original, but contain none of the identifiable data that the Privacy Rule requires to be protected.
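One common way to pseudonymize identifiers while keeping referential integrity is a keyed hash: the same real ID always maps to the same pseudonym, so rows in different tables still join. This is a minimal sketch of that technique, not how any particular product works; `SECRET_KEY`, `pseudonymize` and the two sample tables are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would be kept out of the masked copy so
# that recipients cannot regenerate the mapping from real IDs to pseudonyms.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable pseudonym using HMAC-SHA256.

    The same input and key always give the same output. The digest is cut to
    12 hex characters for readability; keeping more of it lowers the chance
    of two different IDs colliding."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:12]


# Two hypothetical tables linked by patient_id.
patients = [{"patient_id": "MRN-1001", "birth_year": 1980}]
visits = [{"patient_id": "MRN-1001", "diagnosis": "asthma"}]

# Replace the identifier in both tables with the same pseudonym.
masked_patients = [{**p, "patient_id": pseudonymize(p["patient_id"])} for p in patients]
masked_visits = [{**v, "patient_id": pseudonymize(v["patient_id"])} for v in visits]

# The join between the tables still lines up on the masked copies.
print(masked_patients[0]["patient_id"] == masked_visits[0]["patient_id"])  # True
```

A keyed hash is used rather than a plain hash because medical record numbers follow predictable formats, and without a secret key anyone could hash candidate IDs and match them against the pseudonyms.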

Data masking protects sensitive data by replacing it with fictitious but still realistic data. This protects it from insider and outsider threats, enables compliance with regulations such as HIPAA, and gives developers confidence that the data they’re testing with will help them spot potential issues before changes are deployed.
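Substituting a realistic fake value can be sketched in the same spirit: pick a replacement name from a pool, using a keyed hash of the original so the same person is masked consistently on every run. The key, the name pools and `mask_name` are hypothetical; a real masking tool would draw on far larger data sets so that the masked values keep a realistic distribution.

```python
import hashlib
import hmac

# Hypothetical key and value pools for illustration only.
MASKING_KEY = b"replace-with-a-securely-stored-key"
FIRST_NAMES = ["Alex", "Maria", "Sam", "Priya", "Tom", "Chen"]
LAST_NAMES = ["Garcia", "Smith", "Okafor", "Nguyen", "Kowalski", "Patel"]


def mask_name(real_name: str, key: bytes = MASKING_KEY) -> str:
    """Replace a real name with a fictitious but realistic-looking one.

    The substitute is chosen from a keyed hash of the original, so the same
    person gets the same fake name every time. Different people can share a
    substitute, which is usually acceptable for test data."""
    digest = hmac.new(key, real_name.encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


print(mask_name("Jane Doe"))  # prints a name built from the pools above
```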

A measure of the increasing adoption of data masking is Gartner’s 2018 Market Guide for Data Masking, which predicts that the percentage of companies using data masking or similar practices will increase from 15% in 2017 to 40% in 2021. Organizations that collect and process PHI have long been accustomed to the need for it, but growing privacy concerns across all industry sectors have widened interest in the advantages it offers.

Redgate was acknowledged as a representative vendor in Gartner’s 2018 Market Guide for Data Masking, and our SQL Provision solution offers a way to provision copies of Production databases securely. Find out how you can comply with regulations like HIPAA, SOX and the GDPR, while still being able to develop, test and fix code faster.