Association for Data and Cyber Governance Article
With the California Consumer Privacy Act (CCPA) enforcement date right around the corner, lawsuits citing CCPA have already begun to emerge, and data deletion requests are piling up. Financial institutions and other organizations must act now to comply with data privacy laws.
Of the many methods companies can use to safeguard the data in their possession, data anonymization has emerged as a trusted favorite. The Association for Data and Cyber Governance (ADCG) recently interviewed our Prasun Howli to provide an instructional primer on data anonymization.
“To understand anonymization, you must first understand the concept of personal data,” says Howli.
Under regulations such as CCPA and the General Data Protection Regulation (GDPR), personal data is defined as any data that can be linked to a data subject’s identity. “From a regulatory aspect, you have a responsibility to ensure that you are following each applicable regulation in terms of collecting, processing, retaining and destroying personal data,” says Howli.
However, most of these regulations apply only to personal data. Once data can no longer be connected with its owner, it is no longer subject to the same regulatory scrutiny.
Data anonymization is the process of converting personal data into anonymized data, so that the data subjects can no longer be identified. This allows companies to keep using the data for their purposes while reducing the risk of legal consequences if that data is exposed in a breach.
Before you get into anonymization techniques, you need to think about your objective. What are you hoping to get from the data?
“Once you have the objective, you know what you need from the data,” says Howli. This allows you to identify which data attributes are useful for your purposes, and which can be discarded in an attempt to make the data less personal.
Howli offers up an example. Let’s say your objective is to target the marketing of your product to customers in a certain age group. When you retain the data, you can suppress attributes that are irrelevant to that objective, such as Social Security number or gender. In turn, this will decrease the risk of failing to comply with regulations.
Knowing your objective allows you to target a certain level of risk. This is known as your risk threshold: the amount of risk you are willing to tolerate to keep using the data for your purposes.
There is no universal threshold that each company should follow. When assessing risk, you need to consider the various scenarios in which the data could be accessed and misused.
The individuals who have the highest chance of re-identifying anonymized data are those within your company. However, if a third party has access to some elements of your data, you must also consider this risk when determining your anonymization strategy.
It is also crucial to consider the worst-case scenario: a data breach. When anonymizing data, put yourself in a potential attacker’s shoes. If they found their way into your system, is there any feasible way they could trace that data back to its subject? Could they locate and piece together data that has been stored separately? If the answer is yes, strengthen your anonymization technique until you are confident that no customers are at risk.
Proper risk analysis is also an issue of team building. “You can hire a very good software developer to implement a data anonymization process; however, the software developer may not be able to assess the risk related to the data anonymization process,” says Howli. “So you need to involve a subject matter expert to review the data that need to be anonymized and provide appropriate recommendations so that the re-identification risk can be minimized.”
Once you know your objective and risk threshold, it is time to consider the various data anonymization techniques at your disposal.
It is common to start with the aforementioned attribute suppression, in which you discard any collected data attributes that are irrelevant to your objective. “This way data can be de-identified, but it can still be useful for your purposes,” says Howli.
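As a rough illustration of attribute suppression, the sketch below keeps only the attributes needed for the marketing objective described earlier. The function name `suppress_attributes`, the field names and the sample records are all hypothetical, chosen only for this example.

```python
def suppress_attributes(records, keep):
    """Return new records that contain only the attributes listed in `keep`."""
    allowed = set(keep)
    return [
        {name: value for name, value in record.items() if name in allowed}
        for record in records
    ]


# Made-up customer records; none of these values come from a real data set.
customers = [
    {"name": "Customer A", "ssn": "000-00-0001", "gender": "F", "age_group": "25-34"},
    {"name": "Customer B", "ssn": "000-00-0002", "gender": "M", "age_group": "35-44"},
]

# The marketing objective only needs the age group, so everything else is dropped.
marketing_view = suppress_attributes(customers, keep=["age_group"])
print(marketing_view)
```

Listing the attributes to keep, rather than the ones to drop, means that a newly collected attribute is suppressed by default until someone decides it is needed.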
You can also try data masking. Think about when you are asked for your Social Security or credit card number but only give the last four digits. The same idea can be applied to other types of data: the irrelevant, identifiable information within an attribute is discarded.
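The "last four digits" idea can be sketched in a few lines. The helper `mask_keep_last` and its parameters are assumptions made for this example, not part of any particular product or library.

```python
def mask_keep_last(value, visible=4, mask_char="*"):
    """Mask letters and digits except the last `visible` ones; keep separators.

    Values with `visible` or fewer letters/digits are masked completely,
    so that a short value is never returned in the clear.
    """
    total = sum(ch.isalnum() for ch in value)
    to_mask = total - visible if total > visible else total
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


# Made-up identifiers used only to show the behavior.
print(mask_keep_last("123-45-6789"))
print(mask_keep_last("4111 1111 1111 1111"))
```

For masking to count as discarding information, only the masked value should be stored; keeping the original alongside it leaves the identifier intact.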
According to Howli, the safest anonymization technique is creating synthetic data. This is where you replicate a data profile, keeping the relevant attributes but leaving no link to the original, identifiable data.
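To make the idea of synthetic data concrete, here is a deliberately naive sketch that draws each attribute independently from the values seen in the original records. This is an assumption chosen for brevity, not the method Howli describes: it does not preserve relationships between attributes, and it can reproduce a real combination by chance, so it should be read as an illustration rather than a guarantee of anonymity. The function name `synthesize` and the sample fields are hypothetical.

```python
import random


def synthesize(records, attributes, n, seed=None):
    """Build `n` synthetic records by sampling each attribute independently.

    Each attribute keeps roughly the same distribution as in the original
    data, but no synthetic row is copied from a single original row.
    """
    rng = random.Random(seed)
    columns = {attr: [record[attr] for record in records] for attr in attributes}
    return [
        {attr: rng.choice(columns[attr]) for attr in attributes}
        for _ in range(n)
    ]


# Made-up source records.
customers = [
    {"name": "Customer A", "age_group": "25-34", "region": "West"},
    {"name": "Customer B", "age_group": "35-44", "region": "East"},
    {"name": "Customer C", "age_group": "25-34", "region": "East"},
]

# Only the attributes relevant to the objective are synthesized.
synthetic = synthesize(customers, ["age_group", "region"], n=5, seed=1)
print(synthetic)
```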
Picking the right technique involves considering what information is sufficient to re-identify the data. For example, if you collect the name, address and age group, you can suppress the subject’s name and mask the address so that the street number is removed. If thousands of people live on this street, this could be sufficient for anonymization. However, if only ten people live on the street, knowing the age group would make it much easier for an attacker to re-identify the data. If that’s the case, it’s time to go back to the drawing board.
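The street example can be turned into a simple check: count how many records share each combination of the remaining attributes and flag combinations that fall below a chosen minimum group size. This is along the lines of what is commonly called k-anonymity; the threshold `k`, the function name `small_groups` and the sample data are assumptions for illustration, and the right threshold depends on your own risk assessment.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return attribute combinations shared by fewer than `k` records."""
    counts = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return {combo: size for combo, size in counts.items() if size < k}


# Made-up records after the name is suppressed and the street number is masked.
records = [
    {"street": "Main St", "age_group": "25-34"},
    {"street": "Main St", "age_group": "25-34"},
    {"street": "Main St", "age_group": "25-34"},
    {"street": "Oak Ln", "age_group": "65+"},
]

# With k=2, any combination that appears only once is flagged for review.
flagged = small_groups(records, ["street", "age_group"], k=2)
print(flagged)
```

A non-empty result suggests those records need further suppression or masking, for example replacing the street with a larger area, before the data is used.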
“There is a possibility that, even after taking all the precautions, there might be a way for people to re-identify the data,” says Howli. In these cases, the safest option is to delete the data.
Sometimes, if extensive anonymization tactics still leave a path to re-identification, the solution may be to adjust your marketing strategy so that you collect data in a safer way.
Make sure you know the regulations like the back of your hand. “If there is a private right-to-action assigned to any data breach, you need to be very careful,” stresses Howli. “Always balance the risk with the value that you are getting from the data.”
Prasun Howli, Associate Director