BigQuery column-level data masking: How to maintain security and granular control

Published on
March 8, 2023
Author
Fanni Bolyki
Junior Data Engineer

BigQuery supports dynamic data masking at the column level, which lets teams work with granular data without creating data security problems. This matters because a dataset is likely to contain users’ personally identifiable information (PII). If PII is left unprotected, analysts querying the fine-grained data can expose it. For example, users’ email addresses and social security numbers are confidential and must not be made public under any circumstances. On the other hand, analysts derive the best insights from granular data, so the whole organization benefits from making it available to them. One way to preserve confidentiality is to mask the PII values in specific columns.

BigQuery is a fully managed, cloud-native data warehouse that enables fast SQL queries using the processing power of Google’s infrastructure. It also offers a range of security and data masking features to help organizations protect their sensitive data. Here we show you, step by step, how to implement PII protection using BigQuery’s built-in features.

Create the data policy within BigQuery Data Policies

In BigQuery, you can create policy tags to define access to your data. Consider what kinds of data your organization has and organize them into a tree structure (a taxonomy). Then decide which team needs access to which data class. For example, one group needs access to business-sensitive data, such as revenue and customer history. Another group needs access to PII, such as phone numbers and addresses.
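To make the planning step concrete, the tree of data classes can be sketched in plain Python before anything is created in BigQuery. This is only a planning aid: the tag names, team names, and helper functions below are hypothetical and do not call any Google Cloud API.

```python
# Hypothetical taxonomy: tag names are illustrative, not created in BigQuery.
TAXONOMY = {
    "business_sensitive": {"revenue": {}, "customer_history": {}},
    "pii": {"phone_number": {}, "address": {}},
}

# Hypothetical mapping from team to the top-level data classes it needs.
TEAM_ACCESS = {
    "finance": ["business_sensitive"],
    "customer_support": ["pii"],
}


def tag_paths(tree, prefix=()):
    """Yield every tag in the tree as a tuple path starting at the root."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from tag_paths(children, path)


def tags_for_team(team):
    """Return the tag paths a team needs, including all child tags."""
    allowed = set(TEAM_ACCESS.get(team, []))
    return [path for path in tag_paths(TAXONOMY) if path[0] in allowed]
```

Calling `tags_for_team("customer_support")` lists the `pii` tag and its children, which gives a quick overview of what each team would be granted.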

When a user queries data, BigQuery checks the policy tags of the selected columns. If a selected column is tagged with a data masking rule and the user has permission to access the masked or the original data, then BigQuery executes the query. Otherwise, the user receives an “Access Denied” error message.
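The check described above can be written as a small decision function. This is a simplified model of the behaviour, not BigQuery's implementation: the function name, the exception class, and the way roles are passed in are assumptions made for illustration.

```python
class AccessDenied(Exception):
    """Raised when the user may read neither the masked nor the original data."""


def resolve_column_access(policy_tag, roles_on_tag):
    """Decide what a user sees for a single column.

    policy_tag:   name of the column's policy tag, or None if the column is untagged
    roles_on_tag: set of role names the user holds for that tag (simplified model)
    """
    if policy_tag is None:
        return "original"  # untagged columns are not restricted by policy tags
    if "Fine-Grained Reader" in roles_on_tag:
        return "original"  # the user may read the real values
    if "Masked Reader" in roles_on_tag:
        return "masked"    # the user sees the output of the masking rule
    raise AccessDenied(f"no permission on policy tag {policy_tag!r}")
```

A query touching several tagged columns would apply this check to each of them, and a single failing column is enough for the whole query to be rejected.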

Create the Data Masking Rule with BigQuery

For each policy tag, we can specify a masking rule. The masking rule defines whether the values are replaced with nulls, hashes, or default masking values. In the ‘Principal’ field, we can also specify which users, groups, or service accounts are allowed to query the masked data. A user who is not added here will not be able to query the columns to which the policy tag is applied.
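The effect of the three rule types can be imitated locally to see what an analyst would get back. The sketch below is an approximation under stated assumptions: SHA-256 via Python's `hashlib` stands in for the hash rule, the per-type defaults are a small assumed subset, and the exact encoding BigQuery returns may differ.

```python
import hashlib

# Assumed per-type defaults for illustration; check the BigQuery docs for the full list.
DEFAULT_VALUES = {"STRING": "", "INT64": 0, "BOOL": False}


def mask(value, rule, column_type="STRING"):
    """Imitate a masking rule on one value (local simulation, not the BigQuery engine)."""
    if rule == "nullify":
        return None  # the value is replaced with NULL
    if rule == "hash":
        # Deterministic: the same input always gives the same digest.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if rule == "default":
        return DEFAULT_VALUES[column_type]  # a fixed value that depends on the type
    raise ValueError(f"unknown rule: {rule}")
```

Because hashing is deterministic, two rows with the same email address produce the same masked value, so analysts can still group or join on the column without seeing its content. Nullified and default values do not allow this.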

Grant permission to the principals

In the IAM & Admin menu, we can grant roles to the principals. Two roles are important when querying columns with policy tags:

  1. Masked Reader: The principals can query the table, but will see masked data in the tagged columns. A common use case is assigning the Masked Reader role to analysts. In this case, we recommend hashing as the masking rule, so that analysts can still tell when a PII value changes or matches another without accessing the content of the field.
  2. Fine-Grained Reader: The principals can query the table and read the original data. Referring back to the previous use case, this role should not be given to analysts, but to a group that needs the tagged information, e.g., to call a phone number, send an email, or create an invoice for a given address.
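For teams that manage IAM as code rather than through the console, the two grants can be sketched as IAM-style bindings. The role IDs are the ones Google Cloud uses for the Masked Reader and Fine-Grained Reader roles; the group addresses and the `build_bindings` helper are hypothetical, and the sketch only builds the data structure without calling any API.

```python
MASKED_READER = "roles/bigquerydatapolicy.maskedReader"
FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def build_bindings(masked_members, fine_grained_members):
    """Build IAM-policy-style bindings for the two reader roles.

    Rejects principals listed in both groups, following the advice above
    that analysts on masked data should not also read the original values.
    """
    overlap = set(masked_members) & set(fine_grained_members)
    if overlap:
        raise ValueError(f"principals in both roles: {sorted(overlap)}")
    return [
        {"role": MASKED_READER, "members": sorted(masked_members)},
        {"role": FINE_GRAINED_READER, "members": sorted(fine_grained_members)},
    ]


# Hypothetical groups, for illustration only.
bindings = build_bindings(
    ["group:analysts@example.com"],
    ["group:customer-support@example.com"],
)
```

The overlap check is a design choice of this sketch, not something IAM enforces.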

Apply the data policy to the columns in Dataplex

Dataplex, a Google Cloud Platform service, helps users unify distributed data and automate data management and governance to power analytics at scale. In Dataplex, we can search for BigQuery tables; then, in the ‘Schema and column tags’ section, we can add any existing policy tag to any column by clicking the + button.

As a result, principals with the Masked Reader role will see masked values (nulls, if the nullify rule is selected) in columns that have a policy tag.
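The tag applied through the UI also shows up in the table's JSON schema, where a tagged column carries a `policyTags` entry with the tag's resource name. The sketch below edits such a schema locally as an alternative view of the same step. The `tag_column` helper, the column names, and the placeholder resource name are hypothetical, and nothing is sent to BigQuery.

```python
import copy


def tag_column(schema, column, policy_tag):
    """Return a copy of a BigQuery JSON schema with policy_tag attached to one column."""
    updated = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in updated:
        if field["name"] == column:
            field["policyTags"] = {"names": [policy_tag]}
            return updated
    raise KeyError(f"column not found: {column}")


# Hypothetical schema and placeholder tag resource name, for illustration only.
schema = [
    {"name": "user_id", "type": "INT64"},
    {"name": "email", "type": "STRING"},
]
tag = "projects/PROJECT_ID/locations/LOCATION/taxonomies/TAXONOMY_ID/policyTags/POLICY_TAG_ID"
tagged_schema = tag_column(schema, "email", tag)
```

Keeping the tagged schema in version control is one way to review which columns are protected, though the Dataplex UI described above remains the approach this article follows.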

The data table and the taxonomy must be within the same project. You can find further information about dynamic, column-level data masking in BigQuery in the related GCP documentation.
