Big Data
March 8, 2023

BigQuery column-level data masking: How to maintain security and granular control

BigQuery supports dynamic data masking at the column level, which lets teams work with granular data without creating data security problems. This matters because datasets often contain users’ personally identifiable information (PII), and analysts querying fine-grained data without restrictions can expose it. Users’ email addresses and social security numbers, for example, are confidential and must not be made public under any circumstances. On the other hand, analysts produce their best insights from granular data, so it is in the whole organization’s interest to give them access to it. One way to reconcile the two is to hide the PII values in specific columns.

BigQuery is a fully managed, cloud-native data warehouse that runs fast SQL queries on the processing power of Google’s infrastructure. It also offers a range of security and data masking features that help organizations protect their sensitive data. Below, we show step by step how to implement PII protection using BigQuery’s built-in features.

Create the data policy within BigQuery Data Policies

In BigQuery, you can create policy tags to define access to your data. Start by listing the kinds of data your organization holds and arranging them into a tree structure (a taxonomy). Then decide which team needs access to which data class. For example, one group needs access to business-sensitive data, such as revenue and customer history, while another group needs access to PII, such as phone numbers and addresses.
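The taxonomy idea can be pictured as a small tree. The Python sketch below is a toy model, not a BigQuery or Data Catalog API: `PolicyTag`, the tag names, the `team_access` mapping, and `readable_tags` are all hypothetical, and the model assumes that granting a team a tag also covers every tag beneath it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyTag:
    """One node in a hypothetical policy tag taxonomy."""
    name: str
    children: list[PolicyTag] = field(default_factory=list)

    def subtree(self) -> list[str]:
        """Names of this tag and every tag below it, depth-first."""
        names = [self.name]
        for child in self.children:
            names.extend(child.subtree())
        return names

    def find(self, name: str) -> PolicyTag | None:
        """Return the tag with the given name in this subtree, or None."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


# Illustrative taxonomy mirroring the example data classes in the text.
taxonomy = PolicyTag("Sensitive", [
    PolicyTag("BusinessSensitive", [PolicyTag("Revenue"), PolicyTag("CustomerHistory")]),
    PolicyTag("PII", [PolicyTag("PhoneNumber"), PolicyTag("Address")]),
])

# Hypothetical mapping: each team is granted exactly one tag.
team_access = {"finance": "BusinessSensitive", "customer-service": "PII"}


def readable_tags(team: str) -> set[str]:
    """Tags a team may read, assuming a grant covers the tag's whole subtree."""
    root = taxonomy.find(team_access[team])
    return set(root.subtree()) if root is not None else set()


print(sorted(readable_tags("customer-service")))  # ['Address', 'PII', 'PhoneNumber']
```

Keeping the tree shallow and grouping data classes by who needs them makes the later role assignments easier to reason about.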

When a user queries data, BigQuery checks the policy tags of the selected columns. If a selected column carries a policy tag with a data masking rule and the user has permission to read either the masked or the original data, BigQuery executes the query. Otherwise, the user receives an “Access Denied” error message.
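The decision described above can be modelled per column. This is a hedged Python sketch of the logic as the text describes it, not BigQuery’s implementation; the `Access` enum, `column_access`, and `check_query` are hypothetical names, and the sketch assumes that Fine-Grained Reader takes precedence over Masked Reader when a user holds both.

```python
from __future__ import annotations

from enum import Enum


class Access(Enum):
    RAW = "raw"          # user sees the original values
    MASKED = "masked"    # user sees values transformed by the masking rule
    DENIED = "denied"    # user may not read the column at all


def column_access(tag: str | None, fine_grained: set[str], masked: set[str]) -> Access:
    """Decide what one user sees for one column.

    tag: the column's policy tag, or None if the column is untagged.
    fine_grained: tags on which the user holds Fine-Grained Reader.
    masked: tags on which the user holds Masked Reader.
    """
    if tag is None or tag in fine_grained:
        return Access.RAW
    if tag in masked:
        return Access.MASKED
    return Access.DENIED


def check_query(columns: dict[str, str | None], fine_grained: set[str],
                masked: set[str]) -> dict[str, Access]:
    """Evaluate every selected column; one denied column fails the whole query."""
    decisions = {col: column_access(tag, fine_grained, masked) for col, tag in columns.items()}
    denied = sorted(col for col, access in decisions.items() if access is Access.DENIED)
    if denied:
        raise PermissionError(f"Access Denied: {', '.join(denied)}")
    return decisions


# An analyst with Masked Reader on the "PII" tag selects an untagged and a tagged column.
print(check_query({"order_id": None, "email": "PII"}, fine_grained=set(), masked={"PII"}))
```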

Create the data masking rule with BigQuery

For each policy tag, we can specify a masking rule. The masking rule defines whether the values are replaced with nulls, hashes, or default masking values. We can also specify, in the ‘Principal’ field, which users, groups, or service accounts are allowed to query the masked data. A principal that is not added here, and has no other access to the tag, will not be able to query the columns to which the policy tag is attached.
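To make the three rule types concrete, here is a small Python sketch that applies each one to a single value. The rule names mirror, as far as we know, the predefined expressions of BigQuery’s data policy API (`ALWAYS_NULL`, `SHA256`, `DEFAULT_MASKING_VALUE`), but the `mask` function itself is hypothetical and only approximates BigQuery’s behavior: the hex encoding of the hash, the small type table, and passing NULL inputs through unchanged are assumptions of this sketch.

```python
import hashlib

# Per-type defaults for the "default masking value" rule (a subset, assumed
# to match BigQuery's documented defaults for these types).
DEFAULT_MASKING_VALUES = {"STRING": "", "INT64": 0, "FLOAT64": 0.0, "BOOL": False}


def mask(value, rule: str, column_type: str):
    """Return the value a Masked Reader would see under the given rule (sketch)."""
    if value is None:
        return None  # assumption: NULL stays NULL under every rule
    if rule == "ALWAYS_NULL":
        return None
    if rule == "SHA256":
        if column_type != "STRING":
            raise ValueError("this sketch hashes STRING columns only")
        # Unsalted, deterministic hash: equal inputs give equal outputs.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if rule == "DEFAULT_MASKING_VALUE":
        return DEFAULT_MASKING_VALUES[column_type]
    raise ValueError(f"unknown masking rule: {rule}")


email = "jane.doe@example.com"
print(mask(email, "ALWAYS_NULL", "STRING"))              # None
print(mask(email, "SHA256", "STRING")[:16], "...")       # start of the hex digest
print(mask(5550100, "DEFAULT_MASKING_VALUE", "INT64"))   # 0
```

Because the hash in this sketch is deterministic, two rows with the same email address get the same masked value, which is what lets analysts compare or count values without reading them.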

Grant permission to the principals

In the IAM & Admin menu, we can grant roles to the principals. Two roles matter for querying tagged columns:

  1. Masked Reader: The principals can query the table but see masked data in the tagged columns. Analysts are a typical group to assign the Masked Reader role. In this case, we recommend hashing as the masking rule: analysts can still see when a PII value changes, or compare values across rows, without accessing the content of the field.
  2. Fine-Grained Reader: The principals can query the table and read the original data. Referring back to the previous use case, this role should not be given to analysts, but to a group that needs the tagged information itself, e.g., to call the phone number, send an email, or create an invoice for the given address.
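The split between the two roles can be expressed as IAM policy bindings. In the sketch below, the role IDs are the ones we understand these roles to have (`roles/bigquerydatapolicy.maskedReader` and `roles/datacatalog.categoryFineGrainedReader`; verify them in your console), the group addresses are placeholders, and `build_bindings` is a hypothetical helper. It assumes Masked Reader is granted on the data policy and Fine-Grained Reader on the policy tag, and it rejects principals listed under both, since the text reserves raw access for a separate group.

```python
from __future__ import annotations

# Role IDs as we understand them; confirm before use.
MASKED_READER = "roles/bigquerydatapolicy.maskedReader"
FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def build_bindings(masked_readers: set[str], fine_grained_readers: set[str]) -> dict:
    """Return IAM-style policy bodies for the data policy and the policy tag."""
    both = masked_readers & fine_grained_readers
    if both:
        raise ValueError(f"principals granted both roles: {sorted(both)}")
    return {
        "data_policy": {"bindings": [{"role": MASKED_READER, "members": sorted(masked_readers)}]},
        "policy_tag": {"bindings": [{"role": FINE_GRAINED_READER, "members": sorted(fine_grained_readers)}]},
    }


policies = build_bindings(
    masked_readers={"group:analysts@example.com"},
    fine_grained_readers={"group:billing@example.com"},
)
print(policies["policy_tag"])
```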

Apply the data policy to the columns in Dataplex

Dataplex, a Google Cloud Platform service, helps users unify distributed data and automate data management and governance to power analytics at scale. In Dataplex, we can search for BigQuery tables and then, under ‘Schema and column tags’, add any preset policy tag to a column by clicking the + button.
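As far as we understand it, attaching a tag in Dataplex ends up as a `policyTags` entry on the column in the table’s JSON schema, with the tag referenced by its full resource name. The helper below is a hypothetical sketch that performs the same change on a schema held as a Python list. It handles top-level columns only, assumes a plain project ID in the `project.dataset.table` form, and refuses a tag from another project, because the table and the taxonomy must be in the same project. The project, dataset, and ID values are placeholders.

```python
from __future__ import annotations

import copy


def policy_tag_name(project: str, location: str, taxonomy_id: str, tag_id: str) -> str:
    """Build a policy tag resource name in the projects/.../policyTags/... form."""
    return f"projects/{project}/locations/{location}/taxonomies/{taxonomy_id}/policyTags/{tag_id}"


def attach_policy_tag(table_id: str, schema: list[dict], column: str, tag: str) -> list[dict]:
    """Return a copy of `schema` with `tag` attached to the top-level `column`."""
    table_project = table_id.split(".")[0]      # assumes "project.dataset.table"
    tag_project = tag.split("/")[1]             # "projects/<project>/..."
    if table_project != tag_project:
        raise ValueError("table and taxonomy must be in the same project")
    updated = copy.deepcopy(schema)             # leave the caller's schema untouched
    for field in updated:
        if field["name"] == column:
            field["policyTags"] = {"names": [tag]}  # one policy tag per column
            return updated
    raise KeyError(f"column {column!r} not found")


schema = [
    {"name": "customer_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "email", "type": "STRING", "mode": "NULLABLE"},
]
tag = policy_tag_name("my-project", "us", "1234", "5678")
print(attach_policy_tag("my-project.sales.customers", schema, "email", tag)[1])
```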

As a result, principals with the Masked Reader role will see masked values in the tagged columns, for example nulls if the masking rule replaces values with nulls.

The data table and the taxonomy must be within the same project. You can find further information about dynamic, column-level data masking in BigQuery in the related GCP documentation: