8 min read

Analytic functions in Google BigQuery – Part 1: Basics

Published on
January 16, 2019
Author
Aliz Team
Aliz Team
Company
Subscribe to our newsletter
Subscribe
Analytic functions in Google BigQuery – Part 1: Basics

Google BigQuery, Google’s data warehouse solution, has many functions and capabilities. In Part 1 of a series of posts, we look at the definition of analytic functions, how they differ from aggregate functions, and the various uses they have in everyday scenarios.

Examples

First of all, let’s define our example table, Table 1, on which all queries in this post and their results are based.

Table 1: Orders

Example of Google BigQuery analytic function-Table of orders

Furthermore, all queries in this post are written in the BigQuery Standard SQL dialect.

Aggregate vs Analytic function

Those familiar with traditional SQL databases already know how aggregate functions work, but for the sake of completeness, I’ll include it here.

An aggregate function is a function that computes a single aggregate value over a group of rows. The syntax of this function is simple enough:

 

 

Each column that is not aggregated has to be present in the GROUP BY clause:

 

 

In contrast, an analytic function is a function that computes an aggregate value for each row over a group of rows. The syntax of this function is as follows:

 

 

Because grouping is done through the analytic function’s own condition block, the GROUP BY clause is not necessary.

Clauses

PARTITION BY

PARTITION BY is arguably the most important clause of an analytic function. Similar to the aggregate function’s GROUP BY clause, it limits the number of rows the function computes merely on a subset of all rows. While all non-aggregate columns must be included in a GROUP BY clause, there is no such restriction here.

Example 1

 

 

Example of Google BigQuery analytic function-PARTITION BY

This query computes the difference between the actual price and the store’s average price. Since we’re using an analytic function, we can include the OrderId and Date columns in our results without grouping by them. If we wanted to get the same result using aggregate functions, the query would look like this:

 

 

ORDER BY

Next, we have the ORDER BY clause. Similar to the ORDER BY at the (near) end of a query, its purpose is to order each partition prior to evaluating the analytic function. While functions such as SUM or AVG do not benefit from an ordered partition, there are certain functions where it is a must. One such function is LAG. This function has one argument (a column name), and its return value is the value of the column from the previous row.

Example 2

 

 

Example of Google BigQuery analytic function-ORDER BY

This query returns the IDs of all orders that were processed first on any given date for each store. (We assume the Order with a lesser ID came in before an order with a greater ID.) The same problem can be solved with aggregate functions, but this method (again) joins a subset of the table with the table itself, which is much less performant than the analytic function we used.

 

 

Conclusion

BigQuery’s analytic functions are powerful tools that can reduce difficult and expensive-to-compute queries to more simple solutions. Though we looked at the definition and semantics of analytic functions, we have yet to cover windowing and advanced use cases. Stay tuned for Part 2 of this blog series.

Interested in more about Analytic?

Author
Aliz Team
Company
Subscribe to our newsletter
Subscribe