8 min read

Compare Analytical Queries Performance of AlloyDB with BigQuery

Published on

September 17, 2023

Author

Aliz Team

Company

Subscribe to our newsletter

Subscribe

AlloyDB is a fully managed PostgreSQL-compatible database service, supporting atomicity, consistency, isolation, and durability (ACID)-compliant transactions. Some of its claims are:

4x faster than standard PostgreSQL for transactional workloads.
Up to 100x faster analytical queries than standard PostgreSQL.

This experiment will focus on an analytical query performance comparison between AlloyDB and BigQuery. There is an official Google Cloud blog on benchmarking for AlloyDB, but it compares to standard PostgreSQL. The benchmark setup will follow this guideline with additional parts to run the queries on BigQuery as well.

Components

Let’s take a look at AlloyDB components.

We create a cluster that organizes all resources, such as databases, logs, and other metadata.

A cluster contains several nodes, which are virtual machine instances that are dedicated to running the query. AlloyDB organizes nodes into instances. There are two kinds of instances:

Primary instance. Every cluster has one primary instance that provides read/write access. The primary instance has two nodes to provide high availability: an active node and a standby node.
The cluster can optionally have one or more read pool instances that serve read access. AlloyDB automatically load-balances all requests sent to a read pool instance.

Compare Analytical Queries Performance of AlloyDB with BigQuery

Transactional & Analytical Aspects

Two Google Cloud blog articles explain transactional and analytical workload aspects of AlloyDB for PostgreSQL under the hood:

1. Intelligent, database-aware storage

Some of the key highlights from this article are:

Due to the disaggregation of compute and storage, AlloyDB reduces I/O bottlenecks and offloads many database operations to the storage layer through the use of the log processing service.

Log processing service (LPS) is an AlloyDB internal service that processes WAL records and produces database blocks. The compute layer only communicates the WAL records to the storage layer. No block writes are needed.

(Slide from the AlloyDB Partner Office Hours earlier this year)

AlloyDB uses low-latency regional log storage, allowing the system to flush WAL log records quickly. This means transaction commit is a very fast operation.

There are multiple performance optimizations on other parts, such as replication and recovery, which makes the article worth checking. The two points above are highlights closely related to the transactional aspect.

2. Columnar engine

A key highlight from the article is that AlloyDB uses Machine Learning to intelligently select the best tables/columns to keep in columnar format and later automatically maintain this data in memory. Note that on doing analytical benchmarking, we need to execute the queries first as a “warm-up” to let AlloyDB check the workloads. Only after that, we measure the performance.

(Slide from the AlloyDB Partner Office Hours earlier this year)

The columnar engine not only allows aggregation operations directly on the relevant columns without materializing the results of the scan, but it includes several algorithmic optimizations by utilizing column-specific metadata such as min-max values to speed up the scan.

Comparison with BigQuery

When to use

This slide below is a recommendation from the Partner Office Hours on this topic, which we will try to check on the benchmark.

Benchmark

TPCH-H Dataset

We use the TPC-H dataset to do the benchmarking. A TPC-H Benchmark is a transaction processing and database benchmark specific to decision support – i.e., analytics, run and managed by the Transaction Processing Performance Council.

On this TPC-H setup, we will have eight tables whose size can be set and queries that consist of joins and aggregations to represent analytical queries.

You can check TPC-H specification in this official document, specifically on section 2.4 to see what queries being used.

TPCH_SCALE will determine the size of the data across the tables. For example:

- TPCH_SCALE=10 will result in 86,586,082 total row count, ~20GB.

- TPCH_SCALE=30 will result in 259,798,402 total row count, ~60GB.

Benchmark Setup

This experiment will use AlloyDB with this specification:

- database version: PostgreSQL 14

- region: us-central1

- machine type: 16vCPU / 128GB

Both TPCH scale 10 and 30 will fit in the columnar engine memory. The database flags that will be set are:

work_mem = 65536. Increasing the work_mem value can improve the performance of queries that perform a lot of temporary work like sorting, hashing, bitmap, etc. If the AlloyDB instance does not have adequate memory, a very high-value work_mem may cause performance issues. Since we use 128GB memory and the data still fits in, we can max this out to around half the total memory.
google_columnar_engine_enabled = ON
google_columnar_engine.relations = [all TPCH table]. This configuration flag specifies a set of tables and their columns that need to be stored in the columnar format.

The other parts, such as inserting the data and the client machine being used will be covered in the code repository.

Query Runs

Below are 22 analytical queries that were executed on AlloyDB and BigQuery. The query runtimes and the geometric mean are in seconds.

Conclusion

Based on this specific experiment, AlloyDB can perform analytical queries comparable to BigQuery within a certain threshold, in this case, ~60GB of data. In this experiment, all queries were completed in under 1 minute, which is arguably tolerable considering the primary AlloyDB usage should be on transactional workloads. The advertised hybrid transactional and analytical processing (HTAP) of AlloyDB seems promising.

But more than 60GB would be questionable since we need to add more resources to the AlloyDB cluster. The performance would be inferior to BigQuery, where all these queries were completed in under 10 seconds without using table partitioning or clustering.

Value Proposition to Clients

AlloyDB is a good option if the client needs to migrate their database to improve transactional workloads while gaining additional capabilities to handle analytical workloads.
If the data is too large, it would be much better to do ETL/ELT to BigQuery and do the analytical stuff there.

Author

Aliz Team

Company

Subscribe to our newsletter

Subscribe

New opportunities with cloud solutions!

‍Aliz is a proud Google Cloud Partner with specializations in Infrastructure, Data Analytics, Cloud Migration and Machine Learning. We deliver data analytics, machine learning, and infrastructure solutions, off the shelf, or custom-built on GCP using an agile, holistic approach.

Let's talk!