allow_approximate_optimization

Usage

view: view_name {
  measure: field_name {
    allow_approximate_optimization: yes 
  }
}
Hierarchy
allow_approximate_optimization
Possible Field Types
Measure

Default Value
no

Accepts
A Boolean (yes or no)

Definition

For dialects that support HyperLogLog sketches, Looker can leverage the HyperLogLog algorithm to approximate distinct counts for aggregate tables.

The allow_approximate_optimization: yes statement enables Looker to store HyperLogLog sketches in aggregate tables, which means that Looker can use approximations for distinct counts for aggregate awareness.

See the Dialect support for distinct counts with aggregate awareness section on this page for the list of dialects that support distinct counts for aggregate tables using HyperLogLog sketches.

In general, distinct counts can't be supported with aggregate awareness because you can't get accurate data if you try to aggregate distinct counts. For example, if you are counting the distinct users on a website, there may be a user who came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect.

One workaround for this is to create an aggregate table that exactly matches an Explore query, as described on the Aggregate awareness documentation page. When the Explore query and an aggregate table query are the same, distinct count measures do provide accurate data, so they can be used for aggregate awareness.

The other option is to use approximations for distinct counts. The HyperLogLog algorithm is known to have about a 2% potential error. The allow_approximate_optimization parameter requires your Looker developers to acknowledge that it's okay to use approximate data for the measure so that the measure may be calculated approximately from aggregate tables.

With aggregate awareness, there are two cases where distinct counts come into play:

  • The first case is with measures of type: count_distinct.
  • The second case is with measures of type: count that are actually being rendered by Looker as count_distinct measure types. As discussed on the Aggregate awareness documentation page, Looker renders count measures as count_distinct to avoid fanout miscalculations in Explores that join multiple database tables.

In both of these cases, if your dialect supports HyperLogLog sketches, you can add the allow_approximate_optimization: yes statement to measures to enable approximate values. You can then include these measures in aggregate tables.

Even for measures defined with allow_approximate_optimization: yes, Looker will return exact data when possible. For example, if the dimensions in an Explore query are a perfect match of the dimensions in an aggregate table, Looker can provide exact data for distinct counts, without having to approximate. In this case, you will see in the Explore's SQL tab that distinct count measures are being used for aggregate awareness without employing the HyperLogLog algorithm.

Example

The apx_unique_count measure shown in this example is set for allow_approximate_optimization: yes, which means that the measure can be used in an aggregate_table.

measure: apx_unique_count {
  type: count_distinct
    allow_approximate_optimization: yes   # default value is no
  sql: ${id} ;;
}

Dialect support for distinct counts with aggregate awareness

Looker can use distinct counts for aggregate awareness with database dialects that support HyperLogLog sketches. In the latest release of Looker, the following SQL dialects are supported for distinct counts with aggregate awareness:

Dialect Supported?
Actian Avalanche
No
Amazon Athena
Yes
Amazon Aurora MySQL
No
Amazon Redshift
Yes
Apache Druid
No
Apache Druid 0.13+
No
Apache Druid 0.18+
No
Apache Hive 2.3+
No
Apache Hive 3.1.2+
No
Apache Spark 3+
No
ClickHouse
No
Cloudera Impala 3.1+
No
Cloudera Impala 3.1+ with Native Driver
No
Cloudera Impala with Native Driver
No
DataVirtuality
No
Databricks
No
Denodo 7
No
Denodo 8
No
Dremio
No
Dremio 11+
No
Exasol
No
Firebolt
No
Google BigQuery Legacy SQL
No
Google BigQuery Standard SQL
Yes
Google Cloud PostgreSQL
No
Google Cloud SQL
No
Google Spanner
No
Greenplum
No
HyperSQL
No
IBM Netezza
No
MariaDB
No
Microsoft Azure PostgreSQL
No
Microsoft Azure SQL Database
No
Microsoft Azure Synapse Analytics
No
Microsoft SQL Server 2008+
No
Microsoft SQL Server 2012+
No
Microsoft SQL Server 2016
No
Microsoft SQL Server 2017+
No
MongoBI
No
MySQL
No
MySQL 8.0.12+
No
Oracle
No
Oracle ADWC
No
PostgreSQL 9.5+
No
PostgreSQL pre-9.5
No
PrestoDB
Yes
PrestoSQL
Yes
SAP HANA
No
SAP HANA 2+
No
SingleStore
No
SingleStore 7+
No
Snowflake
Yes
Teradata
No
Trino
Yes
Vector
No
Vertica
No

Check your SQL dialect's documentation to understand the speed and accuracy tradeoffs of this method.