Note that you are viewing Looker documentation. For Looker Studio documentation, visit https://support.google.com/looker-studio.

allow_approximate_optimization

Usage

view: view_name {
  measure: field_name {
    allow_approximate_optimization: yes 
  }
}

Hierarchy

View File

view

Field Type

allow_approximate_optimization

Possible Field Types

Measure

Default Value

no

Accepts

A Boolean (yes or no)

Definition

For dialects that support HyperLogLog sketches, Looker can leverage the HyperLogLog algorithm to approximate distinct counts for aggregate tables.

The allow_approximate_optimization: yes statement enables Looker to store HyperLogLog sketches in aggregate tables, which means that Looker can use approximations for distinct counts for aggregate awareness.

See the Dialect support for distinct counts with aggregate awareness section on this page for the list of dialects that support distinct counts for aggregate tables using HyperLogLog sketches.

In general, distinct counts can't be supported with aggregate awareness because you can't get accurate data if you try to aggregate distinct counts. For example, if you are counting the distinct users on a website, there may be a user who came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect.

One workaround for this is to create an aggregate table that exactly matches an Explore query, as described on the Aggregate awareness documentation page. When the Explore query and an aggregate table query are the same, distinct count measures do provide accurate data, so they can be used for aggregate awareness.

The other option is to use approximations for distinct counts. The HyperLogLog algorithm is known to have about a 2% potential error. The allow_approximate_optimization parameter requires your Looker developers to acknowledge that it's okay to use approximate data for the measure so that the measure may be calculated approximately from aggregate tables.

With aggregate awareness, there are two cases where distinct counts come into play:

The first case is with measures of type: count_distinct.
The second case is with measures of type: count that are actually being rendered by Looker as count_distinct measure types. As discussed on the Aggregate awareness documentation page, Looker renders count measures as count_distinct to avoid fanout miscalculations in Explores that join multiple database tables.

In both of these cases, if your dialect supports HyperLogLog sketches, you can add the allow_approximate_optimization: yes statement to measures to enable approximate values. You can then include these measures in aggregate tables.

Even for measures defined with allow_approximate_optimization: yes, Looker will return exact data when possible. For example, if the dimensions in an Explore query are a perfect match of the dimensions in an aggregate table, Looker can provide exact data for distinct counts, without having to approximate. In this case, you will see in the Explore's SQL tab that distinct count measures are being used for aggregate awareness without employing the HyperLogLog algorithm.

Example

The apx_unique_count measure shown in this example is set for allow_approximate_optimization: yes, which means that the measure can be used in an aggregate_table.

measure: apx_unique_count {
  type: count_distinct
    allow_approximate_optimization: yes   # default value is no
  sql: ${id} ;;
}

Dialect support for distinct counts with aggregate awareness

Looker can use distinct counts for aggregate awareness with database dialects that support HyperLogLog sketches. In the latest release of Looker, the following SQL dialects are supported for distinct counts with aggregate awareness:

Dialect	Supported?
Actian Avalanche	No
Amazon Athena	Yes
Amazon Aurora MySQL	No
Amazon Redshift	Yes
Apache Druid	No
Apache Druid 0.13+	No
Apache Druid 0.18+	No
Apache Hive 2.3+	No
Apache Hive 3.1.2+	No
Apache Spark 3+	No
ClickHouse	No
Cloudera Impala 3.1+	No
Cloudera Impala 3.1+ with Native Driver	No
Cloudera Impala with Native Driver	No
DataVirtuality	No
Databricks	No
Denodo 7	No
Denodo 8	No
Dremio	No
Dremio 11+	No
Exasol	No
Firebolt	No
Google BigQuery Legacy SQL	No
Google BigQuery Standard SQL	Yes
Google Cloud PostgreSQL	No
Google Cloud SQL	No
Google Spanner	No
Greenplum	No
HyperSQL	No
IBM Netezza	No
MariaDB	No
Microsoft Azure PostgreSQL	No
Microsoft Azure SQL Database	No
Microsoft Azure Synapse Analytics	No
Microsoft SQL Server 2008+	No
Microsoft SQL Server 2012+	No
Microsoft SQL Server 2016	No
Microsoft SQL Server 2017+	No
MongoBI	No
MySQL	No
MySQL 8.0.12+	No
Oracle	No
Oracle ADWC	No
PostgreSQL 9.5+	No
PostgreSQL pre-9.5	No
PrestoDB	Yes
PrestoSQL	Yes
SAP HANA	No
SAP HANA 2+	No
SingleStore	No
SingleStore 7+	No
Snowflake	Yes
Teradata	No
Trino	Yes
Vector	No
Vertica	No

Check your SQL dialect's documentation to understand the speed and accuracy tradeoffs of this method.