Aggregate Awareness

Overview

Looker uses aggregate awareness logic to find the smallest, most efficient table available in your database to run a query while still maintaining correctness.

For very large tables in your database, Looker developers can create smaller aggregate tables of data, grouped by various combinations of attributes. The aggregate tables act as roll-ups or summary tables that Looker can use for queries whenever possible, instead of the original large table. When implemented strategically, aggregate awareness can speed up the average query by orders of magnitude.

For example, you might have a petabyte-scale data table with one row for every order that has occurred on your website. From this database, you can create an aggregate table with your daily sales totals. If your website receives 1,000 orders every day, your daily aggregate table would represent each day with 999 fewer rows than your original table. You can create another aggregate table with monthly sales totals that will be even more efficient. So now, if a user runs a query for daily or weekly sales, Looker will use the daily sales total table. If a user runs a query about yearly sales and you don’t have a yearly aggregate table, Looker will use the next best thing, which is the monthly sales aggregate table in this example.

Using aggregate awareness logic, Looker answers your users’ questions with aggregate tables whenever possible, querying the smallest aggregate table that can answer the query. The original table is used only for queries requiring finer granularity than the aggregate tables can provide.
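The table-selection idea can be sketched in a few lines. This is a hypothetical simplification in Python, not Looker’s actual algorithm; the table names, granularities, and row counts are all invented:

```python
# Pick the smallest table whose granularity is at least as fine as the
# query's. Finer granularity = lower number.
GRAIN = {"hour": 0, "day": 1, "week": 2, "month": 3, "year": 4}

# (table name, granularity, approximate row count); coarser tables are smaller.
TABLES = [
    ("orders_raw", "hour", 1_000_000_000),
    ("sales_daily", "day", 365_000),
    ("sales_monthly", "month", 12_000),
]

def choose_table(query_grain: str) -> str:
    """Return the smallest viable table for a query at query_grain."""
    viable = [
        (rows, name)
        for name, grain, rows in TABLES
        if GRAIN[grain] <= GRAIN[query_grain]  # table is fine enough
    ]
    return min(viable)[1]  # fewest rows wins
```

Under this model, a yearly query is served by the monthly table, a weekly query by the daily table, and an hourly query falls back to the original table.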

Aggregate tables do not need to be joined in or added to a separate Explore. Instead, Looker dynamically adjusts the FROM clause of the Explore query to access the best aggregate table for the query. This means that your drills are maintained and Explores can be consolidated. With aggregate awareness, one Explore can automatically leverage aggregate tables but still dive deep into granular data if needed.

You can also leverage aggregate tables to drastically improve the performance of dashboards, especially for tiles that query huge data sets. For details, see the Getting Aggregate Table LookML from a Dashboard section on the aggregate_table documentation page.

Adding Aggregate Tables to Your Project

Looker developers can create strategic aggregate tables that will minimize the number of queries required on the large tables in a database.

An aggregate table is defined using the aggregate_table parameter under an explore parameter in your LookML project. The aggregate table query results are then saved to your database, much like a persistent derived table (PDT).

Here is an example of an aggregate table in LookML:

    explore: orders {
      label: "Sales Totals"
      aggregate_table: sales_monthly {
        materialization: {
          datagroup_trigger: orders_datagroup
        }
        query: {
          dimensions: [created_month]
          measures: [order_items.total_sales]
          timezone: America/Los_Angeles
        }
      }
      # other explore parameters
    }

To create an aggregate table, you can write the LookML from scratch, or you can get aggregate table LookML from an Explore or from a dashboard. See the aggregate_table parameter page for the specifics of the aggregate_table parameter and its subparameters.

Designing Aggregate Tables

For an Explore query to use an aggregate table, the aggregate table must be able to provide accurate data for that query. Whether it can depends on the field, timeframe, measure type, time zone, and filter factors described in the following sections.

One way to ensure that an aggregate table can provide accurate data for an Explore query is to simply create an aggregate table that exactly matches an Explore query. See the section Creating Aggregate Tables That Exactly Match Explore Queries for details.

Field Factors

To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead.

For example, if a query groups by dimensions A and B, aggregates by measure C, and filters on dimension D, then the aggregate table must minimally have A, B, and D as dimensions and C as a measure.
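This field requirement amounts to a subset check. The following Python sketch is an assumed simplification (using the placeholder field names from the example above), not Looker’s real viability logic:

```python
# An aggregate table is viable for a query only if it contains every
# dimension, measure, and filter field that the query uses.
def table_is_viable(query_fields: set, table_fields: set) -> bool:
    return query_fields <= table_fields  # subset test

query = {"A", "B", "C", "D"}           # grouped on A, B; measure C; filter on D
agg_table = {"A", "B", "C", "D", "E"}  # extra fields are fine

assert table_is_viable(query, agg_table)
assert not table_is_viable(query, {"A", "B", "C"})  # missing filter field D
```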

The aggregate table can have other fields as well, but it must have at least the Explore query fields in order to be viable for optimization. The one exception is timeframe dimensions, since coarser granularity timeframes can be derived from finer granularity ones.

Because of these field considerations, an aggregate table is specific to the Explore under which it’s defined. An aggregate table defined under one Explore won’t be used for queries on a different Explore.

Timeframe Factors

Looker’s aggregate awareness logic is able to derive one timeframe from another. An aggregate table can be used for a query as long as the aggregate table’s timeframe has a granularity finer than (or equal to) that of the Explore query. For example, an aggregate table based on daily data can be used for an Explore query that calls for other timeframes, such as queries for daily, monthly, and yearly data, or even day-of-month, day-of-year, and week-of-year data. But a yearly aggregate table can’t be used for an Explore query that calls for hourly data, since the aggregate table’s data doesn’t have fine enough granularity for the Explore query.
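As a worked illustration of this derivation, daily totals can be re-aggregated into monthly totals, but not the other way around (the data here is invented):

```python
from collections import defaultdict

daily_sales = {
    "2024-01-01": 120, "2024-01-15": 80,
    "2024-02-03": 200, "2024-02-20": 50,
}

# Roll daily rows up to months by truncating each date to its month.
monthly = defaultdict(int)
for day, total in daily_sales.items():
    monthly[day[:7]] += total

assert dict(monthly) == {"2024-01": 200, "2024-02": 250}
# The reverse is impossible: monthly totals cannot be split back into days,
# which is why a monthly table cannot serve a daily or hourly query.
```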

The same applies to time range subsets. For example, if you have an aggregate table that is filtered for the last three months and a user queries the data with a filter for the last two months, Looker will be able to use the aggregate table for that query.

In addition, the same logic applies for queries with timeframe filters: an aggregate table can be used for a query with a timeframe filter as long as the aggregate table’s timeframe has a granularity finer than (or equal to) that of the timeframe filter used in the Explore query. For example, an aggregate table that has a daily timeframe dimension can be used for an Explore query that filters on day, week, or month.

Measure Type Factors

For an Explore query to use an aggregate table, the measures in the aggregate table must be able to provide accurate data for the Explore query.

For this reason, only certain types of measures are supported, as described in the following sections. If an Explore query uses any other type of measure, Looker will use the original table, not the aggregate table, to return results. The only exception is if the Explore query is an exact match of an aggregate table query, as described in the section Creating Aggregate Tables That Exactly Match Explore Queries.

Measures with Supported Measure Types

Aggregate awareness can be used for Explore queries that use measures of type average, count, max, min, and sum.

If an Explore joins multiple database tables, Looker can render measures of type SUM, COUNT, and AVERAGE as SUM DISTINCT, COUNT DISTINCT, and AVERAGE DISTINCT, respectively. Looker does this in order to avoid fanout miscalculations, as described in the Symmetric Aggregates for Explores with Joins section below.

To use an aggregate table for an Explore query, Looker must be able to operate on the aggregate table’s measures to provide accurate data in the Explore query. For example, a measure with type: sum can be used for aggregate awareness because you can sum several sums: An aggregate table of weekly sums can be added together to get an accurate monthly sum. Similarly, a measure with type: max can be used because an aggregate table of daily maximums can be used to find the accurate weekly maximum.

In the case of measures with type: average, aggregate awareness is supported because Looker uses sum and count data to accurately derive average values from aggregate tables.
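A small worked example of why averaging averages fails, and why storing sums and counts succeeds (the numbers are invented):

```python
# Two weeks of data as a sum-and-count aggregate table would store them.
weeks = [
    {"sum": 100, "count": 4},   # weekly average 25.0
    {"sum": 300, "count": 2},   # weekly average 150.0
]

# Naive approach: average the weekly averages.
naive = sum(w["sum"] / w["count"] for w in weeks) / len(weeks)

# Correct approach: derive the average from total sum and total count.
correct = sum(w["sum"] for w in weeks) / sum(w["count"] for w in weeks)

assert naive == 87.5               # average of averages: wrong
assert round(correct, 2) == 66.67  # 400 / 6: matches the base table
```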

Measures Defined with SQL Expressions

Aggregate awareness can also be used with certain measures that are defined with expressions in the sql parameter, as described in the following examples.

Aggregate awareness is supported for measures that are defined as combinations of other measures, such as this example:

    measure: total_gross_margin_in_dollars {
      type: number
      sql: ${total_revenue_in_dollars} - ${inventory_item.total_cost_in_dollars} ;;
    }

Aggregate awareness is also supported for measures where calculations are defined in the sql parameter, such as this measure:

    measure: wholesale_value {
      type: number
      sql: (${order_items.total_sale_price} * 0.60) ;;
    }

And aggregate awareness is supported for measures where MIN, MAX, COUNT, or SUM operations are defined in the sql parameter, such as this measure:

    measure: most_recent_order_date {
      type: date
      sql: MAX(${users.created_at_raw}) ;;
    }

Aggregate awareness is supported only for MIN(), MAX(), COUNT(), and SUM() operations. If you want to use an average measure in your aggregate table, you can create a measure of type: average, which is supported for aggregate awareness.
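A quick illustration of why these four operations roll up correctly while others, such as a median, do not (the data is invented):

```python
import statistics

day1, day2 = [1, 1, 9], [5, 6, 7]
all_rows = day1 + day2

# MAX of the daily maxes equals the true max over all rows.
assert max(max(day1), max(day2)) == max(all_rows) == 9

# But the median of daily medians is NOT the true median.
median_of_medians = statistics.median(
    [statistics.median(day1), statistics.median(day2)])  # 3.5
assert median_of_medians != statistics.median(all_rows)  # true median is 5.5
```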

When sql expressions are used in measures, aggregate awareness supports references to other LookML fields, in either the ${field_name} format (for fields in the same view) or the ${view_name.field_name} format (for fields in other views).

Aggregate awareness is not supported for measures that are defined with references using the ${TABLE}.column_name format, which indicate a column in a table.

For example, aggregate awareness is supported for a measure defined like this, which references a field in another view:

    measure: wholesale_value {
      type: number
      sql: (${order_items.total_sale_price} * 0.60) ;;
    }

However, aggregate awareness is not supported for a measure defined like this, which references a table column:

    measure: wholesale_value {
      type: number
      sql: (${TABLE}.total_sale_price * 0.60) ;;
    }

See the Incorporating SQL and Referring to LookML Objects documentation page for an overview of using references in LookML.

Measures That Approximate Distinct Counts

In general, distinct counts aren’t supported with aggregate awareness because you can’t get accurate data if you try to aggregate distinct counts. For example, if you are counting the distinct users on a website, there may be a user who came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect.
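The double-counting problem can be shown in a few lines (the user names are invented):

```python
# Distinct users seen in two different weeks of the same month.
week1_users = {"ana", "bo", "cy"}
week4_users = {"ana", "dee"}        # ana returns three weeks later

# Rolling up weekly distinct counts double-counts ana.
summed_weekly_counts = len(week1_users) + len(week4_users)  # 5

# The true monthly distinct count requires the underlying sets.
true_monthly_count = len(week1_users | week4_users)         # 4

assert summed_weekly_counts == 5
assert true_monthly_count == 4
```

An aggregate table stores only the counts, not the underlying sets, which is why exact distinct counts cannot be rolled up from it.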

One workaround for this is to create an aggregate table that exactly matches an Explore query, as described in the Creating Aggregate Tables That Exactly Match Explore Queries section below. When the Explore query and an aggregate table query are the same, distinct count measures do provide accurate data, so they can be used for aggregate awareness.

Another option is to use approximations for distinct counts. For dialects that support HyperLogLog sketches, Looker can leverage the HyperLogLog algorithm to approximate distinct counts for aggregate tables.

The HyperLogLog algorithm is known to have about a 2% error. The allow_approximate_optimization: yes parameter requires your Looker developers to acknowledge that it’s okay to use approximate data for the measure so that the measure may be calculated approximately from aggregate tables.

See the allow_approximate_optimization documentation page for more information and for the list of dialects that support count distinct using HyperLogLog.

Time Zone Factors

In many cases, database admins use UTC as the time zone for databases. However, many users might not be in the UTC time zone. Looker has multiple options for converting time zones so that your users will get query results in their own time zone.

See the Using Time Zone Settings documentation page for more information on these options.

These concepts are important for understanding aggregate awareness because, in order for an aggregate table to be used for a query with date dimensions or date filters, the time zone on the aggregate table must match the time zone setting used for the original query.

Aggregate tables use the database time zone if no timezone value is specified. If your database connection is also configured to use the database time zone for queries, you can omit the timezone parameter for your aggregate tables.

Otherwise, define the aggregate table’s time zone to match the time zone of likely queries, so that the aggregate table is more likely to be used.

Looker cannot use an aggregate table for an Explore query if the time zone on the aggregate table does not match the time zone in the Explore query. If your database connection has user-specific time zones, you need a separate aggregate table for each of your users’ time zones.

Filter Factors

Be careful when including filters in your aggregate table. Filters on an aggregate table can narrow the results to the point where the aggregate table cannot be used. For example, say that you create an aggregate table for daily order counts, and the aggregate table filters for just sunglass orders coming from Australia. If a user runs an Explore query for daily order counts of sunglasses worldwide, Looker cannot use the aggregate table for that Explore query, since the aggregate table only has the data for Australia. The aggregate table filters the data too narrowly to be used by the Explore query.
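This narrowing rule can be sketched as a compatibility check. The function below is an assumed simplification, not Looker’s real filter logic:

```python
# An aggregate table's baked-in filters must not be narrower than the
# query's: every filter on the table must also be applied by the query,
# or rows the query needs are already missing from the table.
def filters_compatible(table_filters: dict, query_filters: dict) -> bool:
    return all(query_filters.get(f) == v for f, v in table_filters.items())

agg = {"product": "sunglasses", "country": "Australia"}

# Query for Australian sunglass orders: every table filter matches.
assert filters_compatible(agg, {"product": "sunglasses", "country": "Australia"})

# Worldwide sunglasses query: the Australia-only table cannot serve it.
assert not filters_compatible(agg, {"product": "sunglasses"})
```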

Also, be conscious of the filters that your Looker developers might have built into your Explore, such as access_filter, always_filter, and conditionally_filter. These filter types are based on specific fields. If your Explore has these filters, you must include their fields in the dimensions parameter of the aggregate_table.

For example, here is an Explore with an access filter that is based on the orders.region field:

    explore: orders {
      access_filter: {
        field: orders.region
        user_attribute: region
      }
    }

To create an aggregate table that would be used for this Explore, the aggregate table must include the field on which the access filter is based. In the example below, the access filter is based on the field orders.region, and this same field is included as a dimension in the aggregate table:

    explore: orders {
      access_filter: {
        field: orders.region                              # <-- orders.region field
        user_attribute: region
      }
      aggregate_table: sales_monthly {
        materialization: {
          datagroup_trigger: orders_datagroup
        }
        query: {
          dimensions: [orders.created_day, orders.region] # <-- orders.region field
          measures: [orders.total_sales]
          timezone: America/Los_Angeles
        }
      }
    }

Because the aggregate table query includes the orders.region dimension, Looker can dynamically filter the data in the aggregate table to match the filter from the Explore query. Therefore, Looker can still use the aggregate table for the Explore’s queries, even though the Explore has an access filter.

Creating Aggregate Tables That Exactly Match Explore Queries

One way to be sure that an aggregate table can be used for an Explore query is to simply create an aggregate table that exactly matches the Explore query. If the Explore query and the aggregate table both use the same measures, dimensions, filters, timezones, and other parameters, then by definition the aggregate table’s results will apply to the Explore query. If an aggregate table is an exact match of an Explore query, Looker is able to use aggregate tables that include any type of measure.

You can create an aggregate table from an Explore using the Get LookML option from the gear menu of an Explore. You can also create exact matches for all the tiles in a dashboard using the Get LookML option from the gear menu of a dashboard.

Determining Which Aggregate Table Is Used for a Query

If you are in Development Mode, you can run queries on the Explore to test if Looker will use your aggregate tables to get the results.

For example, based on the example monthly aggregate table above, we can go to the Explore and run a query for yearly sales totals. Then we can click the SQL tab to see the details of the query that Looker created. If you are in Development Mode, Looker shows comments to indicate the aggregate table it used for the query.

From the comments on the SQL tab, we can see that Looker is using the sales_monthly aggregate table for this query:

    -- Use existing orders::sales_monthly in sandbox_scratch.LR$LBJQY1588205086507_orders$sales_monthly
    -- Did not use orders::sales_weekly; orders::sales_monthly was a better fit for optimization.
    -- Did not use orders::sales_daily; orders::sales_monthly was a better fit for optimization.

Looker also provides information about why other aggregate tables weren’t used for the query. In this case, the sales_monthly aggregate table was more optimal than the sales_weekly and sales_daily aggregate tables.

See the Troubleshooting section below for possible comments you may see in the SQL tab and suggestions for how to resolve them.

Looker Unions New Data to Your Aggregate Tables

For aggregate tables with time filters, Looker can union fresh data into your aggregate table. You might have an aggregate table that includes data for the past three days, but that aggregate table might have been built yesterday. The aggregate table would be missing today’s information, so you wouldn’t expect to use it for an Explore query on the most recent daily information.

However, Looker still can use the data in that aggregate table for the query because Looker will run a query on the most recent data, then union those results into the results in the aggregate table.

Looker can union fresh data with your aggregate table’s data when the aggregate table is filtered on a timeframe and includes a dimension based on that same time field.

For example, the following aggregate table has a dimension based on the orders.created_date field, and has a time filter ("3 days") based on the same field:

    aggregate_table: sales_last_3_days {
      query: {
        dimensions: [orders.created_date]
        measures: [order_items.total_sales]
        filters: [orders.created_date: "3 days"]  # <-- time filter
        timezone: America/Los_Angeles
      }
      …
    }

If this aggregate table was built yesterday, Looker will retrieve the most recent data that is not yet included in the aggregate table, then union the fresh results with the results from the aggregate table. This means your users will get the latest data while still optimizing performance using aggregate awareness.
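The union behavior can be sketched roughly as follows. This is an assumed model of the mechanics (dates and totals invented), not Looker’s generated SQL:

```python
from datetime import date

built_through = date(2024, 5, 2)   # aggregate table last built yesterday
aggregate_rows = [(date(2024, 5, 1), 900), (date(2024, 5, 2), 1100)]
base_table = [(date(2024, 5, 2), 1100), (date(2024, 5, 3), 700)]

# Query only the days not yet materialized in the aggregate table...
fresh_rows = [row for row in base_table if row[0] > built_through]

# ...then union them with the already-materialized results.
result = aggregate_rows + fresh_rows

assert result == [
    (date(2024, 5, 1), 900),
    (date(2024, 5, 2), 1100),
    (date(2024, 5, 3), 700),
]
```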

If you’re in Development Mode, you can click the SQL tab of an Explore to see the aggregate table that Looker used for the query, and the UNION statement that Looker used to bring in newer data that wasn’t included in the aggregate table.

Currently, if Looker is unable to union fresh data with your aggregate table, Looker will use the existing data from the aggregate table.

Aggregate Tables Must Be Persisted

To be accessible for aggregate awareness, your aggregate table must be persisted in your database. The persistence strategy is specified in the aggregate table’s materialization parameter.

A user with develop permission can override the persistence settings and rebuild all the aggregate tables for a query to get the most up-to-date data. To rebuild the tables for a query, use the Rebuild Derived Tables & Run option from the Explore’s menu.

You must wait for the Explore query to load before this option is available.

This option will rebuild all persistent tables (all aggregate tables and PDTs) referenced in the query. In addition, this also rebuilds all the persistent tables that depend upon those aggregate tables and PDTs.

For the user who initiates the Rebuild Derived Tables & Run option, the query will wait for the tables to rebuild before loading results. Other users’ queries will still use the existing tables. Once the persistent tables are rebuilt, then all users will use the rebuilt tables.

Troubleshooting

As described in the Determining Which Aggregate Table Is Used for a Query section, if you’re in Development Mode, you can run queries on the Explore and click the SQL tab to see comments about the aggregate table used for the query, if any.

The SQL tab also includes comments about why aggregate tables were not used for a query, if that is the case. For aggregate tables that aren’t used, the comment will begin with:

Did not use [explore name]::[aggregate table name];

For example, here is a comment about why the sales_daily aggregate table defined in the order_items Explore wasn’t used for a query:

-- Did not use order_items::sales_daily; query contained the following filters that were neither included as fields nor exactly matched by filters in the aggregate table: order_items.created_year.

In this case, the filters in the query prevented the aggregate table from being used.

The following are some other possible reasons that an aggregate table can’t be used, along with steps you can take to increase the usability of the aggregate table.

No such field in the Explore.
There is a LookML validation error. This is most likely because the aggregate table was not properly defined, or there is a typo in the LookML for your aggregate table, such as an incorrect field name. To resolve this, verify that the dimensions and measures in the aggregate table match the field names in the Explore. See the aggregate_table documentation page for more information on how to define an aggregate table.

The aggregate table does not include the following fields in the query.
To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead. The one exception is timeframe dimensions, since coarser granularity timeframes can be derived from finer granularity ones. See the Field Factors section above for details. To resolve this, verify that the Explore query’s fields are included in the aggregate table definition.

The query contained the following filters that were neither included as fields nor exactly matched by filters in the aggregate table.
The filters in the Explore query prevent Looker from using the aggregate table. To resolve this, you can do one of the following:
  • Add the same filter to your aggregate table.
  • Add the field that the filter uses to your aggregate table.
  • Remove the filters from the Explore query.
See the Filter Factors section above for details.

The query contains the following measures that cannot roll up.
The query contains one or more measure types that aren’t supported for aggregate awareness, such as distinct count, median, or percentile. To resolve this, check the type of each measure in the query and make sure it is one of the supported measure types. Also, if your Explore has joins, verify that your measures aren’t converted to distinct measures (symmetric aggregates) through fanned-out joins. See the Symmetric Aggregates for Explores with Joins section below for an explanation.

A different aggregate table was a better fit for optimization.
There were multiple viable aggregate tables for the query, and Looker found a more optimal aggregate table to use instead. Nothing needs to be done in this case.

Looker did not do any grouping (because of a primary_key or cancel_grouping_fields parameter) and therefore the query can’t be rolled up.
The query references a dimension that prevents it from having a GROUP BY clause, and therefore Looker cannot use any aggregate table for the query. To resolve this, verify that the view’s primary_key parameter and the Explore’s cancel_grouping_fields parameter are set up correctly.

The aggregate table contained filters not in the query.
The aggregate table has a non-time filter that is not in the query. To resolve this, you can remove the filter from the aggregate table. See the Filter Factors section above for details.

The optimizer cannot determine why the aggregate table wasn’t used.
This comment is reserved for corner cases. If you see this for an Explore query that is used often, you can create an aggregate table that exactly matches the Explore query. You can easily get aggregate table LookML from an Explore, as described on the aggregate_table parameter page.

Things to Consider

Symmetric Aggregates for Explores with Joins

One important thing to note is that in an Explore that joins multiple database tables, Looker can render measures of type SUM, COUNT, and AVERAGE as SUM DISTINCT, COUNT DISTINCT, and AVERAGE DISTINCT, respectively, in order to avoid fanout miscalculations from joins. This behavior is part of Looker’s symmetric aggregates functionality. See the Help Center article on symmetric aggregates for an explanation of this feature of Looker.

The symmetric aggregates functionality prevents miscalculations, but it can also prevent your aggregate tables from being used in certain cases, so it is important to understand.

For the measure types supported by aggregate awareness, this applies to sum, count, and average. Looker renders these measure types as DISTINCT when the measure’s fields come from a view that is joined into the Explore with a to-many relationship, such as one_to_many or many_to_many.

See the relationship parameter documentation page for an explanation of these types of joins.
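A toy Python example of the fanout problem and the deduplication that symmetric aggregates perform (the data is invented):

```python
# Joining orders to their items repeats each order row once per item, so a
# naive SUM of order totals overcounts. Deduplicating by the order's primary
# key (what a SUM DISTINCT-style symmetric aggregate does) restores the
# correct total.
orders = {1: 100, 2: 50}                       # order_id -> order total
order_items = [(1, "hat"), (1, "mug"), (2, "pen")]

joined = [(oid, orders[oid]) for oid, _item in order_items]  # fanned out

naive_sum = sum(total for _oid, total in joined)  # 250: order 1 counted twice
dedup_sum = sum(dict(joined).values())            # 150: correct

assert naive_sum == 250
assert dedup_sum == 150
```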

If you find that your aggregate table isn’t being used for this reason, you can create an aggregate table to exactly match an Explore query in order to use these measure types for an Explore with joins. See the Creating Aggregate Tables That Exactly Match Explore Queries section on this page for more information.

Also, if you have a SQL dialect that supports HyperLogLog sketches, you can add the allow_approximate_optimization: yes parameter to the measure. When a count measure is defined with allow_approximate_optimization: yes, Looker can use the measure for aggregate awareness, even if it renders as a distinct count.

See the allow_approximate_optimization documentation page for details, and for a list of which SQL dialects support HyperLogLog sketches.

Dialect Support for Aggregate Awareness

The ability to use aggregate awareness depends on the database dialect your Looker connection is using; see the Looker documentation for the list of dialects that support aggregate awareness in Looker 7.18.

Google BigQuery Legacy SQL supports aggregate awareness, but does not support unioning fresh data with your aggregate table’s data.
