User Guide Getting Started Help Center Documentation Community Training
Looker
  
English
Français
Deutsch
日本語
Aggregate Awareness

Overview

Looker uses aggregate awareness logic to find the smallest, most efficient table available in your database to run a query while still maintaining correctness.

For very large tables in your database, Looker developers can create smaller aggregate tables of data, grouped by various combinations of attributes. The aggregate tables act as roll-ups or summary tables that Looker can use for queries whenever possible, instead of the original large table. When implemented strategically, aggregate awareness can speed up the average query by orders of magnitude.

For example, you might have a petabyte-scale data table with one row for every order that has occurred on your website. From this database, you can create an aggregate table with your daily sales totals. If your website receives 1,000 orders every day, your daily aggregate table would represent each day with 999 fewer rows than your original table. You can create another aggregate table with monthly sales totals that will be even more efficient. So now, if a user runs a query for daily or weekly sales, Looker will use the daily sales total table. If a user runs a query about yearly sales and you don’t have a yearly aggregate table, Looker will use the next best thing, which is the monthly sales aggregate table in this example.

This image shows how Looker answers your users’ questions with aggregate tables whenever possible:

Using aggregate awareness logic, Looker will query the smallest aggregate table possible to answer your users’ questions. The original table would be used only for queries requiring more granularity than the aggregate tables can provide.

Aggregate tables do not need to be joined in or added to a separate Explore. Instead, Looker dynamically adjusts the FROM clause of the Explore query to access the best aggregate table for the query. This means that your drills are maintained and Explores can be consolidated. With aggregate awareness, one Explore can automatically leverage aggregate tables but still dive deep into granular data if needed.

You can also leverage aggregate tables to drastically improve the performance of dashboards, especially for tiles that query huge data sets. For details, see the Getting Aggregate Table LookML from a Dashboard section on the aggregate_table documentation page.

Adding Aggregate Tables to Your Project

Looker developers can create strategic aggregate tables that will minimize the number of queries required on the large tables in a database.

An aggregate table is defined using the aggregate_table parameter under an explore parameter in your LookML project. The aggregate table query results are then saved to your database, much like a persistent derived table (PDT).

Here is an example of an aggregate table in LookML:

explore: orders { label: "Sales Totals" aggregate_table: sales_monthly { materialization: { datagroup_trigger: orders_datagroup } query: { dimensions: [created_month] measures: [order_items.total_sales] timezone: America/Los_Angeles } } # other explore parameters }

To create an aggregate table, you can write the LookML from scratch, or you can get aggregate table LookML from an Explore or from a dashboard. See the aggregate_table parameter page for the specifics of the aggregate_table parameter and its subparameters.

Designing Aggregate Tables

Looker can use an aggregate table for an Explore query if all of the following are true:

The following sections provide additional tips for ensuring that your aggregate tables are used for queries:

Timeframe Considerations

Looker’s aggregate awareness logic is able to derive one timeframe from another. For example, an aggregate table based on daily data can be used for an Explore query that calls for other timeframes, such as queries for daily, monthly, and yearly data, or even day-of-month, day-of-year, and week-of-year data. An aggregate table can be used for a query as long as the aggregate table’s timeframe has a higher granularity than the Explore query (a yearly aggregate table can’t be used for an Explore query for hourly data).

Looker’s aggregate awareness logic is able to derive one timeframe from another. An aggregate table can be used for a query as long as the aggregate table’s timeframe has a higher granularity than the Explore query. For example, an aggregate table based on daily data can be used for an Explore query that calls for other timeframes, such as queries for daily, monthly, and yearly data, or even day-of-month, day-of-year, and week-of-year data. But a yearly aggregate table can’t be used for an Explore query that calls for hourly data, since the aggregate table’s data doesn’t have enough granularity for the Explore query.

The same applies to time range subsets. For example, if you have an aggregate table that is filtered for the last three months and a user queries the data with a filter for the last two months, Looker will be able to use the aggregate table for that query.

Time Zone Considerations

In many cases, database admins use UTC as the time zone for databases. However, many users might not be in the UTC time zone. Looker has multiple options for converting time zones so that your users will get query results in their own time zone:

See the Using Time Zone Settings documentation page for more information on these options.

These concepts are important for understanding aggregate awareness because, in order for an aggregate table to be used for a query with date dimensions or date filters, the time zone on the aggregate table must match the time zone setting used for the original query.

Aggregate tables use the database time zone if no timezone value is specified. Your database connection will also use the database time zone if any of the following are true:

If any of these are true, you can omit the timezone parameter for your aggregate tables.

Otherwise, the aggregate table time zone should be defined to match possible queries so that the aggregate table is more likely to be used:

Filter Considerations

Be careful when including filters in your aggregate table. Filters on an aggregate table can narrow the results to the point where the aggregate table cannot be used. For example, say that you create an aggregate table for daily order counts, and the aggregate table filters for just sunglass orders coming from Australia. If a user runs an Explore query for daily order counts of sunglasses worldwide, Looker cannot use the aggregate table for that Explore query, since the aggregate table only has the data for Australia. The aggregate table filters the data too narrowly to be used by the Explore query.

Also, be conscious of the filters that your Looker developers might have built into your Explore, such as:

These filter types are based on specific fields. If your Explore has these filters, you must include their fields in the dimensions parameter of the aggregate_table.

For example, here is an Explore with an access filter that is based on the orders.region field:

explore: orders { access_filter: { field: orders.region user_attribute: region } }

To create an aggregate table that would be used for this Explore, the aggregate table must include the field on which the access filter is based. In the example below, the access filter is based on the field orders.region, and this same field is included as a dimension in the aggregate table:

explore: orders { access_filter: { field: orders.region # <-- orders.region field user_attribute: region } aggregate_table: sales_monthly { materialization: { datagroup_trigger: orders_datagroup } query: { dimensions: [orders.created_day, orders.region] # <-- orders.region field measures: [orders.total_sales] timezone: America/Los_Angeles } } }

Because the aggregate table query includes the orders.region dimension, Looker can dynamically filter the data in the aggregate table to match the filter from the Explore query. Therefore, Looker can still use the aggregate table for the Explore’s queries, even though the Explore has an access filter.

Field Considerations

To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead.

For example, if a query groups by dimensions A and B, aggregates by measure C, and filters on dimension D, then the aggregate table must minimally have A, B, and D as dimensions and C as a measure. The aggregate table can have other fields as well, but it must have at least these fields in order to be viable for optimization. The one exception is timeframe dimensions, since lower granularity timeframes can be derived from higher granularity ones.

Because of these field considerations, an aggregate table is specific to the Explore under which it’s defined. An aggregate table defined under one Explore won’t be used for queries on a different Explore.

Measure Type Considerations

Aggregate awareness can be used for Explore queries that use additive measures, which are these measure types:

If an Explore query uses any other type of measure, Looker will use the original table, not the aggregate table, to return results.

The only exception is if the Explore query is an exact match of an aggregate table query. In that case, Looker can use an aggregate table that includes non-additive measure types.

This makes sense because you can’t get accurate data if you try to aggregate aggregates. For example, if you are counting the distinct users on a website, there may be a user that came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect. However, if the Explore query is an exact match of an aggregate table query, you can use distinct counts. So you could use your weekly distinct aggregate table to answer an Explore query about weekly distinct users.

Symmetric Aggregates for Explores with Joins

One important thing to note is that in an Explore that joins multiple database tables, Looker can render an additive measure as non-additive. For example, a count measure is rendered as a count_distinct measure type. This is to avoid fanout miscalculations for joins, and it is part of Looker’s symmetric aggregates functionality. See the Help Center article on symmetric aggregates for an explanation of this feature of Looker.

The symmetric aggregates functionality prevents miscalculations, but it can also prevent your aggregate tables from being used in certain cases, so it is important to understand.

For the measure types supported by aggregate awareness, this applies to sum, count, and average. Looker will render these types of measures as non-additive if:

See the relationship parameter documentation page for an explanation of these types of joins.

If you find that your aggregate table isn’t being used for this reason, you can create an aggregate table to exactly match an Explore query in order to use these measure types for an Explore with joins.

Determining Which Aggregate Table Is Used for a Query

If you are in Development Mode, you can run queries on the Explore to test if Looker will use your aggregate tables to get the results.

For example, based on the example monthly aggregate table above, we can go to the Explore and run a query for yearly sales totals. Then we can click on the SQL tab to see the details of the query that Looker created. If you are in Development Mode, Looker shows comments to indicate the aggregate table it used for the query:

From the comments on the SQL tab, we can see that Looker is using the sales_monthly aggregate table for this query:

-- Use existing orders::sales_monthly in sandbox_scratch.LR$LBJQY1588205086507_orders$sales_monthly -- Did not use orders::sales_weekly; orders::sales_monthly was a better fit for optimization. -- Did not use orders::sales_daily; orders::sales_monthly was a better fit for optimization.

Looker also provides information about why other aggregate tables weren’t used for the query. In this case, the sales_monthly aggregate table was more optimal than the sales_weekly and sales_daily aggregate tables.

See the Troubleshooting section below for possible comments you may see in the SQL tab and suggestions for how to resolve them.

Looker Unions New Data to Your Aggregate Tables

For aggregate tables with time filters, Looker can union fresh data into your aggregate table. You might have an aggregate table that includes data for the past three days, but that aggregate table might have been built yesterday. The aggregate table would be missing today’s information, so you wouldn’t expect to use it for an Explore query on the most recently daily information.

However, Looker still can use the data in that aggregate table for the query because Looker will run a query on the most recent data, then union those results into the results in the aggregate table.

Looker can union fresh data with your aggregate table’s data given the following circumstances:

For example, the following aggregate table has a dimension based on the orders.created_date field, and has a time filter ("3 days") based on the same field:

aggregate_table: sales_last_3_days { query: { dimensions: [orders.created_date] measures: [order_items.total_sales] filters: [orders.created_date: "3 days"] # <-- time filter timezone: America/Los_Angeles } … }

If this aggregate table was built yesterday, Looker will retrieve the most recent data that is not yet included in the aggregate table, then union the fresh results with the results from the aggregate table. This means your users will get the latest data while still optimizing performance using aggregate awareness.

If you’re in Development Mode, you can click the SQL tab of an Explore to see the aggregate table that Looker used for the query, and the UNION statement that Looker used to bring in newer data that wasn’t included in the aggregate table.

Currently, if Looker is unable to union fresh data with your aggregate table, Looker will use the existing data from the aggregate table.

Troubleshooting

As described in the Determining Which Aggregate Table Is Used for a Query section, if you’re in Development Mode, you can run queries on the Explore and click the SQL tab to see comments about the aggregate table used for the query, if any.

The SQL tab also includes comments about why aggregate tables were not used for a query, if that is the case. For aggregate tables that aren’t used, the comment will begin with:

Did not use [explore name]::[aggregate table name];

For example, here is a comment about why the sales_daily aggregate table defined in the order_items Explore wasn’t used for a query:

-- Did not use order_items::sales_daily; query contained the following filters that were neither included as fields nor exactly matched by filters in the aggregate table: order_items.created_year.

In this case, the filters in the query prevented the aggregate table from being used.

The following table shows some other possible reasons that an aggregate table can’t be used, along with steps you can take to increase the usability of the aggregate table.

Reason for Not Using
the Aggregate Table
Explanation and Possible Steps
No such field in the Explore. There is a LookML validation type error. This is mostly likely because the aggregate table was not properly defined, or there was a typo in the LookML for your aggregate table. A likely culprit is an incorrect field name, or the like.

To resolve this, verify that the dimensions and measures in the aggregate table match the field names in the Explore. See the aggregate_table documentation page for more information on how to define an aggregate table.
The aggregate table does not include the following fields in the query. To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead. See the Field Considerations section above for details. The one exception is timeframe dimensions, since lower granularity timeframes can be derived from higher granularity ones.

To resolve this, verify that the Explore query’s fields are included in the aggregate table definition.
The query contained the following filters that were neither included as fields nor exactly matched by filters in the aggregate table. The filters in the Explore query prevent Looker from using the aggregate table.

To resolve this, you can do one of the following:
  • Add the same filter to your aggregate table.
  • Add the field that the filter uses to your aggregate table.
  • Remove the filters from the Explore query.
See the Filter Considerations section above for details.
The query contains the following measures that cannot roll up. The query contains one or more measure types that aren’t supported for aggregate awareness, such as distinct count, median, or percentile.

To resolve this, check the type of each measure in the query and make sure they are supported measure types. Also, if your Explore has joins, verify that your measures aren’t converted to distinct measures (symmetric aggregates) through fanned out joins. See the Symmetric Aggregates for Explores with Joins section above for an explanation.
A different aggregate table was a better fit for optimization. There were multiple viable aggregate tables for the query and Looker found a more optimal aggregate table to use instead. Nothing needs to be done in this case.
Looker did not do any grouping (because of a primary_key or cancel_grouping_fields parameter) and therefore the query can’t be rolled up. The query references a dimension that prevents it from having a GROUP BY clause, and therefore Looker cannot use any aggregate table for the query.

To resolve this, verify that the view’s primary_key parameter and the Explore’s cancel_grouping_fields parameter are set up correctly.
The aggregate table contained filters not in the query. The aggregate table has a non-time filter that is not in the query.

To resolve this, you can remove the filter from the aggregate table. See the Filter Considerations section above for details.
The optimizer cannot determine why the aggregate table wasn’t used. This comment is reserved for corner cases. If you see this for an Explore query that is used often, you can create an aggregate table that exactly matches the Explore query. You can easily get aggregate table LookML from an Explore, as described on the aggregate_table parameter page.

Dialect Support for Aggregate Awareness

The ability to use aggregate awareness depends on the database dialect your Looker connection is using. In the most current Looker release, the following dialects support aggregate awareness:

Top