Looker uses aggregate awareness logic to find the smallest, most efficient table available in your database to run a query while still maintaining correctness.
For very large tables in your database, Looker developers can create smaller aggregate tables of data, grouped by various combinations of attributes. The aggregate tables act as roll-ups or summary tables that Looker can use for queries whenever possible, instead of the original large table. When implemented strategically, aggregate awareness can speed up the average query by orders of magnitude.
For example, you might have a petabyte-scale data table with one row for every order that has occurred on your website. From this database, you can create an aggregate table with your daily sales totals. If your website receives 1,000 orders every day, your daily aggregate table would represent each day with 999 fewer rows than your original table. You can create another aggregate table with monthly sales totals that will be even more efficient. So now, if a user runs a query for daily or weekly sales, Looker will use the daily sales total table. If a user runs a query about yearly sales and you don’t have a yearly aggregate table, Looker will use the next best thing, which is the monthly sales aggregate table in this example.
This image shows how Looker answers your users’ questions with aggregate tables whenever possible:
- For a query about total monthly sales, Looker uses the aggregate table based on monthly sales (
- For a query about the total of each sale in a day, there is no aggregate table with that granularity, so Looker gets query results from the original database table (
orders_database). (However, if your users run this type of query often, you could easily create an aggregate table for it.)
- For a query about weekly sales, there is no weekly aggregate table, so Looker uses the next best thing, which is the aggregate table based on daily sales (
Using aggregate awareness logic, Looker will query the smallest aggregate table possible to answer your users’ questions. The original table would be used only for queries requiring more granularity than the aggregate tables can provide.
Aggregate tables do not need to be joined in or added to a separate Explore. Instead, Looker dynamically adjusts the FROM clause of the Explore query to access the best aggregate table for the query. This means that your drills are maintained and Explores can be consolidated. With aggregate awareness, one Explore can automatically leverage aggregate tables but still dive deep into granular data if needed.
You can also leverage aggregate tables to drastically improve the performance of dashboards, especially for tiles that query huge data sets. For details, see the Getting Aggregate Table LookML from a Dashboard section on the
aggregate_table documentation page.
Adding Aggregate Tables to Your Project
Looker developers can create strategic aggregate tables that will minimize the number of queries required on the large tables in a database.
An aggregate table is defined using the
aggregate_table parameter under an
explore parameter in your LookML project. The aggregate table query results are then saved to your database, much like a persistent derived table (PDT).
Here is an example of an aggregate table in LookML:
To create an aggregate table, you can write the LookML from scratch, or you can get aggregate table LookML from an Explore or from a dashboard. See the
aggregate_table parameter page for the specifics of the
aggregate_table parameter and its subparameters.
Designing Aggregate Tables
Looker can use an aggregate table for an Explore query if all of the following are true:
- The Explore query’s dimensions are a subset of the aggregate table’s dimensions. Or, for timeframes, the Explore query’s timeframes can be derived from the timeframes in the aggregate table (see the Timeframe Considerations section below).
- The Explore query’s measures are a subset of the aggregate table’s measures.
- The Explore query contains measure types supported by aggregate awareness (see the Measure Type Considerations section below).
- The Explore query’s filters reference fields that are available as dimensions in the aggregate table, or each of the Explore query’s filters matches a filter in the aggregate table (see the Filter Considerations section below).
- The Explore query’s time zone matches the time zone used by the aggregate table (see the Time Zone Considerations section below).
The following sections provide additional tips for ensuring that your aggregate tables are used for queries:
- Timeframe Considerations
- Time Zone Considerations
- Filter Considerations
- Field Considerations
- Measure Type Considerations
Looker’s aggregate awareness logic is able to derive one timeframe from another. For example, an aggregate table based on daily data can be used for an Explore query that calls for other timeframes, such as queries for daily, monthly, and yearly data, or even day-of-month, day-of-year, and week-of-year data. An aggregate table can be used for a query as long as the aggregate table’s timeframe has a higher granularity than the Explore query (a yearly aggregate table can’t be used for an Explore query for hourly data).
Looker’s aggregate awareness logic is able to derive one timeframe from another. An aggregate table can be used for a query as long as the aggregate table’s timeframe has a higher granularity than the Explore query. For example, an aggregate table based on daily data can be used for an Explore query that calls for other timeframes, such as queries for daily, monthly, and yearly data, or even day-of-month, day-of-year, and week-of-year data. But a yearly aggregate table can’t be used for an Explore query that calls for hourly data, since the aggregate table’s data doesn’t have enough granularity for the Explore query.
The same applies to time range subsets. For example, if you have an aggregate table that is filtered for the last three months and a user queries the data with a filter for the last two months, Looker will be able to use the aggregate table for that query.
Time Zone Considerations
In many cases, database admins use UTC as the time zone for databases. However, many users might not be in the UTC time zone. Looker has multiple options for converting time zones so that your users will get query results in their own time zone:
- Query time zone, a setting that applies to all queries on the database connection. If all your users are in the same time zone, you can set a single query time zone so that all queries are converted from the database time zone into the query time zone.
- User-specific time zones, where users can be assigned and select time zones individually. In this case, queries are converted from the database time zone into the individual user’s time zone.
See the Using Time Zone Settings documentation page for more information on these options.
These concepts are important for understanding aggregate awareness because, in order for an aggregate table to be used for a query with date dimensions or date filters, the time zone on the aggregate table must match the time zone setting used for the original query.
Aggregate tables use the database time zone if no
timezone value is specified. Your database connection will also use the database time zone if any of the following are true:
- Your database does not support time zones.
- Your database connection’s query time zone is set to the same time zone as the database time zone.
- Your database connection has neither a specified query time zone nor user-specific time zones. If this is the case, your database connection will use the database time zone.
If any of these are true, you can omit the
timezone parameter for your aggregate tables.
Otherwise, the aggregate table time zone should be defined to match possible queries so that the aggregate table is more likely to be used:
- If your database connection uses a single query time zone, you should match your aggregate table’s
timezonevalue to the query time zone value.
- If your database connection uses user-specific time zones, you should create identical aggregate tables, each with a different
timezonevalue to match the possible time zones of your users.
Be careful when including filters in your aggregate table. Filters on an aggregate table can narrow the results to the point where the aggregate table cannot be used. For example, say that you create an aggregate table for daily order counts, and the aggregate table filters for just sunglass orders coming from Australia. If a user runs an Explore query for daily order counts of sunglasses worldwide, Looker cannot use the aggregate table for that Explore query, since the aggregate table only has the data for Australia. The aggregate table filters the data too narrowly to be used by the Explore query.
Also, be conscious of the filters that your Looker developers might have built into your Explore, such as:
access_filters: Applies user-specific data restrictions.
always_filter: Require users to include a certain set of filters for an Explore query. Users may change the default filter value for their query, but they cannot remove the filter entirely.
conditionally_filter: Defines a set of default filters that users can override if they apply at least one filter from a second list that’s also defined in the Explore.
These filter types are based on specific fields. If your Explore has these filters, you must include their fields in the
dimensions parameter of the
For example, here is an Explore with an access filter that is based on the
To create an aggregate table that would be used for this Explore, the aggregate table must include the field on which the access filter is based. In the example below, the access filter is based on the field
orders.region, and this same field is included as a dimension in the aggregate table:
Because the aggregate table query includes the
orders.region dimension, Looker can dynamically filter the data in the aggregate table to match the filter from the Explore query. Therefore, Looker can still use the aggregate table for the Explore’s queries, even though the Explore has an access filter.
To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead.
For example, if a query groups by dimensions A and B, aggregates by measure C, and filters on dimension D, then the aggregate table must minimally have A, B, and D as dimensions and C as a measure. The aggregate table can have other fields as well, but it must have at least these fields in order to be viable for optimization. The one exception is timeframe dimensions, since lower granularity timeframes can be derived from higher granularity ones.
Because of these field considerations, an aggregate table is specific to the Explore under which it’s defined. An aggregate table defined under one Explore won’t be used for queries on a different Explore.
Measure Type Considerations
Aggregate awareness can be used for Explore queries that use additive measures, which are these measure types:
If an Explore query uses any other type of measure, Looker will use the original table, not the aggregate table, to return results.
The only exception is if the Explore query is an exact match of an aggregate table query. In that case, Looker can use an aggregate table that includes non-additive measure types.
This makes sense because you can’t get accurate data if you try to aggregate aggregates. For example, if you are counting the distinct users on a website, there may be a user that came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect. However, if the Explore query is an exact match of an aggregate table query, you can use distinct counts. So you could use your weekly distinct aggregate table to answer an Explore query about weekly distinct users.
Symmetric Aggregates for Explores with Joins
One important thing to note is that in an Explore that joins multiple database tables, Looker can render an additive measure as non-additive. For example, a
count measure is rendered as a
count_distinct measure type. This is to avoid fanout miscalculations for joins, and it is part of Looker’s symmetric aggregates functionality. See the Help Center article on symmetric aggregates for an explanation of this feature of Looker.
The symmetric aggregates functionality prevents miscalculations, but it can also prevent your aggregate tables from being used in certain cases, so it is important to understand.
For the measure types supported by aggregate awareness, this applies to
average. Looker will render these types of measures as non-additive if:
- The measure is from the “one” view of a many-to-one or one-to-many join.
- The measure is from either view of a many-to-many join.
relationship parameter documentation page for an explanation of these types of joins.
If you find that your aggregate table isn’t being used for this reason, you can create an aggregate table to exactly match an Explore query in order to use these measure types for an Explore with joins.
Determining Which Aggregate Table Is Used for a Query
If you are in Development Mode, you can run queries on the Explore to test if Looker will use your aggregate tables to get the results.
For example, based on the example monthly aggregate table above, we can go to the Explore and run a query for yearly sales totals. Then we can click on the SQL tab to see the details of the query that Looker created. If you are in Development Mode, Looker shows comments to indicate the aggregate table it used for the query:
From the comments on the SQL tab, we can see that Looker is using the
sales_monthly aggregate table for this query:
Looker also provides information about why other aggregate tables weren’t used for the query. In this case, the
sales_monthly aggregate table was more optimal than the
sales_daily aggregate tables.
See the Troubleshooting section below for possible comments you may see in the SQL tab and suggestions for how to resolve them.
Looker Unions New Data to Your Aggregate Tables
For aggregate tables with time filters, Looker can union fresh data into your aggregate table. You might have an aggregate table that includes data for the past three days, but that aggregate table might have been built yesterday. The aggregate table would be missing today’s information, so you wouldn’t expect to use it for an Explore query on the most recently daily information.
However, Looker still can use the data in that aggregate table for the query because Looker will run a query on the most recent data, then union those results into the results in the aggregate table.
Looker can union fresh data with your aggregate table’s data given the following circumstances:
- The aggregate table has a time filter.
- The aggregate table includes a dimension based on the same time field as the time filter.
For example, the following aggregate table has a dimension based on the
orders.created_date field, and has a time filter (
"3 days") based on the same field:
If this aggregate table was built yesterday, Looker will retrieve the most recent data that is not yet included in the aggregate table, then union the fresh results with the results from the aggregate table. This means your users will get the latest data while still optimizing performance using aggregate awareness.
If you’re in Development Mode, you can click the SQL tab of an Explore to see the aggregate table that Looker used for the query, and the
UNION statement that Looker used to bring in newer data that wasn’t included in the aggregate table.
Currently, if Looker is unable to union fresh data with your aggregate table, Looker will use the existing data from the aggregate table.
As described in the Determining Which Aggregate Table Is Used for a Query section, if you’re in Development Mode, you can run queries on the Explore and click the SQL tab to see comments about the aggregate table used for the query, if any.
The SQL tab also includes comments about why aggregate tables were not used for a query, if that is the case. For aggregate tables that aren’t used, the comment will begin with:
Did not use [explore name]::[aggregate table name];
For example, here is a comment about why the
sales_daily aggregate table defined in the
order_items Explore wasn’t used for a query:
In this case, the filters in the query prevented the aggregate table from being used.
The following table shows some other possible reasons that an aggregate table can’t be used, along with steps you can take to increase the usability of the aggregate table.
|Reason for Not Using
the Aggregate Table
|Explanation and Possible Steps|
|No such field in the Explore.||There is a LookML validation type error. This is mostly likely because the aggregate table was not properly defined, or there was a typo in the LookML for your aggregate table. A likely culprit is an incorrect field name, or the like.
To resolve this, verify that the dimensions and measures in the aggregate table match the field names in the Explore. See the
|The aggregate table does not include the following fields in the query.||To be used for an Explore query, an aggregate table must have all the dimensions and measures needed for that Explore query, including the fields used for filters in the Explore query. If an Explore query contains a dimension or measure that is not in an aggregate table, Looker can’t use the aggregate table and will use the base table instead. See the Field Considerations section above for details. The one exception is timeframe dimensions, since lower granularity timeframes can be derived from higher granularity ones.
To resolve this, verify that the Explore query’s fields are included in the aggregate table definition.
|The query contained the following filters that were neither included as fields nor exactly matched by filters in the aggregate table.||The filters in the Explore query prevent Looker from using the aggregate table.
To resolve this, you can do one of the following:
|The query contains the following measures that cannot roll up.||The query contains one or more measure types that aren’t supported for aggregate awareness, such as distinct count, median, or percentile.
To resolve this, check the type of each measure in the query and make sure they are supported measure types. Also, if your Explore has joins, verify that your measures aren’t converted to distinct measures (symmetric aggregates) through fanned out joins. See the Symmetric Aggregates for Explores with Joins section above for an explanation.
|A different aggregate table was a better fit for optimization.||There were multiple viable aggregate tables for the query and Looker found a more optimal aggregate table to use instead. Nothing needs to be done in this case.|
|Looker did not do any grouping (because of a
||The query references a dimension that prevents it from having a
To resolve this, verify that the view’s
|The aggregate table contained filters not in the query.||The aggregate table has a non-time filter that is not in the query.
To resolve this, you can remove the filter from the aggregate table. See the Filter Considerations section above for details.
|The optimizer cannot determine why the aggregate table wasn’t used.||This comment is reserved for corner cases. If you see this for an Explore query that is used often, you can create an aggregate table that exactly matches the Explore query. You can easily get aggregate table LookML from an Explore, as described on the
Dialect Support for Aggregate Awareness
The ability to use aggregate awareness depends on the database dialect your Looker connection is using. In the most current Looker release, the following dialects support aggregate awareness: