Usage
label: "desired label"
description: "description string"
max_cache_age: "24 hours"
sql_trigger: SELECT max(id) FROM my_tablename ;;
}
Hierarchydatagroup |
Default ValueNoneAcceptsAn identifier for your datagroup, plus sub-parameters defining your datagroup properties. |
Definition
Use datagroup
to assign a caching policy to Explores and/or PDTs. If you want different caching policies for different Explores and/or PDTs, then use a separate datagroup
parameter to specify each policy.
You can add a label and a description for the datagroup:
label
: Specifies an optional label for the datagroup. See thelabel
anddescription
section on this page for details.description
: Specifies an optional description for the datagroup that can be used to explain the datagroup’s purpose and mechanism. See thelabel
anddescription
section on this page for details.
Specify the details of the caching policy by using one or both of these datagroup
subparameters:
sql_trigger
: Specifies a SQL query that returns one row with one column. If the value returned by the query is different than the query’s prior results, then the datagroup goes into a triggered state. See thesql_trigger
section on this page for details.max_cache_age
: Specifies a string defining a time period. When the age of a query’s cache exceeds the time period, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results. See themax_cache_age
section on this page for details.
Often the best solution is to use the two parameters in combination. You specify a sql_trigger
SQL query that will be triggered by your data load (ETL) into your database. In addition, you specify a max_cache_age
that will invalidate old data if your ETL fails. The max_cache_age
parameter ensures that if the cache for a datagroup isn’t cleared by the sql_trigger
, then the cache entries will expire by a certain time. So the failure mode for a datagroup will be to query the database rather than serve stale data from the Looker cache.
For connections using user attributes to specify the connection parameters, you must create a separate connection using the PDT override fields if you want to define a datagroup caching policy using a SQL query trigger.
Without the PDT overrides, you can still use a datagroup for the model and its Explores, as long as you define the datagroup’s caching policy using onlymax_cache_age
, notsql_trigger
.
sql_trigger
Use the sql_trigger
parameter to specify a SQL query that returns exactly one row with one column. Looker runs the SQL query at intervals specified in the PDT And Datagroup Maintenance Schedule field of the database connection. If the query returns a different value than the previous result, the datagroup goes into a triggered state. Once the datagroup is triggered, Looker rebuilds any PDTs with that datagroup specified in their datagroup_trigger
parameter. After the PDT rebuilding is complete, the datagroup goes into a ready state and Looker invalidates the cached results of any Explores using that datagroup.
Typically, sql_trigger
specifies a SQL query that indicates when a new data load (ETL) has occurred, for example by querying the max(ID)
in a table. You can also use sql_trigger
to specify a certain time of day by querying the current date and adding additional hours to that timestamp as needed to reach the time you want, for example 4 AM.
Looker does not perform time zone conversion for
sql_trigger
. If you want to trigger your datagroup at a specific time of day, set the trigger in the time zone that your database is configured for.
See these examples from the sql_trigger
parameter for ideas on setting up SQL queries to trigger a datagroup.
max_cache_age
max_cache_age
parameter specifies a string defining a time period. This time period is the maximum time period for the cached results to be used. In the max_cache_age
parameter you pass a string containing an integer followed by “seconds”, “minutes”, or “hours”.
When the age of a query’s cache exceeds the max_cache_age
, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results.
Also, when the max_cache_age
time expires, data is deleted from the cache, as long as the Instant Dashboards Looker Labs feature is disabled. See the Caching queries and rebuilding PDTs with datagroups documentation page for information on how long data is stored in the cache.
max_cache_age
only invalidates the cache, it does not cause PDTs to rebuild. Use persist_for
with a derived table to set a maximum time period to rebuild a PDT. If you define a datagroup with only max_cache_age
, a derived table assigned that datagroup will persist, but because there is no trigger assigned, the PDT will sit in the scratch schema indefinitely.
label
and description
You can use the optional label
and description
subparameters to add a customized label and a description of the datagroup. You can also localize these subparameters using locale strings files.
These subparameters are displayed on the Datagroups page in the Database section of the Admin panel. See the Admin Options - Datagroups documentation page for more information on how these are displayed.
Examples
To create a caching policy that retrieves new results whenever there’s new data available or at least every 24 hours, do the following:
- Use the
orders_datagroup
datagroup (in the model file) to name the caching policy. - Use the
sql_trigger
parameter to specify the query that indicates that there is fresh data:select max(id) from my_tablename
. Whenever the data has been updated, this query returns a new number. - Use the
max_cache_age
setting to invalidate the data if it has been cached for 24 hours. - Use the optional
label
anddescription
parameters to add a customized label and a description of the datagroup.
To use the orders_datagroup
caching policy as the default for Explores in a model, use the persist_with
parameter at the model level, and specify the orders_datagroup
:
To use the orders_datagroup
caching policy for a specific Explore, add the persist_with
parameter under the explore
parameter, and specify the orders_datagroup
. If there is a default datagroup specified at the model level, you can use the persist_with
parameter under an explore
to override the default setting.
To use the orders_datagroup
datagroup caching policy for rebuilding a PDT, you can add datagroup_trigger
under the derived_table
parameter, and specify the orders_datagroup
: