max_cache_age: "24 hours"
sql_trigger: SELECT max(id) FROM my_tablename ;;
AcceptsAn identifier for your datagroup, plus sub-parameters defining your datagroup properties.
datagroup to assign a caching policy to Explores and/or PDTs. If you want different caching policies for different Explores and/or PDTs, then use a separate
datagroup parameter to specify each policy.
Specify the details of the caching policy by using one or both of these
sql_trigger: Specifies a SQL query that returns one row with one column. If the value returned by the query is different than the query’s prior results, then the datagroup goes into a triggered state. See this section for the details.
max_cache_age: Specifies a string defining a time period. When the age of a query’s cache exceeds the time period, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results. See this section for the details.
Often the best solution is to use the two parameters in combination. You specify a
sql_trigger SQL query that will be triggered by your data load (ETL) into your database. In addition, you specify a
max_cache_age that will invalidate old data if your ETL fails. The
max_cache_age parameter ensures that if the cache for a datagroup isn’t cleared by the
sql_trigger, then the cache entries will expire by a certain time. So the failure mode for a datagroup will be to query the database rather than serve stale data from the Looker cache.
For connections using user attributes to specify the connection parameters, you must create a separate connection using the PDT override fields if you want to define a datagroup caching policy using a SQL query trigger.
Without the PDT overrides, you can still use a datagroup for the model and its Explores, as long as you define the datagroup’s caching policy using only
sql_trigger parameter to specify a SQL query that returns exactly one row with one column. Looker runs the SQL query at intervals specified in the PDT And Datagroup Maintenance Schedule field of the database connection. If the query returns a different value than the previous result, the datagroup goes into a triggered state. Once the datagroup is triggered, Looker rebuilds any PDTs with that datagroup specified in their
datagroup_trigger parameter. After the PDT rebuilding is complete, the datagroup goes into a ready state and Looker invalidates the cached results of any Explores using that datagroup.
sql_trigger specifies a SQL query that indicates when a new data load (ETL) has occurred, for example by querying the max(ID) in a table. You can also use
sql_trigger to specify a certain time of day by querying the current date and adding additional hours to that timestamp as needed to reach the time you want, for example 4 AM.
Looker does not perform time zone conversion for
sql_trigger. If you want to trigger your datagroup at a specific time of day, set the trigger in the time zone that your database is configured for.
See these examples from the
sql_trigger parameter for ideas on setting up SQL queries to trigger a datagroup.
max_cache_age parameter specifies a string defining a time period. This time period is the maximum time period for the cached results to be used. In the
max_cache_age parameter you pass a string containing an integer followed by “seconds”, “minutes”, or “hours”.
When the age of a query’s cache exceeds the
max_cache_age, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results.
Also, when the
max_cache_age time expires, data is deleted from the cache, as long as the Instant Dashboards Looker Labs feature is disabled. See here for information on how long data is stored in the cache.
max_cache_age only invalidates the cache, it does not cause PDTs to rebuild. Use
persist_for with a derived table to set a maximum time period to rebuild a PDT. If you define a datagroup with only
max_cache_age, a derived table assigned that datagroup will persist, but because there is no trigger assigned, the PDT will sit in the scratch schema indefinitely.
To create a caching policy that retrieves new results whenever there’s new data available or at least every 24 hours, do the following:
- Use the
orders_datagroupdatagroup (in the model file) to name the caching policy.
- Use the
sql_triggerparameter to specify the query that indicates that there is fresh data:
select max(id) from my_tablename. Whenever the data has been updated, this query returns a new number.
- Use the
max_cache_agesetting to invalidate the data if it has been cached for 24 hours.
To use the
orders_datagroup caching policy as the default for Explores in a model, use the
persist_with parameter at the model level, and specify the
To use the
orders_datagroup caching policy for a specific Explore, add the
persist_with parameter under the
explore parameter, and specify the
orders_datagroup. If there is a default datagroup specified at the model level, you can use the
persist_with parameter under an
explore to override the default setting.
To use the
orders_datagroup datagroup caching policy for rebuilding a PDT, you can add
datagroup_trigger under the
derived_table parameter, and specify the