User Guide Getting Started Help Center Documentation Community Training
Looker
  
English
Français
Deutsch
日本語
datagroup

Usage

datagroup: datagroup_name {
  max_cache_age: "24 hours"
  sql_trigger: SELECT max(id) FROM my_tablename ;;
}

Hierarchy

datagroup

Default Value

None

Accepts

An identifier for your datagroup, plus sub-parameters defining your datagroup properties.

Definition

Use datagroup to assign a caching policy to Explores and/or PDTs. If you want different caching policies for different Explores and/or PDTs, then use a separate datagroup parameter to specify each policy.

Specify the details of the caching policy by using one or both of these datagroup subparameters:

Often the best solution is to use the two parameters in combination. You specify a sql_trigger SQL query that will be triggered by your data load (ETL) into your database. In addition, you specify a max_cache_age that will invalidate old data if your ETL fails. The max_cache_age parameter ensures that if the cache for a datagroup isn’t cleared by the sql_trigger, then the cache entries will expire by a certain time. So the failure mode for a datagroup will be to query the database rather than serve stale data from the Looker cache.

For connections using user attributes to specify the connection parameters, you must create a separate connection using the PDT override fields if you want to define a datagroup caching policy using a SQL query trigger.
Without the PDT overrides, you can still use a datagroup for the model and its Explores, as long as you define the datagroup’s caching policy using only max_cache_age, not sql_trigger.

sql_trigger

Use the sql_trigger parameter to specify a SQL query that returns exactly one row with one column. Looker runs the SQL query at intervals specified in the PDT And Datagroup Maintenance Schedule field of the database connection. If the query returns a different value than the previous result, the datagroup goes into a triggered state. Once the datagroup is triggered, Looker rebuilds any PDTs with that datagroup specified in their datagroup_trigger parameter. After the PDT rebuilding is complete, the datagroup goes into a ready state and Looker invalidates the cached results of any Explores using that datagroup.

Typically, sql_trigger specifies a SQL query that indicates when a new data load (ETL) has occurred, for example by querying the max(ID) in a table. You can also use sql_trigger to specify a certain time of day by querying the current date and adding additional hours to that timestamp as needed to reach the time you want, for example 4 AM.

Looker does not perform time zone conversion for sql_trigger. If you want to trigger your datagroup at a specific time of day, set the trigger in the time zone that your database is configured for.

See these examples from the sql_trigger parameter for ideas on setting up SQL queries to trigger a datagroup.

max_cache_age

max_cache_age parameter specifies a string defining a time period. This time period is the maximum time period for the cached results to be used. In the max_cache_age parameter you pass a string containing an integer followed by “seconds”, “minutes”, or “hours”.

When the age of a query’s cache exceeds the max_cache_age, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results.

Also, when the max_cache_age time expires, data is deleted from the cache, as long as the Instant Dashboards Looker Labs feature is disabled. See here for information on how long data is stored in the cache.

max_cache_age only invalidates the cache, it does not cause PDTs to rebuild. Use persist_for with a derived table to set a maximum time period to rebuild a PDT. If you define a datagroup with only max_cache_age, a derived table assigned that datagroup will persist, but because there is no trigger assigned, the PDT will sit in the scratch schema indefinitely.

Examples

To create a caching policy that retrieves new results whenever there’s new data available or at least every 24 hours, do the following:

datagroup: orders_datagroup { sql_trigger: SELECT max(id) FROM my_tablename ;; max_cache_age: "24 hours" }

To use the orders_datagroup caching policy as the default for Explores in a model, use the persist_with parameter at the model level, and specify the orders_datagroup:

persist_with: orders_datagroup

To use the orders_datagroup caching policy for a specific Explore, add the persist_with parameter under the explore parameter, and specify the orders_datagroup. If there is a default datagroup specified at the model level, you can use the persist_with parameter under an explore to override the default setting.

explore: customer_facts { persist_with: orders_datagroup … }

To use the orders_datagroup datagroup caching policy for rebuilding a PDT, you can add datagroup_trigger under the derived_table parameter, and specify the orders_datagroup:

view: customer_order_facts { derived_table: { datagroup_trigger: orders_datagroup … } }

Top