datagroup

Usage

datagroup: datagroup_name {
  max_cache_age: "24 hours"
  sql_trigger: SELECT max(id) FROM my_tablename ;;
  interval_trigger: "12 hours"
  label: "desired label"
  description: "description string"
}

Hierarchy

datagroup

Default Value

None

Accepts

An identifier for your datagroup, plus sub-parameters defining your datagroup properties.

Definition

Use datagroup to assign a caching policy to Explores and/or PDTs. If you want different caching policies for different Explores and/or PDTs, then use a separate datagroup parameter to specify each policy.

You can add a label and a description for the datagroup with the label and description subparameters.

Specify the details of the caching policy by using the max_cache_age, sql_trigger, and interval_trigger subparameters, described in the following sections.

Often the best solution is to use max_cache_age in combination with either sql_trigger or interval_trigger. Specify either a sql_trigger or an interval_trigger value that matches the data load (ETL) into your database, then specify a max_cache_age value that will invalidate old data if your ETL fails. The max_cache_age parameter ensures that if the cache for a datagroup isn’t cleared by sql_trigger or interval_trigger, then the cache entries will expire by a certain time. That way, the failure mode for a datagroup will be to query the database rather than serve stale data from the Looker cache.
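
For example, a minimal sketch of this pattern, assuming an ETL cadence of roughly 12 hours (the datagroup name and values are illustrative): the datagroup triggers every 12 hours, and max_cache_age invalidates any cache entry older than 24 hours as a fallback.

datagroup: example_etl_datagroup {
  interval_trigger: "12 hours"
  max_cache_age: "24 hours"
}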

A datagroup cannot have both sql_trigger and interval_trigger parameters. If you define a datagroup with both parameters, the datagroup will use the interval_trigger value and ignore the sql_trigger value, since sql_trigger requires querying the database and therefore consumes database resources.

If a connection uses user attributes to specify connection parameters, and you want to define a datagroup caching policy that uses a SQL query trigger, you must create a separate connection using the PDT override fields.

Without the PDT overrides, you can still use a datagroup for the model and its Explores, as long as you define the datagroup’s caching policy using only max_cache_age, not sql_trigger.
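
In that case, a datagroup whose policy uses only max_cache_age might look like the following sketch (the name and value are illustrative):

datagroup: example_cache_policy {
  max_cache_age: "24 hours"
}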

max_cache_age

The max_cache_age parameter specifies a string containing an integer followed by "seconds", "minutes", or "hours". This is the maximum length of time that cached results can be used by Explore queries that use the datagroup.

When the age of a query's cache exceeds the max_cache_age, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results. See the Caching queries and rebuilding PDTs with datagroups documentation page for information on how long data is stored in the cache.

If the Instant Dashboards Looker Labs feature is enabled, queries that are run from a dashboard will always run against the database; the previous run's data will be displayed on the dashboard until the query results are returned, regardless of the max_cache_age value.

The max_cache_age parameter defines only when the cache is invalidated; it does not trigger the rebuilding of PDTs. If you define a datagroup with only max_cache_age, you will get a LookML validation warning if any derived tables are assigned to the datagroup. If you leave a derived table assigned to a datagroup that has only a max_cache_age parameter, the derived table will be built when it is first queried, but it will then sit in the scratch schema indefinitely and never rebuild, even if it is queried again. If your intention is to have a PDT rebuild at a specific time interval, add an interval_trigger parameter to your datagroup to define a PDT rebuild schedule.

sql_trigger

Use the sql_trigger parameter to specify a SQL query that returns exactly one row with one column. Looker runs the SQL query at intervals specified in the PDT And Datagroup Maintenance Schedule field of the database connection. If the query returns a different value than the previous result, the datagroup goes into a triggered state. Once the datagroup is triggered, Looker rebuilds any PDTs with that datagroup specified in their datagroup_trigger parameter. After the PDT rebuilding is complete, the datagroup goes into a ready state and Looker invalidates the cached results of any Explores using that datagroup.

Typically, sql_trigger specifies a SQL query that indicates when a new data load (ETL) has occurred, for example by querying the max(ID) in a table. You can also use sql_trigger to specify a certain time of day by querying the current date and adding additional hours to that timestamp as needed to reach the time you want, for example 4 AM.

Looker does not perform time zone conversion for sql_trigger. If you want to trigger your datagroup at a specific time of day, set the trigger in the time zone that your database is configured for.
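
As a sketch, assuming a dialect that supports Postgres-style interval arithmetic, the following query changes value once per day at 4 AM database time, so the datagroup is triggered at that point (the datagroup name is illustrative):

datagroup: nightly_build {
  # The date portion of (current timestamp minus 4 hours) rolls over at 4 AM,
  # so the query returns a new value, and the datagroup triggers, once per day.
  sql_trigger: SELECT CAST(CURRENT_TIMESTAMP - INTERVAL '4 hours' AS DATE) ;;
}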

See the examples on the sql_trigger parameter documentation page for ideas on setting up SQL queries to trigger a datagroup.

interval_trigger

You can use the optional interval_trigger subparameter to specify an interval at which the datagroup is triggered. The interval_trigger parameter takes a string containing an integer followed by "seconds", "minutes", or "hours".
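
For example, a minimal sketch of a datagroup that is triggered every 12 hours (the name is illustrative):

datagroup: example_interval_datagroup {
  interval_trigger: "12 hours"
}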

label and description

You can use the optional label and description subparameters to add a customized label and a description of the datagroup. You can also localize these subparameters using locale strings files.

These subparameters are displayed on the Datagroups page in the Database section of the Admin panel. See the Admin options - Datagroups documentation page for more information on how these are displayed.
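
As a sketch of how these strings might be localized, assuming the project has model localization set up, the label and description values you provide act as keys that can be looked up in the project's locale strings files (the names and strings below are illustrative):

datagroup: orders_datagroup {
  sql_trigger: SELECT max(id) FROM my_tablename ;;
  # Shown on the Admin panel's Datagroups page; in a localized project these
  # strings can also serve as lookup keys in the locale strings files.
  label: "ETL ID added"
  description: "Triggered when new ID is added to ETL log"
}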

Examples

To create a caching policy that retrieves new results whenever there’s new data available or at least every 24 hours, do the following:

datagroup: orders_datagroup {
  sql_trigger: SELECT max(id) FROM my_tablename ;;
  max_cache_age: "24 hours"
  label: "ETL ID added"
  description: "Triggered when new ID is added to ETL log"
}

To use the orders_datagroup caching policy as the default for Explores in a model, use the persist_with parameter at the model level, and specify the orders_datagroup:

persist_with: orders_datagroup

To use the orders_datagroup caching policy for a specific Explore, add the persist_with parameter under the explore parameter, and specify the orders_datagroup. If there is a default datagroup specified at the model level, you can use the persist_with parameter under an explore to override the default setting.

explore: customer_facts {
  persist_with: orders_datagroup
  …
}

To use the orders_datagroup caching policy for rebuilding a PDT, add the datagroup_trigger parameter under the derived_table parameter, and specify the orders_datagroup:

view: customer_order_facts {
  derived_table: {
    datagroup_trigger: orders_datagroup
    …
  }
}
