User Guide Getting Started Help Center Documentation Community Training
New LookML
Old LookML
New LookML
Looker
Using Derived Tables

Derived tables are important tools in Looker that enable you to expand the sophistication of your analyses. In some cases, they can also play a valuable role in enhancing query performance.

At a high level, Looker’s derived table functionality provides a way to create new tables that don’t already exist in your database.

You define a derived table in one of these ways:

You can then base a LookML view on the derived table, just like any other table.

Simple Example

Let’s consider an example to clarify the concept. Suppose your database already contains a table called order, and you’d like to summarize some of that order data by customer. We can create a new derived table named customer_order_facts to do this:

Here’s the LookML to create the customer_order_facts derived table as an NDT and as a SQL-based derived table:

Native Derived Table
view: customer_order_facts { derived_table: { explore_source: orders { column: customer_id { field: order.customer_id } column: first_order { field: order.first_order } column: lifetime_amount { field: order.lifetime_amount } } } dimension: customer_id { type: number primary_key: yes sql: ${TABLE}.customer_id ;; } dimension_group: first_order { type: time timeframes: [date, week, month] sql: ${TABLE}.first_order_date ;; } dimension: lifetime_amount { type: number value_format: "0.00" sql: ${TABLE}.lifetime_amount ;; } }
SQL-based Derived Table
view: customer_order_facts { derived_table: { sql: SELECT customer_id, MIN(DATE(time)) AS first_order_date, SUM(amount) AS lifetime_amount FROM order GROUP BY customer_id ;; } dimension: customer_id { type: number primary_key: yes sql: ${TABLE}.customer_id ;; } dimension_group: first_order { type: time timeframes: [date, week, month] sql: ${TABLE}.first_order_date ;; } dimension: lifetime_amount { type: number value_format: "0.00" sql: ${TABLE}.lifetime_amount ;; } }

There are some things to note:

Temporary and Persistent Derived Tables

Derived tables can be temporary or they can be stored in your database (called a persistent derived table).

Regular or “Ephemeral” Derived Table

A regular derived table—sometimes called an “ephemeral” derived table—is temporary and not written to your database. When a user runs an explore query involving one or more derived tables, Looker constructs a SQL query using a dialect-specific combination of the SQL for the derived table(s) plus the requested fields, joins, and filter values. If the combination has been run before and the results are still valid in cache, Looker uses those cached results.

Because an ephemeral derived table runs a new query on your database every time a user requests data from it, you’ll want to be sure that the derived table is performant and won’t put excessive strain on your database. In cases where the query will take some time to run, a persistent derived table is often a better option.

Persistent Derived Table

A persistent derived table—sometimes called a “PDT”—is written into a scratch schema on your database and regenerated on a schedule of your choosing. When the user requests data from it the table has already been created (in most cases), reducing query time and database load.

To use this feature you do need to create a scratch schema on your database, and your database needs to support writes. Most companies set this up when they initially configure Looker (see this page for instructions for your database dialect), and of course you can also do so after your initial setup. However, there are some read-only database configurations that don’t allow persistence to work (most commonly Postgres hot-swap slaves). In these cases, instead you can use ephemeral derived tables.

Adding Persistence

If your database allows writes and you have created a scratch schema (see this page for instructions for your database dialect), you can store derived tables in your database to increase query speed and reduce database load. Many users who can use persistence always use persistence, as typically the additional storage space required is not an issue.

Making a DT Persistent

There is no parameter that means “make this derived table persistent.” Rather, persistence is created by the addition of the datagroup_trigger, sql_trigger_value, or persist_for parameter.

Datagroups

If you have defined a datagroup for your caching policies, you can use the datagroup_trigger parameter to cause the persistent derived table (PDT) to rebuild on the same schedule as your caching policies. This means that users will not have to wait for the PDT to be created.

Datagroups are the most flexible method of creating persistence. You can define a datagroup to trigger a PDT rebuild at the time of an ETL rebuild, for example, or you can trigger the datagroup separately from your database ETL cycle.

See this page for an overview of caching and rebuilding PDTs in Looker.

sql_trigger_value

You can also specify a schedule on which the PDT will rebuild. This means that users will not have to wait for the PDT to be created. If the user happens to request data from the PDT while it is being rebuilt, data will be returned from the existing table until the new table is ready.

You achieve this kind of persistence by using the sql_trigger_value parameter.

By default, Looker checks to see if it should regenerate a table every 5 minutes. However, you can change this schedule as desired by using the PDT And Datagroup Maintenance Schedule setting in Looker’s admin settings.

persist_for

Another option is to set the length of time the derived table should be stored before it is dropped. If a user queries the table before that length of time has been reached, they’ll receive data from the existing table. Otherwise, it will be rebuilt the next time a user requests data from it, meaning that particular user will need to wait for the rebuild. You achieve this kind of persistence by using the persist_for parameter.

Indexing PDTs

Because PDTs are actually stored in your database, you should specify indexes using the indexes parameter for SQL dialects that support them. Or, if you’re using Redshift, you can specify regular sort keys (using sortkeys), interleaved sort keys (using indexes), and distribution keys (using distribution).

To add persistence to our original derived table example, we could set it to rebuild when the order_datagroup datagroup triggers, and add indexes on both customer_id and first_order_date, like this:

view: customer_order_facts { derived_table: { explore_source: order { … } datagroup_trigger: order_datagroup indexes: ["customer_id", "first_order_date"] } }

If you don’t add an index (or Redshift equivalent), Looker will warn you that you should do so to improve query performance.

If you are going to use persistence, you should be mindful of the underlying query’s duration and start time, because it will have an impact on how PDTs are generated. These are the things to keep in mind:

The takeaway is that long-running PDTs can slow or delay the creation of other PDTs. They can also slow down other queries for users while the database is working hard to generate the large PDT.

Persistence in Development Mode

When you add persistence to a derived table, there are some development mode behaviors you should know about.

Development Mode Behavior

If you query a PDT in development mode without making any changes to its definition, Looker will query the production version of that PDT. However, as soon as you make any change to the PDT definition and query it, a new development version of the PDT is created. This enables you to test changes without disturbing end users.

If a development version of a PDT is created, it will always be persisted for a maximum of 24 hours, regardless of the persistence method you’ve used. This ensures that development mode tables are cleaned up on a frequent basis and don’t clutter your database.

After you push changes to production, Looker will immediately begin treating your development table as the production table (as long as you haven’t used the conditional SQL described below). This means that users will not have to wait for a new version of the production table to be built.

Table Sharing and Cleanup

Within any given Looker instance, Looker will share PDTs between users if the PDTs have the same definition and have the same persistence method setting. Additionally, if a PDT’s definition ever ceases to exist, Looker drops the table.

This has several benefits:

Working Faster in Development Mode

Finally, there are cases when the PDT you’re creating may take a long time to generate, which can be frustrating when testing lots of changes in development mode. For SQL-based derived tables, Looker supports a conditional WHERE clause for development mode that helps manage this:

view: my_view { derived_table: { sql: SELECT columns FROM my_table WHERE -- if prod -- date > '2000-01-01' -- if dev -- date > '2015-01-01' ;; } }

In this example, the query will include all data from 2000 onward when in production, but only the data from 2015 onward when in development mode. Using this feature in clever ways to limit your result set, and increase query speed, can make development mode changes much easier to validate.

If you use this feature, please keep in mind that tables will never be shared between production mode and development mode. This will use more storage on your database, and it means that the production mode table will need to rebuild after you commit your changes to production.

Native derived tables do not have the support for a conditional WHERE clause, but you can create filters in your NDTs, or limit the NDT rows to help test and develop your NDTs.

Users Can Override Persistence

Some users can override the persistence settings you create if they need to get the most up-to-date data. These users can force every PDT referenced in the query to rebuild by clicking the query’s gear menu and choosing Rebuild Derived Tables and Run. A user with at least one role including the develop permission always has the Rebuild Derived Tables and Run choice available:

This feature enables a user to query up-to-date data when they need it. While that specific user is waiting for the tables to rebuild, other users’ queries will still use the existing tables. Therefore, at least in that sense, other users won’t be negatively impacted. However, other users could be affected by the additional load on your database. If you are in a situation where triggering a PDT rebuild during business hours could put an unacceptable strain on your database, you may need to communicate to your users that they should never rebuild certain PDTs.

Referencing Derived Tables in Other Derived Tables

It is possible to reference one derived table in the definition of another, which is sometimes called “cascading derived tables.” To use them, you’ll need to use the syntax ${derived_table_or_view_name.SQL_TABLE_NAME}. For example, you can reference the clean_events derived table by typing ${clean_events.SQL_TABLE_NAME}. You can also use the ${derived_table_or_view_name.SQL_TABLE_NAME} syntax to refer to a LookML view, or in the syntax of the sql_trigger_value parameter.

When referring to another derived table or view name, type SQL_TABLE_NAME as a literal string.

Although it’s not always required, when referring to a derived table in this manner, it’s often useful to alias the table. For example:

${derived_table_or_view_name.SQL_TABLE_NAME} AS derived_table_or_view_name

This is because, behind the scenes, persistent derived tables are named with lengthy codes in your database. In some cases (especially with ON clauses) it’s easy to forget that you need to use the ${derived_table_or_view_name.SQL_TABLE_NAME} syntax to retrieve this lengthy name. An alias can help to prevent this type of mistake.

In this example we’re creating a derived table from events that cleans out unwanted rows, and then creating a summary of that. The event_summary table regenerates whenever new rows are added to clean_events.

view: clean_events { derived_table: { sql: SELECT * FROM events WHERE type NOT IN ('test', 'staff') ;; sql_trigger_value: SELECT CURDATE() ;; } }   view: event_summary { derived_table: { sql: SELECT type, date, COUNT(*) AS num_events FROM ${clean_events.SQL_TABLE_NAME} AS clean_events GROUP BY type, date ;; sql_trigger_value: SELECT MAX(id) FROM ${clean_events.SQL_TABLE_NAME} ;; } }

This particular example could be done more efficiently in a single derived table, but it’s useful for demonstrating derived table references.

Monitoring and Troubleshooting Persistent Derived Tables

Companies that use many PDTs—especially if they are cascading—often want to understand the status of those PDTs. Looker offers several tools to investigate these tables, which are described on this page.

When attempting to troubleshoot persistent derived tables:

Supported Database Dialects for PDTs

Looker’s ability to provide PDTs depends on the database dialect’s functionality. The following list shows which dialects support PDTs in the most recent Looker release:

Top