New LookML
Old LookML
New LookML
Using Derived Tables

Derived tables are important tools in Looker that enable you to expand the sophistication of your analyses. In some cases, they can also play a valuable role in enhancing query performance.

At a high level, Looker’s derived table functionality provides a way to create new tables that don’t already exist in your database. These tables can be temporary and built at query time, or they can be stored in your database. Either way, you define a derived table by writing a SQL query, and the results of that query become the derived table. You can then reference the derived table in a LookML view, just like any other table.

Two Types of Derived Tables

There are two types of derived tables:

Regular or “Ephemeral” Derived Table

A regular derived table - sometimes called an “ephemeral” derived table - is temporary and not written to your database. Its query is run every time a user requests data from it. Behind the scenes, Looker achieves this behavior by using a common table expression or a temporary table. If you’ve written complex SQL queries by hand before, the way you use an ephemeral derived table is similar to the way you would use a subquery.

Because an ephemeral derived table runs a new query on your database every time a user requests data from it, you’ll want to be sure that the SQL you write is performant and won’t put excessive strain on your database. In cases where the query will take some time to run, a persistent derived table is often a better option.

Persistent Derived Table

A persistent derived table - sometimes called a “PDT” - is written into a scratch schema on your database and regenerated on a schedule of your choosing. When the user requests data from it the table has already been created (in most cases), reducing query time and database load.

In order to use this feature you do need to create a scratch schema on your database, and your database needs to support writes. Most companies set this up when they initially configure Looker (instructions on this page), and of course you can also do so after your initial setup. However, there are some read-only database configurations that don’t allow persistence to work (most commonly Postgres hot-swap slaves). In these cases you can fall back on ephemeral derived tables.

Simple Example

Let’s consider an example to clarify the concept. Suppose your database already contains a table called order, and you’d like to summarize some of that order data by customer. We can create a new derived table named customer_order_facts to do this. Your SQL transformation might look like this:

The LookML for this derived table would look like this:

view: customer_order_facts { derived_table: { sql: SELECT customer_id, MIN(DATE(time)) AS first_order_date, SUM(amount) AS lifetime_amount FROM order GROUP BY customer_id ;; } dimension: customer_id { type: number primary_key: yes sql: ${TABLE}.customer_id ;; } dimension_group: first_order { type: time timeframes: [date, week, month] sql: ${TABLE}.first_order_date ;; } dimension: lifetime_amount { type: number value_format: "0.00" sql: ${TABLE}.lifetime_amount ;; } }

There are a few things to take note of:

  1. We’ve used the derived_table parameter to turn the view into a derived table.
  2. The column names of your result set will be the names you need to reference in your dimensions, such as ${TABLE}.first_order_date. For this reason, you should make sure to alias each column with a clean name by using AS. This is why we wrote MIN(DATE(time)) AS first_order_date instead of simply MIN(DATE(time)).
  3. Other than a few derived table parameters (derived_table and sql) at the top of the view, this view works just like any other.

Adding Persistence

As noted above, as long as you have created a scratch schema to enable persistence (instructions on this page), you can store derived tables in your database to increase query speed and reduce database load. Many users who can use persistence always use persistence, as typically the additional storage space required is not an issue.

Types of Persistence

There is no parameter that means “make this derived table persistent”. Rather, the addition of either the sql_trigger_value or persist_for parameter automatically creates persistence.


The primary method you’ll use is to specify a schedule on which the persistent derived table (PDT) will rebuild. This means that users will not have to wait for the PDT to be created. If the user happens to request data from the PDT while it is being rebuilt, data will be returned from the existing table until the new table is ready.

You achieve this kind of persistence by using the sql_trigger_value parameter.

By default, Looker checks to see if it should regenerate a table every 5 minutes. However, you can change this schedule as desired by using the PDT And Datagroup Maintenance Schedule setting in Looker’s admin settings.


The other option is to set the length of time the derived table should be stored before it is dropped. If a user queries the table before that length of time has been reached, they’ll receive data from the existing table. Otherwise, it will be rebuilt the next time a user requests data from it, meaning that particular user will need to wait for the rebuild. You achieve this kind of persistence by using the persist_for parameter.

Query Performance and Persistence

Because PDTs are actually stored in your database, you should specify indexes - using the indexes parameter - for SQL dialects that support them. Or, if you’re using Redshift, you can specify regular sort keys (using sortkeys), interleaved sort keys (using indexes), and distribution keys (using distribution).

If we wanted to add persistence to our original example we could set it to rebuild every day at midnight, and add indexes on both customer_id and first_order_date, like this:

view: customer_order_facts { derived_table: { sql: … ;; sql_trigger_value: SELECT CURDATE() ;; indexes: ["customer_id", "first_order_date"] } }

If you don’t add an index (or Redshift equivalent), Looker will warn you that you should do so as a means to improve query performance.

If you are going to use persistence, you should be mindful of the underlying query’s duration and start time, because it will have an impact on how PDTs are generated. These are the things to keep in mind:

The takeaway is that long-running PDTs can slow or delay the creation of other PDTs. They can also slow down other queries for users while the database is working hard to generate the large PDT.

Persistence in Development Mode

When you add persistence to a derived table, there are some development mode behaviors you should know about.

Development Mode Behavior

If you query a PDT in development mode without making any changes to its SQL, Looker will query the production version of that PDT. However, as soon as you make any change to the SQL and query it, a new development version of the PDT is created. This enables you to test changes without disturbing end users.

If a development version of a PDT is created, it will always be persisted for a maximum of 24 hours, regardless of the sql_trigger_value or persist_for you’ve used. This ensures that development mode tables are cleaned up on a frequent basis and don’t clutter your database.

After you push changes to production, Looker will immediately begin treating your development table as the production table (as long as you haven’t used the conditional SQL described below). This means that users will not have to wait for a new version of the production table to be built.

Table Sharing and Cleanup

Within any given Looker instance, Looker will share PDTs between users if they have the same SQL, and have the same sql_trigger_value or persist_for setting. Additionally, if the SQL that defines a PDT ever ceases to exist, Looker drops the table.

This has several benefits:

Working Faster in Development Mode

Finally, there are cases when the PDT you’re creating may take a long time to generate, which can be frustrating when testing lots of changes in development mode. Looker supports a conditional WHERE clause for development mode that helps manage this:

view: my_view { derived_table: { sql: SELECT columns FROM my_table WHERE -- if prod -- date > '2000-01-01' -- if dev -- date > '2015-01-01' ;; } }

In this example, the query will include all data from 2000 onward when in production, but only the data from 2015 onward when in development mode. Using this feature in clever ways to limit your result set, and increase query speed, can make development mode changes much easier to validate.

If you use this feature, please keep in mind that tables will never be shared between production mode and development mode. This will use more storage on your database, and it means that the production mode table will need to rebuild after you commit your changes to production.

Users Can Override Persistence

Some users can override the persistence settings you create if they need to get the most up-to-date data. These users can force every PDT referenced in the query to rebuild by clicking the query’s gear menu and choosing Rebuild Derived Tables and Run. A user with at least one role including the develop permission always has the Rebuild Derived Tables and Run choice available:

This feature enables a user to query up-to-date data when they need it. While that specific user is waiting for the tables to rebuild, other users’ queries will still use the existing tables. Therefore, at least in that sense, other users won’t be negatively impacted. However, other users could be affected by the additional load on your database. If you are in a situation where triggering a PDT rebuild during business hours could put an unacceptable strain on your database, you may need to communicate to your users that they should never rebuild certain PDTs.

Referencing Derived Tables in Other Derived Tables

It is possible to reference a derived table in the SQL of another, which is sometimes called “cascading derived tables”. To use them, you’ll need to use the syntax ${derived_table_view_name.SQL_TABLE_NAME}. You can also use the ${derived_table_view_name.SQL_TABLE_NAME} syntax within the sql_trigger_value parameter.

The string SQL_TABLE_NAME is a literal string and should be written exactly; it’s not a variable that needs to be replaced by anything.

Although it’s not always required, it’s often useful to alias tables when you refer to them in this manner, for example:

${derived_table_view_name.SQL_TABLE_NAME} AS derived_table_view_name

This is because, behind the scenes, persistent derived tables are named with lengthy codes in your database. In some cases (especially ON clauses) it’s easy to forget that you need to use the ${derived_table_view_name.SQL_TABLE_NAME} syntax to retrieve this lengthy name. An alias can help to prevent this type of mistake.

In this example we’re creating a derived table from events that cleans out unwanted rows, and then creating a summary of that. The event_summary table regenerates whenever new rows are added to clean_events.

view: clean_events { derived_table: { sql: SELECT * FROM events WHERE type NOT IN ('test', 'staff') ;; sql_trigger_value: SELECT CURDATE() ;; } }   view: event_summary { derived_table: { sql: SELECT type, date, COUNT(*) AS num_events FROM ${clean_events.SQL_TABLE_NAME} AS clean_events GROUP BY type, date ;; sql_trigger_value: SELECT MAX(id) FROM ${clean_events.SQL_TABLE_NAME} ;; } }

This particular example could be done more efficiently in a single derived table, but it’s useful for demonstrating derived table references.

Monitoring and Troubleshooting Persistent Derived Tables

Companies that use many PDTs - especially if they are cascading - often want to understand the status of those PDTs. Looker offers several tools to investigate these tables, which are described on this page.

When attempting to troubleshoot persistent derived tables, please keep the following items in mind:

Supported Database Dialects for PDTs

Looker’s ability to provide PDTs depends on the database dialect’s functionality. The following list shows which dialects support PDTs in the most recent Looker release: