How Looker Generates SQL

LookML
Version

On this Page
Docs Menu

If you come to Looker from a SQL background, you’re probably curious how Looker generates SQL. Fundamentally, Looker is a tool that generates SQL queries and submits them against a database connection. Looker formulates SQL queries based on a LookML project that describes the relationship between tables and columns in the database. By understanding how Looker generates queries, you will better understand how your LookML code translates to efficient SQL queries.

Every LookML parameter controls some aspect of how Looker generates SQL, by altering the structure, content, or behavior of the query. This page describes the principles of how Looker generates SQL, but does not cover all LookML elements in detail. See the LookML Reference for complete details.

Viewing the Query

In a saved Look or an explore, you can use the SQL tab in the Data section to see what Looker sends to the database to get the data. You also can use the links at the bottom to view your query in SQL Runner or see the database’s explain plan for the query. For more on SQL Runner, see this page. For more information on optimizing a query using SQL Runner, see this article.

Canonical Form of a Looker Query

Looker’s SQL queries always take the following form.

SELECT
   <dimension>, <dimension>, ...
   <measure>, <measure>, ...
FROM <explore>
LEFT JOIN <view> ON ...
LEFT JOIN <view> ON ...
WHERE (<dimension_filter_expression>) AND (<dimension_filter_expression>) AND ...
GROUP BY <dimension>, <dimension>, <dimension>, ...
ORDER BY <dimension> | <measure>
HAVING <measure_filter_expression> AND <measure_filter_expression> AND ...
LIMIT <limit>

The LookML project defines all the dimensions, measures, explores, and views referenced in the formula above. Filter expressions are specified in the Looker app by the user to shape ad hoc queries. Filter expressions can also be declared directly in the LookML to apply to all queries.

Fundamental components of a Looker query

All Looker queries are represented by these fundamental parameters applied to a LookML project, as seen in the formula above. Looker uses these parameters to generate a complete SQL query.

These components are:

  • model - the name of the LookML model to target, which specifies the target database
  • explore - the name of the explore to query, which populates the SQL FROM clause
  • fields - the dimensions and measures to include in the query, which populates the SQL SELECT clause
  • filters - Looker filter expressions to apply to zero or more fields, which populate the SQL WHERE and HAVING clauses
  • sort order - the field to sort by, and the sort order, which populates the SQL ORDER BY clause

These parameters are precisely the elements that a user specifies when building a query on the Looker Explore page. These same elements show up in all modes of executing queries with Looker: in the generated SQL, in the URL that represents the query, the Looker API, and so on.

What about the views specified by the LEFT JOIN clauses? JOIN clauses are populated based on the structure of the LookML model, which specifies how views join to explores. When constructing SQL queries, Looker includes JOIN clauses only when required. When users building a query in Looker, they don’t have to specify how tables join together, because this information is encoded in the model— one of Looker’s most powerful benefits to business users.

An Example Query and the Resulting SQL

Let’s build a query in Looker to demonstrate how the query gets generated according to the pattern above. Consider an e-commerce store with tables to track users and orders. The fields and table relationships are shown below.

Let’s find the number of orders (ORDERS Count) grouped by state (USERS State) and filtered by creation date of the order (ORDERS Created Date).

Below is the query result in the Looker Explore page.

Clicking the SQL tab shows the SQL generated and executed by Looker.

Below is the raw text for clarity:

SELECT COALESCE(users.state,'') AS "_g1",
  users.state AS `users.state`,
  COUNT(DISTINCT orders.id) AS `orders.count`
FROM orders
LEFT JOIN users ON orders.user_id = users.id

WHERE
  orders.created_at BETWEEN (CONVERT_TZ(DATE_ADD(CURDATE(),INTERVAL -29 day),'America/Los_Angeles','UTC')) AND (CONVERT_TZ(DATE_ADD(DATE_ADD(DATE_ADD(CURDATE(),INTERVAL -29 day),INTERVAL 30 day),INTERVAL -1 second),'America/Los_Angeles','UTC'))
GROUP BY 1
ORDER BY COUNT(DISTINCT orders.id) DESC
LIMIT 500

Note the similarity to the canonical formula stated above. Looker’s SQL exhibits some traits of machine-generated code (e.g. COALESCE(users.state,'') AS "_g1"), but always fits the formula.

SELECT
   <dimension>,<dimension>,...
   <measure>,<measure>,...
FROM <explore>
LEFT JOIN <view> ON ...
LEFT JOIN <view> ON ...
WHERE (<dimension_filter_expression>) AND (<dimension_filter_expression>) AND ...
GROUP BY <dimension>,<dimension>,<dimension>,...
ORDER BY <dimension> | <measure>
HAVING <measure_filter_expression> AND <measure_filter_expression> AND ...
LIMIT <limit>

Experiment with more queries in Looker to prove to yourself that the query structure is always the same.

Running Raw SQL in Looker’s SQL Runner

Looker includes a feature called the SQL Runner where you can run any SQL you like against the database connections you’ve setup in Looker.

Since every query generated by Looker results in a complete, functional SQL command, you can use the SQL Runner to investigate or play with the query.

Raw SQL queries executed in SQL Runner produce the same result set

If the SQL contains any errors, SQL Runner will highlight the location of the first error in the SQL command, and will include the position of the error in the error message.

Note: If your connection settings specifies a database timezone, Looker will issue a SET TIMEZONE upon connection, which may affect results.

Examining Query Components in the URL

After running a query in Looker, examining the URL will reveal the five fundamental components described above. For example, the above simple query produces the following URL:

https://learn.looker.com/explore/ecommerce/orders
?show=data,fields&fields=users.state,orders.count
&sorts=orders.count+desc&f%5Borders.created_date%5D=30+days

The URL provides sufficient information to recreate the query:

model https://learn.looker.com/explore/ecommerce
explore /orders
fields to query and display ?show=data,fields&fields=users.state,orders.count
sort field and order &sorts=orders.count+desc
filter fields and values &f%5Borders.created_date%5D=30+days

How Looker Structures JOINs

In the query SQL above, notice that the explore appears in the main FROM clause and the joined views appear in the LEFT JOIN clauses. Looker joins can be written in many different ways, which is explained in more detail on the Working with Joins in LookML page.

SQL Blocks Specify Custom SQL Clauses

Not all elements of a Looker query are machine-generated. At some point the data model needs to provide specific detail for Looker to access the underlying tables and compute derived values. In LookML, SQL blocks are snippets of SQL code provided by the data modeler, which Looker uses to synthesize complete SQL expressions.

The most common SQL block parameter is sql, used in dimension and measure definitions. The sql parameter specifies a SQL clause to reference an underlying column or to perform an aggregate function. In general, all LookML parameters starting sql_ expect a SQL expression of some form. For example: sql_always_where, sql_on, and sql_table_name. The LookML Reference provides details on each parameter.

Example SQL Blocks for Dimensions and Measures

Below are a few examples of SQL blocks for dimensions and measures. The LookML substitution operator ($) makes these sql declarations appear deceptively unlike SQL. However, after substitution has occurred, the resulting string is pure SQL, which Looker injects into the SELECT clause of the query.

- dimension: id primary_key: true sql: ${TABLE}.id # Specify the primary key, id   - measure: average_cost type: average value_format: '0.00' sql: ${cost} # Specify the field that you want to average # The field ‘cost’ is declared elsewhere   - dimension: name sql: CONCAT(${first_name}, ' ', ${last_name})   - dimension: days_in_inventory type: number sql: DATEDIFF(${sold_date}, ${created_date})
dimension: id { primary_key: yes sql: ${TABLE}.id ;; # Specify the primary key, id } measure: average_cost { type: average value_format: "0.00" sql: ${cost} ;; # Specify the field that you want to average # The field ‘cost’ is declared elsewhere } dimension: name { sql: CONCAT(${first_name}, ' ', ${last_name}) ;; } dimension: days_in_inventory { type: number sql: DATEDIFF(${sold_date}, ${created_date}) ;; }

As shown in the last two dimensions above, SQL blocks can use functions that are supported by the underlying database (such as MySQL functions CONCAT and DATEDIFF in this case). The code you use in SQL blocks must match the SQL dialect used by the database.

Example SQL Block for Derived Tables

Derived tables also use a SQL block to specify the query that derives the table. An example is below:

- view: user_order_facts derived_table: sql: | # Get the number of orders for each user SELECT user_id , COUNT(*) as lifetime_orders FROM orders GROUP BY 1   # later, dimension declarations reference the derived column(s)… fields: - dimension: lifetime_orders type: number
view: user_order_facts { derived_table: { sql: SELECT user_id , COUNT(*) as lifetime_orders FROM orders GROUP BY 1 ;; }   # later, dimension declarations reference the derived column(s)… dimension: lifetime_orders { type: number } }

Still have questions?
Go to Discourse - or - Email Support
Top