After you have secured and configured your database, you’re ready to connect your database to Looker.
Creating a New Database Connection
Select Admin > Connections (under Database). On the Connections page, click New Connection:
If you have not already configured a Looker-hosted database, there will be two tabs at the top of the New Connection page: the Existing Database tab and the New Database tab.
- Click Existing Database to configure a Looker connection to an existing database.
- Click New Database to create, connect, and download your data into a new Looker-managed database for use with the Looker Marketplace.
If you do not see the tabs at the top of the New Connection page, refer to the Existing Database section of this page.
Looker displays the Connection Settings page:
The fields that the Connection Settings page displays depend on your dialect setting. The example above shows the configurable connection settings for Amazon Redshift.
For more information about applying user attributes to connection settings, see the Connections section of the User Attributes documentation page.
For more information about using the PDT Overrides column to configure separate login credentials for PDT processes, see the Configuring Separate Login Credentials for PDT Processes section.
The following options are available for configuration when connecting Looker to Amazon Redshift.
Name
The name of the connection as you want to refer to it. You should not use the name of any folders. This value does not need to match anything in your database; it is just a label that you assign. You'll use it in the connection parameter of your LookML model.
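For example, a model file might reference this connection like so; the connection and file names here are hypothetical placeholders for whatever you entered on the Connection Settings page:

```lookml
# orders.model.lkml
# "sales_redshift" is a hypothetical connection name; use the name
# you assigned on the Connection Settings page.
connection: "sales_redshift"

include: "*.view.lkml"
```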
Dialect
The SQL dialect that matches your connection. It's important to choose the correct value so that you are presented with the proper connection options, and so that Looker can properly translate your LookML into SQL.
The SSH Server option is available if the instance is deployed on Kubernetes infrastructure, and only if the ability to add SSH server configuration information to your Looker instance has been enabled. If this option is not enabled on your Looker instance and you would like to enable it, contact your Looker account manager or open a support request in Looker’s Help Center.
To connect to your database using an SSH tunnel, turn on the toggle and select an SSH server configuration from the drop-down list.
Host and Port
Your database hostname and the port that Looker should use to connect to your database host.
If you worked with a Looker analyst to configure an SSH tunnel to your database, enter localhost in the Host field and, in the Port field, enter the port number that redirects to your database (your Looker analyst should have provided this port number).
If you apply a user attribute to the Host field, the user attribute cannot have a user access level set to Editable.
If you configured an SSH tunnel to connect to your database, you cannot apply a user attribute to the Remote Host:Port field.
Database
The name of the database on your host. For example, you might have a hostname of my-instance.us-east-1.redshift.amazonaws.com on which there is a database called sales_info. You would enter sales_info in this field. If you have multiple databases on the same host, you may need to create multiple connections to use them (with the exception of MySQL, where the word database means something slightly different than in most SQL dialects).
For Snowflake and Google BigQuery connections, you have the option to use OAuth. This means that your users are required to log in to Snowflake or Google, respectively, in order to issue queries from Looker.
When you select Use OAuth, you will see the OAuth Client ID and OAuth Client Secret fields:
These values are generated from the Snowflake database or from Google. See the documentation page that describes Snowflake OAuth configuration or Google BigQuery OAuth configuration for the full procedure.
Username
The username that Looker should use to connect to your database. You should configure the user ahead of time according to our database configuration instructions.
Password
The password that Looker should use to connect to your database. You should configure the password ahead of time according to our database configuration instructions.
Schema
The default schema that Looker uses when a schema is not specified. This applies when using SQL Runner, during LookML project generation, and when querying tables.
Persistent Derived Tables
Check this box to enable persistent derived tables. This reveals additional PDT fields and the PDT Overrides column. Looker displays this option only if the database dialect supports using PDTs.
Note the following about PDTs:
- PDTs are not supported for Snowflake or Google BigQuery connections that use OAuth.
- Disabling PDTs on a connection does not disable the datagroups that are associated with your PDTs. Even if you disable PDTs, existing datagroups will still run their sql_trigger queries against the database. If you want to stop a datagroup from running its sql_trigger query against your database, you must delete or comment out the datagroup parameter from your LookML project, or you can update the PDT and Datagroup Maintenance Schedule setting for the connection so that Looker checks PDTs and datagroups very infrequently or never.
- As of Looker 7.20, Snowflake connections are made with the session parameter AUTOCOMMIT set to TRUE (which is Snowflake's default value). AUTOCOMMIT is required for SQL commands that Looker runs to maintain its PDT registration system.
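As an illustration of the datagroup behavior noted above, a datagroup with a sql_trigger might be defined like this in LookML; the datagroup name, table, and trigger query are hypothetical:

```lookml
# Hypothetical datagroup: triggers when a new row lands in the
# orders table. This trigger query keeps running on the maintenance
# schedule even if PDTs are disabled on the connection.
datagroup: orders_datagroup {
  sql_trigger: SELECT MAX(id) FROM orders ;;
  max_cache_age: "24 hours"
}
```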
Temp Database
Although this is labeled Temp Database, you'll enter either the database name or schema name — as appropriate for your SQL dialect — that Looker should use to create persistent derived tables. You should configure this database or schema ahead of time, with the appropriate write permissions. On the Database Configuration Instructions documentation page, select your database dialect to see the instructions for that dialect.
Each connection must have its own Temp Database or Schema; they cannot be shared across connections.
Max PDT Builder Connections
The Max PDT Builder Connections field configures the maximum number of persistent tables that can be built in parallel on the connection. This setting applies to persistent derived tables and aggregate tables that use either the datagroup or the sql_trigger_value persistence strategy.
The concurrency defaults to 1 but may be set as high as 10. However, the value cannot be higher than the value set in the Max Connections field or in the per-user-query-limit set in Looker's startup options.
Set this value carefully. If the value is too high, you may overwhelm your database. If the value is low, then long-running PDTs or aggregate tables can delay the creation of other persistent tables or slow down other queries on the connection. Databases that support multitenancy — such as BigQuery, Snowflake, and Redshift — may be more performant in handling parallel query builds.
A good rule of thumb is to start with a low value, such as 1 or 2, and then slowly raise it in increments of 1, as long as query performance isn't affected.
Note the following about the Max PDT Builder Connections setting:
- Tables persisted with the persist_for strategy are always rebuilt sequentially, so the Max PDT Builder Connections setting does not apply.
- For tables that use the sql_trigger_value persistence strategy, the Max PDT Builder Connections setting applies only to the rebuilding of the table, not to the trigger check for the table. The trigger check is a query that checks whether the table's persistence strategy is triggered; because these trigger check queries are always run sequentially, the Max PDT Builder Connections setting does not apply to them.
- In a clustered Looker instance, the Max PDT Builder Connections setting specifies the limit on each node. So, if your Max PDT Builder Connections setting is 1 and your instance has three nodes, you could rebuild three PDTs on the connection at the same time.
- Regardless of the Max PDT Builder Connections setting, the following types of tables are built consecutively, not concurrently:
  - PDTs or aggregate tables persisted through the persist_for strategy.
  - PDTs or aggregate tables in Development Mode.
  - PDTs or aggregate tables rebuilt via the Rebuild Derived Tables & Run option.
  - PDTs or aggregate tables that rely on other PDTs or aggregate tables. However, the dependency tables can build concurrently, provided they are not dependent on one another.
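For reference, a PDT that uses the sql_trigger_value persistence strategy, and would therefore be subject to the Max PDT Builder Connections limit when rebuilding, might look like the following sketch; the view name, table, and trigger query are hypothetical:

```lookml
view: customer_order_facts {
  derived_table: {
    sql:
      SELECT customer_id, COUNT(*) AS lifetime_orders
      FROM orders
      GROUP BY customer_id ;;
    # Rebuilds when the returned value changes, i.e. once per day.
    sql_trigger_value: SELECT CURRENT_DATE ;;
  }
}
```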
Additional Params
You can include additional Java Database Connectivity (JDBC) parameters for your queries here, if needed.
If you want to use a user attribute in a JDBC parameter, you can use Liquid templating. The syntax is {{ _user_attributes['name_of_your_user_attribute'] }}.
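For instance, assuming a user attribute named max_rows and a hypothetical JDBC driver parameter maxRows, the Additional Params field might contain:

```
maxRows={{ _user_attributes['max_rows'] }}
```

Both names here are illustrative; substitute the parameter your JDBC driver actually supports and the user attribute defined on your instance.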
PDT and Datagroup Maintenance Schedule
This setting accepts a cron expression that indicates when Looker should check datagroups and persistent derived tables (those based on sql_trigger_value) to determine whether they should be regenerated or dropped.
The default value of */5 * * * * means "check every 5 minutes," which is the maximum frequency of checks. A cron expression indicating more frequent checks will still cause checks to occur only every 5 minutes.
While PDTs are building, Looker does not perform additional trigger checks. Once all PDTs from the last trigger check are built, Looker will resume checking datagroup and PDT triggers based on the PDT and datagroup maintenance schedule.
If your database isn't up 24/7, you may want to limit checks to times when the database is up. For example, you can write cron expressions that:
- Check datagroups and PDTs every 5 minutes during business hours, Monday through Friday
- Check datagroups and PDTs every 5 minutes during business hours, every day
- Check datagroups and PDTs every hour during business hours, Monday through Friday
- Check datagroups and PDTs every day at 3:01am
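As an illustration, assuming business hours of 9am to 6pm, those schedules could be written as the following cron expressions (adjust the hour range and days to match your own business hours):

```
*/5 9-18 * * 1-5   # every 5 minutes during business hours, Mon-Fri
*/5 9-18 * * *     # every 5 minutes during business hours, every day
0 9-18 * * 1-5     # every hour during business hours, Mon-Fri
1 3 * * *          # every day at 3:01am
```

Only the cron expression itself is entered in the PDT and Datagroup Maintenance Schedule field; the trailing comments here are annotations.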
A few things to note when creating a cron expression:
- Looker uses parse-cron v0.1.3, which doesn't support all cron syntax extensions.
- The cron expression uses the Looker application timezone to determine when checks are made.
- If PDTs aren't being built, reset the cron string back to the default of */5 * * * *.
Below are a couple of resources to assist with creating cron expressions:
- https://crontab.guru: helps with editing and testing cron expressions
- http://www.crontab-generator.org: select time settings and the generator creates the corresponding cron expression
SSL
Choose whether or not you want to use SSL encryption to protect data as it passes between Looker and your database. SSL is only one option that can be used to protect your data; other secure options are described on the Enabling Secure Database Access documentation page.
Verify SSL Cert
Choose whether you want to require verification of the SSL certificate used by the connection. If verification is required, the SSL Certificate Authority (CA) that signed the SSL certificate must come from the client’s list of trusted sources. If the CA is not a trusted source, the database connection is not established.
If this box is unchecked, SSL encryption is still used on the connection, but verification of the SSL connection is not required, so a connection can be established when the CA is not on the client’s list of trusted sources.
Max Connections
Here you can set the maximum number of connections that Looker can establish with your database. For the most part, you are setting the number of simultaneous queries that Looker can run against your database. Looker also reserves up to three connections for query killing. If the connection pool is very small, then Looker will reserve fewer connections.
You should set this value carefully. If the value is too high, you may overwhelm your database. If the value is too low, then queries have to share a small number of connections. Thus many queries may seem slow to users as the queries have to wait for other, earlier queries to return.
The default value (which varies depending upon your SQL dialect) is typically a reasonable starting point. Most databases also have their own settings for the maximum number of connections they will accept. If your database configuration limits connections, ensure that your Max Connections value is equal to or lower than your database’s limit.
Connection Pool Timeout
If your users request more connections than the Max Connections setting allows, the requests wait for others to finish before they are executed. The maximum amount of time that a request will wait is configured here. Set this value carefully. If it is too low, users may find their queries get canceled because there isn't enough time for other users' queries to finish. If it is too high, large numbers of queries may build up, causing users to wait for a very long time. The default value is typically a reasonable starting point.
SQL Runner Precache
In SQL Runner, all table information is pre-loaded as soon as you select a connection and schema. This enables SQL Runner to quickly display table columns as soon as you click a table name. However, for connections and schemas with many tables, or with very large tables, you may not want SQL Runner to pre-load all of the information.
If you prefer SQL Runner to load table information only when a table is selected, you can uncheck the SQL Runner Precache option to disable SQL Runner pre-loading for the connection.
Database Time Zone
The time zone in which your database stores time-based information. Looker needs to know this so that it can convert time values for users, making it easier to understand and use time-based data. See the Using Time Zone Settings documentation page for more information.
Query Time Zone
The Query Time Zone option is visible only if you have disabled User Specific Time Zones.
When User Specific Time Zones are disabled, the Query Time Zone is the time zone that is displayed to your users when they query time-based data, and the time zone into which Looker will convert time-based data from the Database Time Zone.
See the Using Time Zone Settings documentation page for more information.
Configuring Separate Login Credentials for PDT Processes
If your database supports persistent derived tables, and you have checked the Persistent Derived Tables box in the connection settings, Looker displays the PDT Overrides column. In the PDT Overrides column, you can enter separate JDBC parameters (host, port, database, username, password, schema, and additional parameters) that are specific to PDT processes. This can be valuable for a number of reasons:
- By creating a separate database user for PDT processes, you can use PDTs in your model even if you assign user attributes to your database login credentials.
- PDT processes can authenticate through a separate database user who has a higher priority. This way the database can prioritize the PDT jobs over less-critical user queries.
- Write access can be revoked for the standard Looker database connection, and only granted to a special user that PDT processes will use for authentication. This is a better security strategy for most organizations.
- For databases like Snowflake, PDT processes can be routed to more powerful hardware that is not shared with the rest of the Looker users. This way PDTs can build quickly without incurring the cost of running expensive hardware full-time.
For example, the configuration below shows a connection where the username and password fields are set to user attributes. This way, each user can access the database using their individual credentials. The PDT Overrides column creates a separate user (pdt_user) with its own password. The pdt_user account will be used for all PDT processes, with access levels appropriate to PDT creation and update:
While the PDT Overrides column enables you to change the database user and other connection properties, a PDT override must read the same data as the default connection, and it must write data to the same place. Looker cannot read data from one location and write it to another.
Testing Your Connection Settings
Once you’ve entered the credentials, click Test These Settings to verify that the information is correct and the database is able to connect:
If your connection does not pass one or more tests:
- Try some of the troubleshooting steps on the Testing Database Connectivity documentation page.
- If you are running Mongo version 3.6 or earlier on Atlas and you get a communications link failure, see the Mongo Connector documentation page.
- To receive successful connection messages regarding the temp schema and PDTs, you must allow that functionality when you set up your Looker database. Instructions for doing so can be found on the Database Configuration Instructions documentation page.
Database connections that use OAuth, such as Snowflake and Google BigQuery, require a user login. If you are not logged in to your OAuth user account when you test one of these connections, Looker will show a warning with a Log In link. Click the link to enter your OAuth credentials or to allow Looker access to your OAuth account information.
If you are still having trouble, contact Looker Support for assistance.
Adding Your Database Connection
Once you have configured and tested your database connection settings, click Add Connection. Your database connection is now added to the list on the Connections page.
For use with the Looker Marketplace, the New Database tab lets you create a new Looker-managed database, load your analytics data, and connect to a Looker analytics application or pre-built model. This lets you get up and running with a full analytics suite quickly, usually within 24 hours.
Creating the Database
After you click the New Database tab, Looker shows the Add a new database page:
Enter your name and email address, and click Create Database. Your email address must be unique; you can’t use the same email address twice.
Your new database is available for a standard three-week free trial. After your free trial, someone from Looker will reach out to see how you've liked it. If you choose not to continue with managed services, you can simply delete the database connection.
Connecting to a Data Source
After Looker creates the new database, Looker shows the Connect Data Sources page:
Click Connect under the data source from which you want to download your data and populate the Looker analytics application or model. Looker will request your user credentials for that data source (your customer ID for Google Ads, for example):
Enter your user credentials and click Authorize. Your data should start downloading soon after, and usually completes within 24 hours.
After the connection is created, you can click Connection Details to view the connection configuration:
Viewing the Connection
Once you’ve created a new Looker-managed connection, you can view the connection on the Connections page in the Looker Managed section of the page:
You can test, edit, or perform other actions on the connection just like any other database connection. In addition, the Add Data button returns you to the Connect Data Sources screen where you can download your data from additional data sources.
Deleting the Database Connection
To delete the connection to the database, use the Connections page in Looker’s Admin section to delete the connection as you would any other.
After you have connected your database to Looker, you’re ready to configure sign-in options for your users.