Clustering Looker

This tutorial explains the recommended method of creating a clustered Looker configuration.

Overview

The Looker application can run single-node or clustered:

  • A single-node Looker application, the default configuration, runs all of the services that make up the Looker application on a single server.
  • A clustered Looker configuration is a more complex configuration, usually involving database servers, load balancers, and multiple servers running the Looker application. Each node in a clustered Looker application is a server running a single Looker instance.

There are two primary reasons an organization would want to run Looker as a cluster:

  • Load balancing
  • Improved availability and failover

Depending on the scaling issue, a clustered Looker configuration may not be the right solution. For example, if a small number of large queries is exhausting system memory, the only fix is to increase the memory available to the Looker process.

Load Balancing Alternatives

Before load balancing Looker, consider increasing the memory and possibly the CPU count of a single server that runs Looker. Looker recommends setting up detailed performance monitoring for memory and CPU utilization to ensure that the Looker server is properly sized for its workload.

Large queries need more memory for better performance. Clustering can provide performance gains when many users are running small queries.

For configurations with up to 50 users who use Looker lightly, Looker recommends running a single server equivalent to a large AWS EC2 instance (m4.large: 8 GB of RAM, 2 CPU cores). For configurations with more users, or with many active power users, watch for CPU spikes or for users noticing slowness in the application. If either occurs, move Looker to a larger server or run a clustered Looker configuration.
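
As a quick check before deciding to cluster, you can confirm that the current Looker host meets the baseline above from a shell. This is only a sanity check of the machine's size, not a substitute for ongoing performance monitoring:

# Total memory and CPU core count on the current Looker host
free -h
nproc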

Improved Availability / Failover

Running Looker in a clustered environment can mitigate downtime in the case of an outage. High availability is especially important if the Looker API is used in core business systems or if Looker is embedded into customer-facing products.

In a clustered Looker configuration, a proxy server or load balancer will reroute traffic when it determines that one node is down. Looker automatically handles nodes leaving and joining the cluster.

Required Components

The following components are required for a clustered Looker configuration:

  • Application database
  • Looker nodes (servers running the Looker Java process)
  • Load balancer
  • Shared filesystem
  • Proper version of the Looker application JAR file

Application Database

Looker uses an application database (often called an internal database) to hold application data. When running Looker as a single-node Looker application, Looker normally uses an in-memory HyperSQL database.

In a clustered Looker configuration, each node's Looker process must point at a shared transactional database, called the application or internal database. Currently, MySQL is the only supported application database for clustered Looker configurations. Looker does not manage the maintenance and backups of that database. However, because the database holds almost all of the Looker application configuration data, it should be provisioned as a high-availability database and backed up at least daily.
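
For illustration only, the application database and its user could be provisioned on the MySQL host with commands along these lines. The database name, user, password, and character set shown here are placeholders; follow your own database standards:

# Run against the MySQL host that will serve as the Looker application database
mysql -u root -p -e "CREATE DATABASE looker DEFAULT CHARACTER SET utf8mb4;"
mysql -u root -p -e "CREATE USER 'db_user'@'%' IDENTIFIED BY 'secretPassword';"
mysql -u root -p -e "GRANT ALL PRIVILEGES ON looker.* TO 'db_user'@'%';"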

Looker Nodes

Each node is a server with the Looker Java process running on it. The servers in the Looker cluster need to be able to reach each other and the Looker application database. The default ports are listed later in this page.

Load Balancer

A load balancer or proxy server (for example, NGINX or AWS ELB) is required to direct traffic to the Looker nodes, balancing the load and redirecting requests to available nodes. The load balancer handles health checks; in the event of a node failure, it must be configured to reroute traffic to the remaining healthy nodes.

The load balancer should be configured with a long timeout (3600 seconds) to prevent long-running queries from being killed.
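
As an illustration, an NGINX configuration fragment like the following would spread traffic across two nodes and apply the long timeout. It assumes Looker's default web port of 9999 and uses placeholder certificate paths, so treat it as a sketch rather than a complete configuration:

upstream looker_cluster {
    server 10.10.10.10:9999;
    server 10.10.10.11:9999;
}

server {
    listen 443 ssl;
    # Placeholder certificate paths; substitute your own
    ssl_certificate     /etc/nginx/certs/looker.crt;
    ssl_certificate_key /etc/nginx/certs/looker.key;

    location / {
        proxy_pass https://looker_cluster;
        # Long timeouts so long-running queries are not killed by the proxy
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}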

Shared Filesystem

You must use a POSIX-compliant shared file system (such as NFS, AWS EFS, Gluster, BeeGFS, Lustre, or many others). Looker uses the shared file system as a repository for various pieces of information used by all the nodes in the cluster.

Looker Application (.jar executable)

You must use a Looker application JAR file that is Looker release 3.56 or higher.

Looker strongly recommends that each node in a cluster run the same Looker release and patch version, as discussed later in this page.

Setting up the Cluster

The following tasks are required:

  1. Install Looker
  2. Set up a MySQL application database
  3. Set up the shared file system
  4. Share the ssh key repository (depending on your situation)
  5. Open the ports for the nodes to communicate
  6. Start Looker on the nodes

Install Looker

Ensure that you have Looker installed on each node, using the Looker application JAR file and the directions on the On-premise Installation page.

Set Up a MySQL Application Database

For a clustered Looker configuration, the application database must be a MySQL database. If you have an existing non-clustered Looker instance that is using HyperSQL for the application database, you must migrate the application data from HyperSQL to your new shared MySQL application database.

Be sure to back up your Looker directory first. The migration process can only go from a HyperSQL database to a MySQL database, not in reverse.

See this article for information about backing up Looker and then migrating the application database from HyperSQL to MySQL.

Set Up the Shared File System

Set up the shared file system:

  1. On the server that will store the shared file system, verify that you have access to another account that can su to the looker user account.
  2. On the server for the shared file system, log into the Looker user account.
  3. If Looker is currently running, shut down your Looker configuration.
  4. If you were previously clustering using inotify scripts then stop those scripts, remove them from cron, and delete them.
  5. Create a network share and mount it on each node in the cluster. Make sure that it is configured to automount on each node, and that the looker user has the ability to read and write to it. For this example, we will call the network share /mnt/looker-share.

  6. On one node, move the looker/models and looker/models-user-* directories to your network share. For example:

    mv looker/models /mnt/looker-share/
    mv looker/models-user-* /mnt/looker-share/
    
  7. For each node, add the --shared-storage-dir setting to LOOKERARGS, specifying the network share, for example: --shared-storage-dir /mnt/looker-share. (A combined example of steps 5 through 7 appears after this procedure.)

    LOOKERARGS should be added to $HOME/looker/lookerstart.cfg so that the settings are not affected by upgrades. If your LOOKERARGS are not listed in that file, then someone may have added them directly to the $HOME/looker/looker shell script.
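
Putting steps 5 through 7 together, the commands on each node might look roughly like the following. The file server hostname and export path are placeholders, and the exact mount options depend on the shared filesystem you chose:

# Step 5: create the mount point and mount the share (NFS shown as an example)
sudo mkdir -p /mnt/looker-share
sudo mount -t nfs fileserver.example.com:/exports/looker-share /mnt/looker-share
# For automounting, add a matching /etc/fstab entry, for example:
# fileserver.example.com:/exports/looker-share  /mnt/looker-share  nfs  defaults  0  0
sudo chown looker:looker /mnt/looker-share

# Step 6 (on one node only): move the model directories onto the share
mv looker/models /mnt/looker-share/
mv looker/models-user-* /mnt/looker-share/

# Step 7 (on every node): add the flag to LOOKERARGS in $HOME/looker/lookerstart.cfg,
# alongside any other startup flags, for example:
# LOOKERARGS="-d looker-db.yml --clustered -H 10.10.10.10 --shared-storage-dir /mnt/looker-share"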

Share the ssh Key Repository

This section only applies to you if:

  • You are creating a shared file system cluster from an existing Looker configuration and
  • You have projects that were created in Looker release 4.6 or below.

The following procedure requires modifying the looker user’s $HOME/.ssh directory. This can make it difficult to log in and fix something if there are errors in the config. Make sure you have access to another account that can su to the looker user account before you perform these steps.

Set up the ssh key repository to be shared:

  1. On the shared file server, create a directory called ssh-share. For example: /mnt/looker-share/ssh-share.

    Make sure the ssh-share directory is owned by the looker user and the permissions are 700. Also, make sure that directories above the ssh-share directory (like /mnt and /mnt/looker-share) are not world-writable or group-writable. (Example commands are shown after this procedure.)

  2. On one node, copy the contents of $HOME/.ssh to the new ssh-share directory. For example:

    cp $HOME/.ssh/* /mnt/looker-share/ssh-share

  3. For each node, make a backup of the existing .ssh directory and create a symlink to the ssh-share directory. For example:

    cd $HOME
    mv .ssh .ssh_bak
    ln -s /mnt/looker-share/ssh-share .ssh
    

    Be sure to do this step for every node.
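
For step 1, the ownership and permission requirements could be applied with commands along these lines, run by an account with sufficient privileges. The paths match the example share used above:

sudo chown looker:looker /mnt/looker-share/ssh-share
sudo chmod 700 /mnt/looker-share/ssh-share
# Parent directories must not be group- or world-writable
sudo chmod g-w,o-w /mnt /mnt/looker-share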

Open the Ports for the Nodes to Communicate

Clustered Looker nodes communicate with each other over HTTPS, using self-signed certificates and an additional authentication scheme based on rotating secrets in the application database.

The default ports which must be open between cluster nodes are 1551 and 61616. These ports are configurable by using the startup flags listed below. We highly recommend restricting network access to these ports to allow traffic only between the cluster hosts.
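
As one way to restrict access, a node using ufw could allow the cluster ports only from its peer nodes. The peer IP address below is a placeholder; equivalent rules apply for security groups or other firewalls:

# Allow inter-node traffic on the default cluster ports from a peer node only
sudo ufw allow proto tcp from 10.10.10.11 to any port 1551
sudo ufw allow proto tcp from 10.10.10.11 to any port 61616
# Repeat for each of the other nodes in the cluster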

Start Looker on the Nodes

Restart the server on each node with the required startup flags.

Each node in a cluster must run the same release and patch version. Looker will allow nodes running different versions to cluster unless it can explicitly determine that they are incompatible, in which case the incompatible node refuses to join the cluster on startup. Even when mixed versions are accepted, running them together is not safe and should never be done.

Available Startup Flags

The available startup flags include:

  • --clustered: Specifies that this node is running in clustered mode (no value). Required to start or join a cluster.
  • -H or --hostname (for example, 10.10.10.10): The unique hostname that other nodes should use to contact this node. Required to start or join a cluster.
  • -n (for example, 1551): The port for inter-node communication. The default is 1551.
  • -q (for example, 61616): The port for queueing cluster-wide events. The default is 61616.
  • -d (for example, /path/to/looker-db.yml): The path to the file that holds the credentials for the Looker application database. Required to start or join a cluster.
  • --shared-storage-dir (for example, /path/to/mounted/shared/storage): Points to the shared directory, set up earlier on this page, that holds the looker/models and looker/models-user-* directories.

Example of LOOKERARGS and Specifying Database Credentials

Place the Looker startup flags in a lookerstart.cfg file, located in the same directory as the Looker JAR file.

For example, if you want to tell Looker:

  • To use the file named “looker-db.yml” for its database credentials,
  • That it is a clustered node, and
  • The other nodes of the cluster should contact this host on IP address 10.10.10.10.

You would specify:

LOOKERARGS="-d looker-db.yml --clustered -H 10.10.10.10"

Be sure to specify the correct IP address for your node.

The looker-db.yml file would contain the database credentials, such as:

host: your.db.hostname.com
username: db_user
database: looker
dialect: mysql
port: 3306
password: secretPassword

Finding Your Git SSH Deploy Keys

Where Looker stores Git SSH deploy keys depends on the release when the project was created:

  • For projects created in a pre-4.8 release, the deploy keys are stored in the server’s native SSH directory, ~/.ssh
  • For projects created in 4.8 or a later release, the deploy keys are stored in a Looker-controlled directory, ~/looker/deploy_keys/PROJECT_NAME
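
For example, to see which deploy keys exist on a node, you can list both locations (PROJECT_NAME is a placeholder for your project's name):

# Projects created before release 4.8
ls ~/.ssh
# Projects created in release 4.8 or later
ls ~/looker/deploy_keys/PROJECT_NAME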

Modifying a Looker Cluster

After creating a Looker cluster, you can add or remove nodes without making changes to the other clustered nodes.

Upgrading a Cluster to a New Looker Release

Upgrades may involve schema changes to Looker's internal database that are not compatible with previous versions. There are two methods for upgrading Looker to a new version.

Safer Method

  1. Create a backup of the application database.
  2. Stop all of the cluster’s nodes.
  3. Replace the JAR file on each server.
  4. Start each node one at a time (see the example commands after these steps).
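
A rough sketch of these steps from a shell, assuming the MySQL credentials from your looker-db.yml and the standard looker startup script in $HOME/looker, might look like the following. Hostnames, file names, and paths are placeholders to adjust for your environment:

# 1. Back up the application database
mysqldump -h your.db.hostname.com -u db_user -p looker > looker-app-db-$(date +%F).sql

# 2-4. On each node: stop Looker, replace the JAR file, then start the nodes one at a time
cd $HOME/looker
./looker stop
cp /path/to/new/looker.jar looker.jar
./looker start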

Faster Method

This method decreases downtime but will lose any changes made between creating the replica and pointing the proxy server to the new nodes. For example, if someone adds users or creates Looks during the transition, those changes might not be captured in the new application database.

To upgrade using this faster but less complete method:

  1. Create a replica of Looker’s application database.
  2. Start a new cluster pointed at the replica.
  3. Point the proxy server or load balancer to the new nodes, then stop the old nodes.