This page contains information about connecting Looker to an Apache Hive database.
Introduction
Looker is architected to connect to a database server via JDBC. In the case of Hive, this is the thrift server (HiveServer2). See the Apache documentation for more information.
By default, this server will listen on port 10000.
Looker is an interactive querying tool, so it expects to work with an interactive SQL engine. If Hive is running on MapReduce — hive.execution.engine
is set to mr
— then Hive will return query results too slowly to be practical.
Looker was tested with Hive on Tez (hive.execution.engine=tez
), although it is also possible to run Looker against Hive on Spark. Spark support was added in Hive version 1.1. (Looker supports Hive 1.2.1+.)
Hosted Hive Services
Most hosted Hive services run Hive with Tez or Spark.
- Amazon Elastic Map Reduce (EMR): Runs Hive on Tez by default.
- Microsoft HDInsight: Runs Hive on Tez.
- Google Dataproc: Runs Hive on MapReduce, though there is a script that installs Tez in their dataproc-initialization-actions GitHub repo.
- Qubole: Runs Hive on Tez. To connect to Qubole Hive, you must connect directly to the master node.
Hadoop Distributions
- Cloudera CDH: The latest version of CDH (as of September 2018) ships with Hive 1.1, which is unsupported by Looker.
- MapR: MapR 4.1 documentation refers to Tez as a “developer preview.”
- Hortonworks HDP: Hortonworks provides full support for Hive on Tez.
Verifying the JDBC Connection
Before connecting Looker to the Hive thrift server, we recommend you connect another JDBC client and run queries to make sure that Hive performance is satisfactory.
Hive comes with a JDBC client called beeline
. Verify that the thrift server is running with a command similar to:
or if you want to connect as a specific user:
Once connected, verify that the command !tables
lists all databases and tables. The !tables
command makes a similar call as the one Looker uses in SQL Runner or when creating a new LookML project.
Also verify that you can run queries, with a query that invokes the Tez engine, using a command similar to:
Queries such as select * from table
and show tables
don’t invoke the Tez engine.
PDTs
If you are going to enable PDTs in Looker on Hive, create a scratch schema for Looker to use, using a command similar to:
The user that Looker uses to connect to Hive (which can be anonymous if no authentication is used) must have the following abilities in the scratch schema:
- Create tables
- Alter tables
- Drop tables
Test this with a JDBC client before attempting to create PDTs with Hive.
Queues
Sometimes you want queries from Looker to go into a specific queue. To configure that, enter the queue name parameter in the Additional Params field on the Connection Settings page:
Other Hive parameters can be set this way in the Additional Params field on the Connection Settings page.
Using user attributes, it is possible for queries from different users or different groups of users to go into different queues. To do this, create a user attribute named something like queue_name
, then in the Additional Params field, add:
You can use this to customize other hive-site.xml
parameters on a per-user or per-group basis as well.
Adding the Connection
In the Admin section of Looker, navigate to the Connections page and click New Connection. Looker displays this page:
- Name: Specify the name of the connection. This is how you will refer to the connection in LookML projects.
Dialect: Specify the dialect: Apache Hive 2, Apache Hive 2.3+, or Apache Hive 3.1.2+.
For Apache Hive 3.1.2+, Looker can only fully integrate with Apache Hive 3 databases on versions specifically 3.1.2+. This is because of a query parsing issue from Hive versions 2.4.0 - 3.1.2 that resulted in extremely long parsing times for Looker-generated SQL.
- Host: Specify the hostname.
- Port: Specify the database port.
- Database: Specify the database name.
- Username: Specify the database username.
- Password: Specify the database user password.
- Persistent Derived Tables: Check this box to enable persistent derived tables. This setting reveals the Temp Database field and the PDT Overrides column.
- Temp Database: Specify the name of the scratch schema created in the PDTs section above.
- Max PDT Builder Connections: Specify the number of possible concurrent PDT builds on this connection. Setting this value too high could negatively impact query times. For more information, see the Connecting Looker to Your Database documentation page.
- Additional Params: Specify any additional JDBC string parameters.
- PDT And Datagroup Maintenance Schedule: Specify a
cron
expression that indicates when Looker should check datagroups and persistent derived tables. Read more about this setting in our PDT and Datagroup Maintenance Schedule documentation. - SSL: Check to use SSL connections.
- Verify SSL Cert: Check for hostname verification.
- Max Connections: This setting can be left at the default value initially. Read more about this setting in the Max Connections subsection of the Connecting Looker to Your Database documentation page.
- Connection Pool Timeout: This setting can be left at the default value initially. Read more about this setting in the Connection Pool Timeout subsection of the Connecting Looker to Your Database documentation page.
- SQL Runner Precache: To cause SQL Runner not to preload table information and to load table information only when a table is selected, uncheck this option. Read more about this setting in the SQL Runner Precache subsection of the Connecting Looker to Your Database documentation page.
- Database Time Zone: Specify the time zone used in the database. Leave this field blank if you do not want time zone conversion. See the Using Time Zone Settings documentation page for more information.
Feature Support
For Looker to support some features, your database dialect must also support them.
In Looker 21.0, Apache Hive 2 supports the following Looker features:
In Looker 21.0, Apache Hive 2.3+ supports the following Looker features:
In Looker 21.0, Apache Hive 3.1.2+ supports the following Looker features:
Next Steps
After you have connected your database to Looker, you’re ready to configure sign-in options for your users.