This page contains information about connecting Looker to an Apache Hive database.
Looker can only fully integrate with Apache Hive 3 databases on versions specifically 3.1.2+. This is because of a Query Parsing Issue from Hive versions 2.4.0 - 3.1.2 that resulted in extremely long parsing times for Looker generated SQL.
Looker is architected to connect to a database server via JDBC. In the case of Hive, this is the thrift server (HiveServer2). See the Apache documentation for more information.
By default, this server will listen on port 10000.
Looker is an interactive querying tool, so it expects to work with an interactive SQL engine. If Hive is running on MapReduce —
hive.execution.engine is set to
mr — then Hive will return query results too slowly to be practical.
Looker was tested with Hive on Tez (
hive.execution.engine=tez), although it is also possible to run Looker against Hive on Spark. Spark support was added in Hive version 1.1. (Looker supports Hive 1.2.1 and later.)
Hosted Hive Services
Most hosted Hive services run Hive with Tez or Spark.
- Amazon Elastic Map Reduce (EMR): Runs Hive on Tez by default.
- Microsoft HDInsight: Runs Hive on Tez.
- Google Dataproc: Runs Hive on MapReduce, though there is a script that installs Tez in their dataproc-initialization-actions GitHub repo.
- Qubole: Runs Hive on Tez. To connect to Qubole Hive, you must connect directly to the master node.
- Cloudera CDH: The latest version of CDH (as of September 2018) ships with Hive 1.1, which is unsupported by Looker.
- MapR: MapR 4.1 documentation refers to Tez as a “developer preview.”
- Hortonworks HDP: Hortonworks provides full support for Hive on Tez.
Verifying the JDBC Connection
Before connecting Looker to the Hive thrift server, we recommend you connect another JDBC client and run queries to make sure that Hive performance is satisfactory.
Hive comes with a JDBC client called
beeline. Verify that the thrift server is running with a command similar to:
or if you want to connect as a specific user:
Once connected, verify that the command
!tables lists all databases and tables. The
!tables command makes a similar call as the one Looker uses in SQL Runner or when creating a new LookML project.
Also verify that you can run queries, with a query that invokes the Tez engine, using a command similar to:
Queries such as
select * from table and
show tables don’t invoke the Tez engine.
If you are going to enable PDTs in Looker on Hive, create a scratch schema for Looker to use, using a command similar to:
The user that Looker uses to connect to Hive (which can be anonymous if no authentication is used) must have the following abilities in the scratch schema:
- Create tables
- Alter tables
- Drop tables
Test this with a JDBC client before attempting to create PDTs with Hive.
Sometimes you want queries from Looker to go into a specific queue. To configure that, enter the queue name parameter in the Additional Params field on the Connection Settings page:
Other Hive parameters can be set this way in the Additional Params field on the Connection Settings page.
Using user attributes, it is possible for queries from different users or different groups of users to go into different queues. To do this, create a user attribute named something like
queue_name, then in the Additional Params field, add:
You can use this to customize other
hive-site.xml parameters on a per-user or per-group basis as well.
Adding the Connection
In the Admin section of Looker, navigate to the Connections page and click New Connection. Looker displays this page:
- Name: Specify the name of the connection. This is how you will refer to the connection in LookML projects.
- Dialect: Specify the dialect: Apache Hive 2, Apache Hive 2.3+, or Apache Hive 3.1.2+.
- Host: Specify the hostname.
- Port: Specify the database port.
- Database: Specify the database name.
- Username: Specify the database username.
- Password: Specify the database user password.
- Persistent Derived Tables: Check this box to enable persistent derived tables. This setting reveals the Temp Database field and the PDT Overrides column.
- Temp Database: Specify the name of the scratch schema created in the PDTs section above.
- Max PDT Builder Connections: Specify the number of possible concurrent PDT builds on this connection. Setting this value too high could negatively impact query times. For more information, see the Connecting Looker to Your Database documentation page.
- Additional Params: Specify any additional JDBC string parameters.
- PDT And Datagroup Maintenance Schedule: Specify a
cronexpression that indicates when Looker should check datagroups and persistent derived tables. Read more about this setting in our PDT and Datagroup Maintenance Schedule documentation.
- SSL: Check to use SSL connections.
- Verify SSL Cert: Check for hostname verification.
- Max Connections: This setting can be left at the default value initially. Read more about this setting in the Max Connections subsection of the Connecting Looker to Your Database documentation page.
- Connection Pool Timeout: This setting can be left at the default value initially. Read more about this setting in the Connection Pool Timeout subsection of the Connecting Looker to Your Database documentation page.
- SQL Runner Precache: To cause SQL Runner not to preload table information and to load table information only when a table is selected, uncheck this option. Read more about this setting in the SQL Runner Precache subsection of the Connecting Looker to Your Database documentation page.
- Database Time Zone: Specify the time zone used in the database. Leave this field blank if you do not want time zone conversion. See the Using Time Zone Settings documentation page for more information.
Looker’s ability to provide some features depends on whether the database dialect can support them.
In the current Looker release, Apache Hive 2 supports the following Looker features:
In the current Looker release, Apache Hive 2.3+ supports the following Looker features:
In the current Looker release, Apache Hive 3.1.2+ supports the following Looker features: