User Guide Getting Started Help Center Documentation Community Training
Looker
  
English
Français
Deutsch
日本語
Apache Hive2

This page contains information about connecting Looker to an Apache Hive2 database.

Introduction

Looker is architected to connect to a database server via JDBC. In the case of Hive, this is the thrift server (HiveServer2). See the Apache documentation for more information.

By default, this server will listen on port 10000.

Looker is an interactive querying tool, so it expects to work with an interactive SQL engine. If Hive is running on MapReduce — hive.execution.engine is set to mr — then Hive will return query results too slowly to be practical.

Looker was tested with Hive on Tez (hive.execution.engine=tez), though it is also possible to run Looker against Hive on Spark. Spark support was added in Hive version 1.1. (Looker supports Hive 1.2.1 and later.)

Hosted Hive Services

Most hosted Hive services run Hive with Tez or Spark.

Hadoop Distributions

Verifying the JDBC Connection

Before connecting Looker to the Hive thrift server, we recommend you connect another JDBC client and run queries to make sure that Hive performance is satisfactory.

Hive comes with a JDBC client called beeline. Verify that the thrift server is running with a command similar to:

bin/beeline -u 'jdbc:hive2://localhost:10000/'

or if you want to connect as a specific user:

bin/beeline -u 'jdbc:hive2://localhost:10000/' -n user -p abc123

Once connected, verify that the command !tables lists all databases and tables. The !tables command makes a similar call as the one Looker uses in SQL Runner or when creating a new LookML project.

Also verify that you can run queries, with a query that invokes the Tez engine, using a command similar to:

select col1, count(col2) from db.table group by col1 order by count(col2) desc limit 25

Queries such as select * from table and show tables don’t invoke the Tez engine.

PDTs

If you are going to enable PDTs in Looker on Hive, create a scratch schema for Looker to use, using a command similar to:

create looker_scratch;

The user that Looker uses to connect to Hive (which can be anonymous if no authentication is used) must have the following abilities in the scratch schema:

Test this with a JDBC client before attempting to create PDTs with Hive.

Queues

Sometimes you want queries from Looker to go into a specific queue. To configure that, enter the queue name parameter in the Additional Params field on the Connection Settings page:

?tez.queue.name=the_bi_queue

Other Hive parameters can be set this way in the Additional Params field on the Connection Settings page.

Using user attributes, it is possible for queries from different users or different groups of users to go into different queues. To do this, create a user attribute named something like queue_name, then in the Additional Params field, add:

?tez.queue.name={{ _user_attributes['queue_name'] }}

You can use this to customize other hive-site.xml parameters on a per-user or per-group basis as well.

Feature Support

Looker’s ability to provide some features depends on whether the database dialect can support them.

In the current Looker release, Apache Hive2 supports the following Looker features:

Top