Fine-grained authorization
In this blog I will explain how to use Beeline on a secured cluster. The CDH 5.1.0 cluster is secured with Kerberos (authentication) and Sentry (authorization). If you want to set up a secured cluster, check out the related blog kerberos-cloudera-setup. Cloudera uses Sentry for fine-grained authorization of data and metadata stored on a Hadoop cluster.
This blog is about the hive command-line tool; using Hive through HUE is fine!
Why change from Hive CLI to Beeline?
The primary difference between the two involves how the clients connect to Hive. The Hive CLI connects directly to the Hive Driver and requires that Hive be installed on the same machine as the client. However, Beeline connects to HiveServer2 and does not require the installation of Hive libraries on the same machine as the client. Beeline is a thin client that also uses the Hive JDBC driver but instead executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
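To make that difference tangible, here is a minimal sketch of the two invocations; the hostname master01 follows the examples later in this post and the query is just a placeholder (-e may require a reasonably recent Beeline version; -f with a query file works the same way).

# Hive CLI: talks to the Hive Driver directly, Hive must be installed locally
hive -e "SHOW TABLES;"

# Beeline: thin JDBC client, the query is executed by HiveServer2
beeline -u jdbc:hive2://master01:10000/default -e "SHOW TABLES;"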
Cloudera’s Sentry security works through HiveServer2, not HiveServer1, which is used by the Hive CLI. So Hive through the command line will not follow the policy from Sentry. According to the Cloudera docs you should not use the Hive CLI and WebHCat; use Beeline or impala-shell instead.
The hive command line will bypass Sentry security!
Connect with Beeline
For a non-secured cluster it is easy to connect. You can use Beeline as described in this blog: cloudera-migrating-hive-to-beeline.
# beeline with params
beeline -u url -n username -p password
# url is a jdbc connection string, pointing to the HiveServer2 host.

# or use the !connect action
beeline
beeline> !connect jdbc:hive2://HiveServer2Host:Port
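Filled in with hypothetical values (host master01, user alex), a session looks like this:

beeline -u jdbc:hive2://master01:10000/default -n alex -p secret
0: jdbc:hive2://master01:10000/default> show tables;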
When using a kerberized cluster you can connect using your principal:
# initialize your kerberos ticket
kinit

# Connect with your ticket, no username / password required.
# master01 is the node where HiveServer2 is running.
# In the url, add the parameter principal with the hive principal.
beeline -u "jdbc:hive2://master01:10000/default;principal=hive/[email protected]"
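If the connection fails, first check that you actually hold a valid ticket; klist prints the ticket cache:

# show the cached Kerberos tickets and their expiry
klist
# no (valid) ticket? request one for your principal
kinit [email protected]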
You can find the full principal name in Cloudera Manager:
- Administration -> Kerberos
- Credentials -> search hive
- Use the principal of the host where HiveServer2 is running
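If you have shell access to the HiveServer2 node, you can also read the principal straight from the keytab. The path below is where Cloudera Manager typically materializes it; treat it as an assumption for your cluster.

# list the principals contained in the hive keytab (path may differ)
klist -kt /var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2/hive.keytab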
Export a query result to a file with Beeline:
HIVESERVER2_URL="jdbc:hive2://master01:10000/default;principal=hive/[email protected]"

# Just a simple export (as a table).
beeline -u "$HIVESERVER2_URL" -f query.sql > result.txt
# Result: first line the query, then a pretty table,
# last line: '0: jdbc:hive2://master01:10000/default>'

# Remove the first and last line of the result for a proper csv-file with a header:
beeline -u "$HIVESERVER2_URL" -f query.sql --outputformat=csv --showHeader=true | tail -n +2 | head -n -1 > result.csv
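For a one-off query you can skip the .sql file and pass the statement inline with -e (assuming your Beeline version supports it; my_db.my_table is a placeholder):

# inline query, same csv post-processing as above
beeline -u "$HIVESERVER2_URL" --outputformat=csv --showHeader=true \
  -e "SELECT * FROM my_db.my_table" | tail -n +2 | head -n -1 > result.csv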
More info on the beeline-command-options and hive-command-options can be found on the Apache wiki.
Troubleshoot
All the errors look the same: Error: Invalid URL … (state=08S01,code=0). The --verbose=true option does not help much, unfortunately. When you run into problems, check the HiveServer2 logs for hints.
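On a Cloudera Manager managed cluster the HiveServer2 log usually lives under /var/log/hive on the HiveServer2 node; the exact file name varies per cluster, so the path below is an assumption:

# inspect the most recent HiveServer2 log entries for the real cause
tail -n 100 /var/log/hive/hadoop-cmf-hive-HIVESERVER2-master01.log.out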
Problem:
[alexanderbij@tools01 ~]$ beeline -u jdbc:hive2://master01:10000/default;principal=hive/[email protected]
scan complete in 3ms
Connecting to jdbc:hive2://master01:10000/default
Error: Invalid URL: jdbc:hive2://master01:10000/default (state=08S01,code=0)
...
Solution:
Note that the Invalid URL message does not contain the principal part!
Use "quotes around the url", otherwise the hive principle argument is not used
Problem:
# beeline shell
14/08/08 09:44:23 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
...
Caused by: KrbException: Server not found in Kerberos database (7)
...
Caused by: KrbException: Identifier doesn't match expected value (906)
Solution:
There is a keytab-file on the HiveServer2 node, initialized with the principal. The connection string is using the wrong Kerberos principal for that keytab-file. Make sure you provide the correct hive principal in the connection URL.
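To cross-check, list the keytab on the HiveServer2 node (same assumed path as above) and compare the hive/<host>@REALM entry with the principal in your URL:

# the host part of the principal must be the FQDN known to the KDC
hostname -f
# the principal in the connection URL must match an entry in the keytab
klist -kt /var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2/hive.keytab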