Skip to content

Hive UDF

These are steps to enable encryption via a Hive User-Defined Function (UDF).

Add Privacera UDF in Hive#

  1. Login to Privacera Portal.

  2. On the Privacera home page, expand Diagnostics and click Health Check.

  3. Under the Diagnostics tab, click Encryption.

  4. Click Create for UDF - Protect, UnProtect.

  5. At the prompt, click Yes.

The UDF is created successfully.

Confirm Privacera UDF in Hive#

  1. SSH to the instance.

  2. Do kinit for ${user}.

  3. Connect to beeline.

     #Example
    beeline -u "jdbc:hive2://{hostname}:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n ${user}
    
  4. After connecting to beeline, enter the following two commands to describe the functions created by Privacera.

    • DESCRIBE FUNCTION EXTENDED privacera.protect;

      <img src="../assets/image11.png" style="width:100%;height:auto" />

    • DESCRIBE FUNCTION EXTENDED privacera.unprotect;

      <img src="../assets/image12.png" style="width:100%;height:auto" />

  5. If the UDF has not been created successfully, you will see the error message “Function ‘privacera.protect’ does not exist”.

    <img src="../assets/image14.png" style="width:100%;height:auto" />

Test Privacera UDF in Hive#

  1. Login to Privacera Portal.

  2. Click Encryption & Masking > Schemes .

  3. Click Add.

  4. Create a scheme by entering the following details, for example:

    • Scheme Name: SWIFT_ID
    • Format type: FPE_ALPHA_NUMERIC
    • Scope: All
    • Algorithm: FPE
  5. Click Save.

  6. Check if the KMS key is generated for the scheme.

    • Login to Ranger http://{hostname}:6080 with username as keyadmin.

    • Click Encryption > Key Manager.

    <img src="../assets/image15.png" style="width:100%;height:auto" />

  7. SSH to the instance.

  8. Do kinit for ${user}.

  9. Connect to beeline.

     #Example
    beeline -u "jdbc:hive2://{hostname}:2181/;
    serviceDiscoveryMode=zooKeeper;
    zooKeeperNamespace=hiveserver2" -n ${user}
    
  10. After connecting to beeline enter the following commands.

    select privacera.protect("TEXT" ,"SWIFT_ID");
    select privacera.unprotect(privacera.protect("TEXT" ,"SWIFT_ID"), "SWIFT_ID");
    
  11. Check the KMS audits in Ranger.

    • Login to Ranger http://{hostname}:6080 with username as keyadmin.

    • Click Audit > Access.

    • If the UDF fails, the message "denied access" is recorded in Ranger KMS audits.

      !!! note If access is denied, you need to give permission to ${user}.

      <img src="../assets/image16.png" style="width:100%;height:auto" />

    • If the UDF result is successful, the audits are shown as Allowed.

      <img src="../assets/image17.png" style="width:100%;height:auto" />

    • To check crypto UDF logs, run this command:

      sudo su
      tail -f /var/log/hive/hiveserver2.log
      

Give Users Access to the UDFs#

Users need access to the UDFs. See general steps in Set User Access in Ranger KMS.

Setup Privacera Encryption#

When to Use Shaded Jar#

You might need to use a shaded jar depending on certain conditions described below.

To check if the shaded jar is needed:

  1. Go to the Hive libraries directory.

    cd /opt/cloudera/parcels/CDH/lib/hive/lib/
    ls | grep http
    

    You need to install the shaded jar:

    • If the version of the httpclient is less than 4.5.6.

      Or

    • If the UDF throws a NoClassDefFound exception for class org/apache/http/config/Lookup.

  2. To install the shaded jar, see shaded jar section.

Create Privacera UDF in CDH/CDP#

The following are the steps to install Privacera Encryption jar in CDH/CDP cluster.

  1. Download jar to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}
    wget ${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies.jar -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload jar into HDFS location from where Hive can access it. Following are the commands to upload jar into HDFS using Hadoop CLI.

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
    
  3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    
    vi crypto.properties
    
    privacera.portal.base.url=http://${PRIVACERA_SERVER}:6868
    privacera.portal.username=${USER_NAME}
    privacera.portal.password=${PASSWORD}
    #Mode of encryption/decryption rpc/native
    privacera.crypto.mode=native
    
    cp crypto.properties  crypto_default.properties
    
  4. Upload configuration to DBFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  5. Install Crypto jar into Cluster.

    1. SSH to the cluster node.
    2. Do kinit for ${user} (if Kerberos is enabled).
    3. Connect to beeline.
    4. If you have Kerberos-enabled cluster, use the following command to login to beeline:

      Update the values for for the variables ${HIVE_SERVER2_HOSTNAME} and ${REALM} below.

      HIVE_SERVER2_HOSTNAME={{10.1.1.2}}
      REALM={{EXAMPLE.PRIVACERA.US}}
      beeline -u "jdbc:hive2://${HIVE_SERVER2_HOSTNAME}:10000/default;principal=hive/${HIVE_SERVER2_HOSTNAME}@${REALM}"
      
  6. Add Privacera crypto UDF jar. You need to run multiple SQL queries in Databricks cluster to create privacera encryption functions.

    • SQL query to create the Privacera unprotect UDF.

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar;
      
  7. Create the Privacera encryption UDF. You need to run multiple SQL queries in beeline to create Privacera encryption UDF.

    • SQL query to create Privacera unprotect udf.

      create database if not exists privacera ;
      drop function if exists privacera.unprotect;
      create function privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      
    • SQL query to create Privacera protect UDF.

      drop function if exists privacera.protect;
      create function privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      

Installing the Shaded Jar#

The following are the steps to install the shaded Privacera Encryption jar in a CDH/CDP cluster.

  1. Download jar in cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}
    wget ${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies-shaded.jar -O privacera-crypto-jar-with-dependencies-shaded.jar
    
  2. Upload jar into HDFS location from where Hive can access it. Following are the commands to upload jar into HDFS using Hadoop CLI.

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls /
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies-shaded.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar
    
    3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    
    vi crypto.properties
    
    privacera.portal.base.url=http://${PRIVACERA_SERVER}:6868
    privacera.portal.username=${USER_NAME}
    privacera.portal.password=${PASSWORD}
    
    #Mode of encryption/decryption rpc/native
    privacera.crypto.mode=native
    
    cp crypto.properties  crypto_default.properties
    
  3. Upload configuration to DBFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  4. Install Crypto jar into Cluster.

    1. SSH to the cluster node.
    2. Do kinit for ${user} (if Kerberos is enabled).
    3. Connect to beeline.
    4. If you have Kerberos-enabled cluster, use the following command to login to beeline:

      Update the values for ${HIVE_SERVER2_HOSTNAME} and ${REALM} as per your environment

      HIVE_SERVER2_HOSTNAME={{10.1.1.2}}
      REALM={{EXAMPLE.PRIVACERA.US}}
      beeline -u "jdbc:hive2://${HIVE_SERVER2_HOSTNAME}:10000/default;principal=hive/${HIVE_SERVER2_HOSTNAME}@${REALM}"
      
  5. Add Privacera crypto UDF jar. You need to run multiple SQL queries in Databricks cluster to create privacera encryption functions.

    • SQL query to create the Privacera unprotect UDF.

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar;
      
  6. Create the Privacera encryption UDF. (User defined function) You need to run multiple SQL queries in beeline to create Privacera encryption UDF.

    • SQL query to create Privacera unprotect udf.

      create database if not exists privacera ;
      drop function if exists privacera.unprotect;
      CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';
      
    • SQL query to create Privacera protect UDF.

      drop function if exists privacera.protect;
      CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';
      

Hive TEZ: Add Properties File in beeline#

If you are using Hive TEZ, before executing the UDFs, you must add the cryptop.properties file in beeline wit hthe following command:

add file hdfs:///privacera/crypto/configs/crypto.properties;

Sample Queries to Verify Setup#

  • Sample query to run encryption:

    select privacera.protect("test_data","${SCHEME_NAME}") limit 10;
    
  • Sample query to run encryption and decryption in one query to verify setup:

    select privacera.unprotect(privacera.protect("test_data","${SCHEME_NAME}"),"${SCHEME_NAME}") limit 10;
    
  • For authorization and leveraging Hive Masking policies, you need to install the Hive plug-in in hiveserver2.

If you do not want to install the Hive plug-in, you can authorize use of the keys based on KMS.

  • Create a view on top of your raw table:

    create view secure_view as select col1, privacera.protect(col2, ‘SSN’) as col2 from db.table;
    
    select * from secure_view;
    
  • If the user is not present in Privacera, they can still access the protect function. It is recommended to use the Hive plug-in as they can control the access to the resource using Ranger Policies and it makes it easier to manage them with the simple UI.


Last update: July 23, 2021