
Hive UDFs


These are the steps to enable encryption and decryption via Hive user-defined functions (UDFs).

Add Privacera UDF in Hive
  1. Login to Privacera Portal.

  2. On the Privacera home page, expand Diagnostics and click Health Check.

  3. Under the Diagnostics tab, click Encryption.

  4. Click Create for UDF - Protect, UnProtect.

  5. At the prompt, click Yes.

The protect and unprotect UDFs are created.

Confirm Privacera UDF in Hive
  1. SSH to the instance.

  2. Do kinit for <user> (a sample kinit invocation is shown after this procedure).

  3. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n $<user>
    
  4. After connecting to beeline, enter the following two commands to describe the functions created by Privacera.

    • DESCRIBE FUNCTION EXTENDED privacera.protect;

    • DESCRIBE FUNCTION EXTENDED privacera.unprotect;

  5. If the UDF has not been created successfully, you will see the error message “Function ‘privacera.protect’ does not exist”.

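The exact kinit invocation depends on your Kerberos setup. A minimal sketch, assuming a principal of the form <user>@<REALM>; the keytab path shown is a placeholder:

    # Obtain a Kerberos ticket with a password prompt.
    kinit <user>@<REALM>

    # Or, if the user has a keytab (path is a placeholder):
    kinit -kt /path/to/<user>.keytab <user>@<REALM>

    # Confirm that a ticket was granted.
    klist
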
Test Privacera UDF in Hive
  1. Login to Privacera Portal.

  2. Click Encryption & Masking > Schemes.

  3. Click Add.

  4. Create a scheme. For example, enter the following details:

    • Scheme Name: SWIFT_ID

    • Format type: FPE_ALPHA_NUMERIC

    • Scope: All

    • Algorithm: FPE

  5. Click Save.

  6. Check if the KMS key is generated for the scheme.

    • Log in to Ranger at http://<hostname>:6080 as the keyadmin user.

    • Click Encryption > Key Manager.

  7. SSH to the instance.

  8. Do kinit for <user>.

  9. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n $<user>
    
  10. After connecting to beeline, enter the following commands.

    select privacera.protect("TEXT" ,"SWIFT_ID");
    select privacera.unprotect(privacera.protect("TEXT" ,"SWIFT_ID"), "SWIFT_ID");
    
  11. Check the KMS audits in Ranger.

    • Log in to Ranger at http://<hostname>:6080 as the keyadmin user.

    • Click Audit > Access.

    • If the UDF fails, the message "denied access" is recorded in Ranger KMS audits.

      If access is denied, you need to give permission to <user>.

    • If the UDF result is successful, the audits are shown as Allowed.

    • To check the crypto UDF logs, run the following commands:

      sudo su
      tail -f /var/log/hive/hiveserver2.log
      
Give Users Access to the UDFs

Before they can run the protect and unprotect functions, users must be granted access to the UDFs, for example through Ranger Hive policies, as in the sketch below.
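
How you grant this access depends on your deployment. If Hive authorization is managed by the Ranger Hive plug-in, one option is a Ranger policy on the privacera database and its protect and unprotect UDFs. The sketch below is only an illustration that uses Ranger's public v2 REST API; the Hive service name, Ranger host, and admin credentials are placeholders, and many deployments create the equivalent policy from the Ranger Admin UI instead.

    # Hypothetical example: define a Ranger Hive policy that lets <user> run the Privacera UDFs.
    vi privacera-udf-policy.json

    {
      "service": "<hive_service_name>",
      "name": "privacera-udf-access",
      "resources": {
        "database": { "values": ["privacera"] },
        "udf": { "values": ["protect", "unprotect"] }
      },
      "policyItems": [
        {
          "users": ["<user>"],
          "accesses": [ { "type": "select", "isAllowed": true } ]
        }
      ]
    }

    # Submit the policy to Ranger (hostname and credentials are placeholders).
    curl -u admin:<password> -H "Content-Type: application/json" \
      -X POST http://<ranger-host>:6080/service/public/v2/api/policy \
      -d @privacera-udf-policy.json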

Setup Privacera Encryption
When to Use Shaded Jar

You might need to use a shaded jar depending on certain conditions described below.

To check if the shaded jar is needed:

  1. Go to the Hive libraries directory.

    cd /opt/cloudera/parcels/CDH/lib/hive/lib/
    ls | grep http
    

    You need to install the shaded jar:

    • If the version of the httpclient is less than 4.5.6.

      Or

    • If the UDF throws a NoClassDefFound exception for class org/apache/http/config/Lookup.

  2. To install the shaded jar, see Installing the Shaded Jar.

Create Privacera UDF in CDH/CDP

The following are the steps to install the Privacera Encryption jar in a CDH/CDP cluster.

  1. Download the jar to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=$<PRIVACERA_BASE_DOWNLOAD_URL>
    wget $<PRIVACERA_BASE_DOWNLOAD_URL>/privacera-crypto-jar-with-dependencies.jar -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload the jar to an HDFS location that Hive can access. The following commands upload the jar to HDFS using the Hadoop CLI.

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
    
  3. Create configuration files.

    # Create the HDFS directory for the configuration files.
    hadoop fs -mkdir -p /privacera/crypto/configs

    # Create crypto.properties with the following content.
    vi crypto.properties

    privacera.portal.base.url=http://$<PRIVACERA_SERVER>:6868
    privacera.portal.username=$<USER_NAME>
    privacera.portal.password=$<PASSWORD>
    #Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=native

    # Copy the file to create crypto_default.properties.
    cp crypto.properties crypto_default.properties
    
  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  5. Install the crypto jar into the cluster.

    1. SSH to the cluster node.

    2. Do kinit for $<user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following command to log in to beeline:

      Update the values for the variables HIVE_SERVER2_HOSTNAME and REALM below.

      HIVE_SERVER2_HOSTNAME=<<10.1.1.2>>
      REALM=<<EXAMPLE.PRIVACERA.US>>
      beeline -u "jdbc:hive2://$HIVE_SERVER2_HOSTNAME:10000/default;principal=hive/$HIVE_SERVER2_HOSTNAME@$REALM"
                                       
  6. Add the Privacera crypto UDF jar. Run the following SQL query in beeline.

    • SQL query to add the Privacera crypto jar.

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar;
      
  7. Create the Privacera encryption UDFs. Run the following SQL queries in beeline.

    • SQL query to create the Privacera unprotect UDF.

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      create function privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      
    • SQL query to create the Privacera protect UDF.

      drop function if exists privacera.protect;
      create function privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      
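Optionally, you can verify the installation at this point. A minimal sketch, assuming the HDFS paths used above; <jdbc_url> and <user> are placeholders for whichever beeline connection string and user apply to your cluster:

    # Confirm that the jar and configuration files are present in HDFS.
    hadoop fs -ls /privacera/crypto/jars
    hadoop fs -ls /privacera/crypto/configs

    # Confirm that the UDFs are registered.
    beeline -u "<jdbc_url>" -n <user> \
      -e "DESCRIBE FUNCTION EXTENDED privacera.protect; DESCRIBE FUNCTION EXTENDED privacera.unprotect;"
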
Installing the Shaded Jar

The following are the steps to install the shaded Privacera Encryption jar in a CDH/CDP cluster.

  1. Download the jar to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL=$<PRIVACERA_BASE_DOWNLOAD_URL>
    wget $<PRIVACERA_BASE_DOWNLOAD_URL>/privacera-crypto-jar-with-dependencies-shaded.jar -O privacera-crypto-jar-with-dependencies-shaded.jar
    
  2. Upload the jar to an HDFS location that Hive can access. The following commands upload the jar to HDFS using the Hadoop CLI.

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls /
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies-shaded.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar
    

  3. Create configuration files.

    # Create the HDFS directory for the configuration files.
    hadoop fs -mkdir -p /privacera/crypto/configs

    # Create crypto.properties with the following content.
    vi crypto.properties

    privacera.portal.base.url=http://$<PRIVACERA_SERVER>:6868
    privacera.portal.username=$<USER_NAME>
    privacera.portal.password=$<PASSWORD>
    #Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=native

    # Copy the file to create crypto_default.properties.
    cp crypto.properties crypto_default.properties
                               
  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  5. Install the crypto jar into the cluster.

    1. SSH to the cluster node.

    2. Do kinit for <user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following command to log in to beeline:

      Update the values for the variables HIVE_SERVER2_HOSTNAME and REALM as per your environment.

      HIVE_SERVER2_HOSTNAME=<<10.1.1.2>>
      REALM=<<EXAMPLE.PRIVACERA.US>>
      beeline -u "jdbc:hive2://$HIVE_SERVER2_HOSTNAME:10000/default;principal=hive/$HIVE_SERVER2_HOSTNAME@$REALM"
      
  6. Add the Privacera crypto UDF jar. Run the following SQL query in beeline.

    • SQL query to add the shaded Privacera crypto jar.

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar;
      
  7. Create the Privacera encryption UDFs (user-defined functions). Run the following SQL queries in beeline.

    • SQL query to create the Privacera unprotect UDF.

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';

    • SQL query to create the Privacera protect UDF.

      drop function if exists privacera.protect;
      CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';

Hive TEZ: Add Properties File in beeline

If you are using Hive on TEZ, before executing the UDFs, you must add the crypto.properties file in beeline with the following command:

add file hdfs:///privacera/crypto/configs/crypto.properties;
Sample Queries to Verify Setup
  • Sample query to run encryption:

    select privacera.protect("test_data","$<SCHEME_NAME>") limit 10;
  • Sample query to run encryption and decryption in one query to verify setup:

    select privacera.unprotect(privacera.protect("test_data","$<SCHEME_NAME>"),"$<SCHEME_NAME>") limit 10;
  • For authorization and to leverage Hive masking policies, you need to install the Hive plug-in in HiveServer2.


If you do not want to install the Hive plug-in, you can instead authorize use of the keys at the KMS level.

  • Create a view on top of your raw table:

    create view secure_view as select col1, privacera.protect(col2, 'SSN') as col2 from db.table;
    select * from secure_view;
                               
  • If a user is not present in Privacera, they can still call the protect function. It is recommended to use the Hive plug-in because it lets you control access to resources with Ranger policies and makes them easier to manage through a simple UI.