Basic Setup with Databricks

This section describes how to install and configure Privacera encryption in Databricks to create policies for users and groups.

Prerequisites

Before enabling encryption for Databricks, make sure the following are already configured in Privacera Manager:

  • Databricks Spark Plugin (Python/SQL) on AWS, Azure, or GCP.
  • Custom properties for encryption detailed in Crypto.

Methods for Installing Encryption jar

You can install the Privacera encryption jar file in either of two ways: via the Databricks CLI or via the Databricks UI.

After you install the jar file, you need to define some configuration properties and User-Defined Functions (UDFs) to call the Privacera encryption /protect and /unprotect API requests.

Install Encryption jar via Databricks CLI

  1. Download the jar to a local machine.

    The variable PRIVACERA_BASE_DOWNLOAD_URL depends on the version of the Privacera software you want. See Configure and Install Core Services.

    # Replace the placeholder with the base download URL for your Privacera version.
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    wget ${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies.jar -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload the jar file to DBFS or an S3 location from where the Databricks cluster can access it.

  3. For DBFS, upload the jar with the Databricks CLI:

    databricks fs ls
    databricks fs mkdirs dbfs:/privacera/crypto/jars
    databricks fs cp privacera-crypto-jar-with-dependencies.jar dbfs:/privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
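
Before uploading, it can help to confirm that the download produced a real jar rather than, for example, an HTML error page saved under the jar's name. The following is a minimal sketch and not part of the Privacera tooling: the function name `verify_jar` is hypothetical, and the check relies only on the fact that a jar is a ZIP archive whose first two bytes are `PK`.

```shell
# Hypothetical helper (not part of Privacera or the Databricks CLI).
# Succeeds only if the file exists, is non-empty, and starts with the
# ZIP magic bytes "PK", which every jar file does.
verify_jar() {
  local jar="$1"
  [ -s "$jar" ] || { echo "missing or empty: $jar" >&2; return 1; }
  [ "$(head -c 2 "$jar")" = "PK" ] || { echo "not a jar/zip archive: $jar" >&2; return 1; }
  echo "ok: $jar"
}
```

One way to use it is to run `verify_jar privacera-crypto-jar-with-dependencies.jar` before the `databricks fs cp` command above, and then run `databricks fs ls dbfs:/privacera/crypto/jars` afterwards to confirm that the jar is listed.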
    

Install Encryption jar via Databricks UI

  1. Go to the Databricks cluster details page: Clusters > cluster name > Libraries.

  2. Click Install > New.

  3. Drop or upload the jar file.

    dbfs:/privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar

    Wait until the jar file is installed.

Create and Upload Encryption Configuration Files

Note

The steps here rely on the default location of the Privacera crypto properties file. However, you can change this location to a directory of your choice. Follow the steps here and then see Custom Path to Crypto Properties File in Databricks.

  1. Create the configuration file.

    mkdir -p privacera/crypto/configs
    cd privacera/crypto/configs
    # Open crypto_default.properties in an editor and add the properties below.
    vi crypto_default.properties

    # Contents of crypto_default.properties:
    privacera.portal.base.url=http://<APP_HOSTNAME>:6868
    privacera.portal.username=<SOME_USERNAME>
    privacera.portal.password=<SOME_PASSWORD>
    # Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=rpc
    
  2. Upload the configuration file to DBFS.

    databricks fs ls
    databricks fs mkdirs dbfs:/privacera/crypto/configs
    databricks fs cp crypto_default.properties dbfs:/privacera/crypto/configs/crypto_default.properties
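
The properties file from step 1 can also be written without an editor, which is convenient in automation. The sketch below is an illustration rather than an official procedure: the function names `write_crypto_props` and `check_crypto_props` are hypothetical, and the values passed in are placeholders to replace with your own. It writes the four properties listed in step 1 into the current directory and then confirms that each key is present.

```shell
# Hypothetical helper: write crypto_default.properties in the current directory.
# Arguments: portal base URL, username, password, mode (rpc or native).
write_crypto_props() {
  cat > crypto_default.properties <<EOF
privacera.portal.base.url=$1
privacera.portal.username=$2
privacera.portal.password=$3
privacera.crypto.mode=$4
EOF
}

# Hypothetical helper: fail if any of the four expected keys is missing.
check_crypto_props() {
  local key
  for key in privacera.portal.base.url privacera.portal.username \
             privacera.portal.password privacera.crypto.mode; do
    grep -q "^${key}=" crypto_default.properties || { echo "missing: $key" >&2; return 1; }
  done
  echo "all keys present"
}
```

For example, `write_crypto_props 'http://<APP_HOSTNAME>:6868' '<SOME_USERNAME>' '<SOME_PASSWORD>' rpc && check_crypto_props` creates the file and reports whether it is complete. Because the file holds a password in plain text, consider restricting its local permissions (for instance with `chmod 600`) before uploading.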
    

Create Encryption UDFs

Create Privacera encryption UDFs (User-Defined Functions) by running SQL queries in the Databricks cluster:

  • SQL query to create the Privacera protect UDF:
    create database if not exists privacera;
    use privacera;
    drop function if exists privacera.protect;
    CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF';
  • SQL query to create the Privacera unprotect UDF:
    create database if not exists privacera;
    drop function if exists privacera.unprotect;
    CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF';

Run Sample Queries To Verify

Sample query to run encryption:

    select privacera.protect(${colname},'${SCHEME_NAME}') from ${db_name}.${table_name} limit 10;

Sample query to run encryption and decryption in a single query to verify the setup:

    select privacera.unprotect(privacera.protect(${colname},'${SCHEME_NAME}'),'${SCHEME_NAME}') from ${db_name}.${table_name} limit 10;
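
The `${...}` tokens in the sample queries are placeholders to replace with real names. As an illustration only, the sketch below fills them in from shell arguments. The function `build_roundtrip_query` and the example values (`email`, `EMAIL_SCHEME`, `sales`, `customers`) are hypothetical and do not come from the Privacera documentation.

```shell
# Hypothetical helper: print the round-trip verification query for the given
# column, encryption scheme, database, and table.
build_roundtrip_query() {
  local colname="$1" SCHEME_NAME="$2" db_name="$3" table_name="$4"
  printf "select privacera.unprotect(privacera.protect(%s,'%s'),'%s') from %s.%s limit 10;\n" \
    "$colname" "$SCHEME_NAME" "$SCHEME_NAME" "$db_name" "$table_name"
}

# Example values are placeholders; substitute your own names.
build_roundtrip_query email EMAIL_SCHEME sales customers
```

The printed statement can be pasted into a notebook attached to the cluster. For a reversible scheme, the expectation is that the round-trip output matches the original column values.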

Last update: July 29, 2021