
Discovery on Databricks

This topic covers the installation of Privacera Discovery on Databricks.

Configuration

  1. SSH to the instance as USER.

  2. Configure the Databricks host and token.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.databricks.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.databricks.yml
    
  4. Edit the following properties. For property details and descriptions, see the property reference in the Privacera documentation.

    For AWS:

    PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL: "<PLEASE_CHANGE>"
    DATABRICKS_DRIVER_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    DATABRICKS_DISCOVERY_INSTANCE_PROFILE: "arn:aws:iam::account-id:instance-profile/databricks_cluster_iam_role"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE: "true"

    For Azure:

    PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL: "<PLEASE_CHANGE>"
    DATABRICKS_DRIVER_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    
  5. Privacera Discovery requires access to the target Databricks workspace.

    Create the credentials folder and the .databrickscfg file, then add the host and token properties as shown below.

    mkdir -p ~/privacera/privacera-manager/credentials/databricks
    vi ~/privacera/privacera-manager/credentials/databricks/.databrickscfg
    

    Add the following properties to the file and save it.

    [DEFAULT]
    host = <DATABRICKS_HOST_NAME>
    token = <USER_TOKEN>
    

Last update: August 9, 2021