Privacera Platform master publication

Discovery on Databricks

This topic covers the installation of Privacera Discovery on Databricks.

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.databricks.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.databricks.yml
    
  3. If the Databricks plugin is not enabled, add the following properties to the config/custom-vars/vars.discovery.databricks.yml file and provide their values. To configure the Databricks plugin, see Configuration in Databricks Spark Fine-Grained Access Control Plugin (FGAC) (Python, SQL).

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    
    DATABRICKS_WORKSPACES_LIST:
    - alias: DEFAULT
      databricks_host_url: "{{DATABRICKS_HOST_URL}}"
      token: "{{DATABRICKS_TOKEN}}"
    
  4. Edit the following properties for your cloud provider. For a description of each property, see Configuration properties below.

    AWS

    DATABRICKS_DRIVER_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    DATABRICKS_DISCOVERY_INSTANCE_PROFILE: "arn:aws:iam::<ACCOUNT_ID>:instance-profile/<DATABRICKS_CLUSTER_IAM_ROLE>"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE: "true"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN: "arn:aws:iam::<ACCOUNT_ID>:role/<DISCOVERY_IAM_ROLE>"
    

    Azure

    DATABRICKS_DRIVER_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
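
Before moving on, it can help to confirm that no sample placeholders such as <PLEASE_UPDATE> or <ACCOUNT_ID> were left in the edited file. The following is a minimal sketch of such a check. The function name check_placeholders is hypothetical and is not part of Privacera Manager. It assumes that placeholders are written as upper-case names in angle brackets, as in the samples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Privacera Manager.
# Reports any <UPPER_CASE> placeholder still present in a vars file.
check_placeholders() {
  local vars_file="$1"

  # Fail early if the file does not exist.
  if [ ! -f "$vars_file" ]; then
    echo "File not found: $vars_file" >&2
    return 2
  fi

  # grep prints each matching line with its line number and
  # succeeds only when at least one placeholder is found.
  if grep -nE '<[A-Z_]+>' "$vars_file"; then
    echo "Unfilled placeholders remain in $vars_file" >&2
    return 1
  fi
  return 0
}
```

For example, run check_placeholders config/custom-vars/vars.discovery.databricks.yml from ~/privacera/privacera-manager. A non-zero exit status means the file is missing or still contains placeholders.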

Note

PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL is no longer in use. The Discovery Databricks packages will be downloaded from PRIVACERA_BASE_DOWNLOAD_URL.

Configuration properties

| Property | Description | Example |
| --- | --- | --- |
| DATABRICKS_DRIVER_INSTANCE_TYPE | Instance type of the driver. On AWS, it can be "m5.xlarge" or "m5.2xlarge". On Azure, it can be "Standard_DS3_v2". | m5.xlarge |
| DATABRICKS_INSTANCE_TYPE | Instance type. On AWS, it can be "m5.xlarge" or "m5.2xlarge". On Azure, it can be "Standard_DS3_v2". | m5.xlarge |
| SETUP_DATABRICKS_JAR | | |
| USE_DATABRICKS_SPARK | | |
| DATABRICKS_ELASTIC_DISK | | |
| DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT | Set to true to create the Databricks init script. | false |
| DATABRICKS_DISCOVERY_WORKERS | | |
| DATABRICKS_DISCOVERY_JOB_NAME | | |
| DATABRICKS_DISCOVERY_SPARK_VERSION | Spark version. One of: 6.4.x-scala2.11 (Spark 2.4), 7.3.x-scala2.12 (Spark 3.0), 7.4.x-scala2.12 (Spark 3.0), 7.5.x-scala2.12 (Spark 3.0), 7.6.x-scala2.12 (Spark 3.0). | 7.3.x-scala2.12 |
| DATABRICKS_DISCOVERY_INSTANCE_PROFILE | Instance profile (instance role) for the Databricks nodes on which Discovery runs. | arn:aws:iam::1234564835:instance-profile/privacera_databricks_cluster_iam_role |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE | Grants Discovery access to AWS services so that it can perform the scanning operation. | true |
| DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN | ARN of the AWS IAM role that Discovery assumes. | arn:aws:iam::12345671758:role/DiscoveryCrossAccAssumeRole_k |
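
The value chosen for DATABRICKS_DISCOVERY_SPARK_VERSION has to be a runtime key that the target workspace actually offers. The sketch below shows one way to check this with the Databricks Clusters API 2.0 spark-versions endpoint. It assumes that DATABRICKS_HOST_URL (without a trailing slash) and DATABRICKS_TOKEN are exported in the shell with the same values as in the vars file. The function names are hypothetical and are not part of Privacera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Privacera Manager.
# Assumes DATABRICKS_HOST_URL and DATABRICKS_TOKEN are exported.

# Fetch the runtime versions offered by the workspace (Clusters API 2.0).
list_spark_versions() {
  curl -sS -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "${DATABRICKS_HOST_URL}/api/2.0/clusters/spark-versions"
}

# Succeed only if the quoted version key appears in the JSON response.
spark_version_available() {
  list_spark_versions | grep -Fq "\"$1\""
}
```

For example, spark_version_available "7.3.x-scala2.12" && echo "available" prints a message only when the workspace lists that key. The match is a plain string search on the response body, so it is a quick sanity check and not a full JSON parse.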