Databricks Spark Fine-Grained Access Control Plugin [FGAC] [Python, SQL]#

Configuration#

  1. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.plugin.yml
    
  2. Assign (and save) the following values to allow the Privacera Platform to connect to your Databricks host. For property details and descriptions, see Databricks.

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    
    DATABRICKS_WORKSPACES_LIST:
    - alias: DEFAULT
      databricks_host_url: "{{DATABRICKS_HOST_URL}}"
      token: "{{DATABRICKS_TOKEN}}"
    
    DATABRICKS_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_ENABLE: "true"
    

    Note

    You can also add custom properties that are not included by default. See Databricks.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
  4. (Optional) By default, policies under the default service name, privacera_hive, are enforced. You can configure a different service name and enforce the policies defined under it. See Configure Service Name for Databricks Spark Plugin.
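Before running the update, a quick sanity check can confirm that no <PLEASE_UPDATE> placeholders were left in the vars file. A minimal sketch (it uses a temporary file in place of your real config/custom-vars/vars.databricks.plugin.yml):

```shell
# Sketch: fail fast if any placeholder values were left unedited.
# The temporary file stands in for config/custom-vars/vars.databricks.plugin.yml.
VARS_FILE=$(mktemp)
cat > "$VARS_FILE" <<'EOF'
DATABRICKS_HOST_URL: "https://dbc-example.cloud.databricks.com"
DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
EOF
if grep -q "PLEASE_UPDATE" "$VARS_FILE"; then
  echo "placeholders remain -- edit the file before running privacera-manager.sh update"
else
  echo "no placeholders found"
fi
```

Running the same grep against your actual vars file before `./privacera-manager.sh update` catches a forgotten host URL or token early.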

Managing init Script#

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "true", the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME set in vars.privacera.yml.

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "false", you must upload the init script to your Databricks host manually, as described in the steps below.

Note

To avoid the manual steps below, set DATABRICKS_MANAGE_INIT_SCRIPT: "true" so that the init script is uploaded automatically.

  1. Open a terminal and connect to your Databricks account using your Databricks login credentials or a token.

    • Connect using login credentials:

      1. If you're using login credentials, run the following command.

        databricks configure --profile privacera
        
      2. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      3. Enter the username and password.

        Username: email-id@yourdomain.com
        Password:
        
    • Connect using Databricks token:

      1. If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.

      2. If you're using a token, run the following command.

        databricks configure --token --profile privacera
        
      3. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      4. Enter the token.

        Token:
        
  2. To check if the connection to your Databricks account is established, run the following command.

    dbfs ls dbfs:/ --profile privacera
    

    If the connection is successful, the output lists the files at the DBFS root.

  3. Upload files manually to Databricks.

    1. Copy the following files, available on the Privacera Manager host at ~/privacera/privacera-manager/output/databricks, to DBFS:

      • ranger_enable.sh
      • privacera_spark_plugin.conf
      • privacera_spark_plugin_job.conf
      • privacera_custom_conf.zip
    2. Run the following commands. Take the value of <DEPLOYMENT_ENV_NAME> from ~/privacera/privacera-manager/config/vars.privacera.yml.

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_custom_conf.zip dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera 
      
    3. Verify that the files have been uploaded.

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      

      The init script is now available at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, the path you will set in the cluster configuration.
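The four copy commands above can also be expressed as a loop over the plugin files. A dry-run sketch (it echoes each command instead of executing it, and uses a hypothetical deployment name in place of your real DEPLOYMENT_ENV_NAME):

```shell
# Dry run: print the dbfs copy command for each plugin file instead of executing it.
DEPLOYMENT_ENV_NAME="privacera-demo"   # hypothetical; use the value from vars.privacera.yml
for f in ranger_enable.sh privacera_spark_plugin.conf \
         privacera_spark_plugin_job.conf privacera_custom_conf.zip; do
  echo "dbfs cp $f dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera"
done
```

Dropping the `echo` (and running from ~/privacera/privacera-manager/output/databricks) performs the actual upload.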

Configure Databricks Cluster#

  1. Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the cluster dialog and enter Edit mode.

  3. In the Configuration tab, open Advanced Options (at the bottom of the dialog) and then select the Spark tab.

  4. Add the following content to the Spark Config edit box, using either the New Properties or the Old Properties (see the Note below). For more information on the Spark config properties, see Spark Plugin Properties.

    New Properties:

    spark.databricks.cluster.profile serverless
    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r

    Old Properties:

    spark.databricks.cluster.profile serverless
    spark.databricks.repl.allowedLanguages sql,python,r
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.databricks.isv.product privacera
    spark.databricks.pyspark.enableProcessIsolation true
    

    Note

    • From Privacera 5.0.6.1 Release onwards, it is recommended to replace the Old Properties with the New Properties. However, the Old Properties will also continue to work.

    • For Databricks versions below 7.3, use only the Old Properties, since those versions are in extended support.

  5. In the Configuration tab, open Advanced Options (at the bottom of the dialog) and set the following init script path under the Init Scripts tab. Replace <DEPLOYMENT_ENV_NAME> with the value of DEPLOYMENT_ENV_NAME defined in vars.privacera.yml.

     dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh
    
  6. Save (Confirm) this configuration.

  7. Start (or Restart) the selected Databricks Cluster.

Related Information

For further reading, see:

  • To enable view-level access control (via Data-Admin) and view-level row filtering and column masking, add the property DATABRICKS_SPARK_PRIVACERA_VIEW_LEVEL_MASKING_ROWFILTER_EXTENSION_ENABLE: "true" in custom-vars. Search for this property in Spark Plugin Properties for more information. To learn how to use the property, see Apply View-level Access Control.

  • By default, certain Python packages are blocked on the Databricks cluster for security compliance. If you still wish to use these packages, see Whitelisting Py4j Packages.

  • If you want to enable JWT-based user authentication for your Databricks clusters, see JWT for Databricks.

  • If you want PM to add cluster policies in Databricks, see Configure Databricks Cluster Policy.

  • If you want to add additional Spark properties for your Databricks cluster, see Spark Properties for Databricks Cluster.

Validation#

To help you evaluate the use of Privacera with Databricks, Privacera provides a set of 'demo' notebooks. These can be downloaded from the Privacera S3 repository using either a browser or a command-line wget. Use the notebook/SQL sequence that matches your cluster.

  1. Download using your browser (click the correct file for your cluster below):

    https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql

    If AWS S3 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql

    If ADLS Gen2 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql

    Or, if you are working from a Linux command line, use wget to download:

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql -O PrivaceraSparkPlugin.sql

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql -O PrivaceraSparkPluginS3.sql

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql -O PrivaceraSparkPluginADLS.sql

  2. Import the Databricks notebook:

    Log in to the Databricks console ->
    Select Workspace -> Users -> your user ->
    Click the drop-down ->
    Click Import and choose the downloaded file

  3. Follow the suggested steps in the text of the notebook to exercise and validate Privacera with Databricks.


Last update: October 13, 2021