Privacera Plugin in Databricks
Privacera provides two types of plugin solutions for access control in Databricks clusters. Both plugins are mutually exclusive and cannot be enabled on the same cluster.
Databricks Spark Fine-Grained Access Control (FGAC) Plugin
Recommended for SQL, Python, and R language notebooks.
Provides FGAC on databases with row filtering and column masking features.
Uses the privacera_hive, privacera_s3, privacera_adls, and privacera_files services for resource-based access control, and the privacera_tag service for tag-based access control.
Uses the plugin implementation from Privacera.
Databricks Spark Object Level Access Control (OLAC) Plugin
The OLAC plugin was introduced as an alternative solution for Scala language clusters, since using the Scala language on Databricks Spark has some security concerns.
Recommended for Scala language notebooks.
Provides OLAC on the S3 locations you access via Spark.
Uses the privacera_s3 service for resource-based access control and the privacera_tag service for tag-based access control.
Uses the signed-authorization implementation from Privacera.
Databricks cluster deployment matrix with Privacera plugin:
Job/Workflow use-case for automated clusters:
Run-Now creates a new cluster based on the definition in the job description.
Job Type | Languages | FGAC/DBX version | OLAC/DBX version |
---|---|---|---|
Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | |
JAR | Java/Scala | Not supported | Supported [7.3, 9.1, 10.4] |
spark-submit | Java/Scala/Python | Not supported | Supported [7.3, 9.1, 10.4] |
Python | Python | Supported [7.3, 9.1, 10.4] | |
Python wheel | Python | Supported [9.1, 10.4] | |
Delta Live Tables pipeline | | Not supported | Not supported |
Job on existing cluster:
Run-Now uses the existing cluster specified in the job description.
Job Type | Languages | FGAC/DBX version | OLAC/DBX version |
---|---|---|---|
Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Not supported |
JAR | Java/Scala | Not supported | Not supported |
spark-submit | Java/Scala/Python | Not supported | Not supported |
Python | Python | Not supported | Not supported |
Python wheel | Python | Supported [9.1, 10.4] | Not supported |
Delta Live Tables pipeline | | Not supported | Not supported |
Interactive use-case
An interactive use-case is running a SQL/Python notebook on an interactive cluster.
Cluster Type | Languages | FGAC | OLAC |
---|---|---|---|
Standard clusters | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |
High Concurrency clusters | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Supported [7.3, 9.1, 10.4] |
Single Node | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |
Databricks Spark Fine-Grained Access Control Plugin [FGAC] [Python, SQL]
Configuration
Run the following commands:
cd ~/privacera/privacera-manager
cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/
vi config/custom-vars/vars.databricks.plugin.yml
Edit the following properties to allow Privacera Platform to connect to your Databricks host. For property details and description, refer to the Configuration Properties below.
DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
DATABRICKS_WORKSPACES_LIST:
  - alias: DEFAULT
    databricks_host_url: "{{DATABRICKS_HOST_URL}}"
    token: "{{DATABRICKS_TOKEN}}"
DATABRICKS_MANAGE_INIT_SCRIPT: "true"
DATABRICKS_ENABLE: "true"
Note
You can also add custom properties that are not included by default. See Databricks.
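For reference, below is a minimal filled-in sketch of config/custom-vars/vars.databricks.plugin.yml. The workspace URLs and tokens are hypothetical placeholders, and the second workspace entry merely illustrates connecting more than one workspace:
# Hypothetical values; replace with your own host URL and token.
DATABRICKS_HOST_URL: "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"
DATABRICKS_TOKEN: "dapi0123456789abcdef"
DATABRICKS_WORKSPACES_LIST:
  - alias: DEFAULT
    databricks_host_url: "{{DATABRICKS_HOST_URL}}"
    token: "{{DATABRICKS_TOKEN}}"
  # Optional second workspace (illustrative only).
  - alias: ANALYTICS
    databricks_host_url: "https://dbc-yyyyyyyy-yyyy.cloud.databricks.com"
    token: "dapifedcba9876543210"
DATABRICKS_MANAGE_INIT_SCRIPT: "true"
DATABRICKS_ENABLE: "true"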
Run the following commands:
cd ~/privacera/privacera-manager
./privacera-manager.sh update
(Optional) By default, policies under the default service name, privacera_hive, are enforced. You can customize a different service name and enforce policies defined in the new name. See Configure Service Name for Databricks Spark Plugin.
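Once the update completes, you can optionally confirm that Privacera Manager generated the init script and its supporting files on the PM host (this is the output location referenced throughout this page):
ls ~/privacera/privacera-manager/output/databricks/
# Expect ranger_enable.sh, privacera_spark_plugin.conf, privacera_spark_plugin_job.conf, and privacera_custom_conf.zip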
Configuration properties
Property Name | Description | Example Values |
---|---|---|
DATABRICKS_HOST_URL | Enter the URL where the Databricks environment is hosted. | For Azure Databricks: DATABRICKS_HOST_URL: "https://xdx-66506xxxxxxxx.2.azuredatabricks.net/?o=665066931xxxxxxx" For AWS Databricks: DATABRICKS_HOST_URL: "https://xxx-7xxxfaxx-xxxx.cloud.databricks.com" |
DATABRICKS_TOKEN | Enter the token. To generate the token: 1. Log in to your Databricks account. 2. Click the user profile icon in the upper-right corner of your Databricks workspace. 3. Click User Settings. 4. Click the Generate New Token button. 5. Optionally enter a description (comment) and expiration period. 6. Click the Generate button. 7. Copy the generated token. | DATABRICKS_TOKEN: "xapid40xxxf65xxxxxxe1470eayyyyycdc06" |
DATABRICKS_WORKSPACES_LIST | Add multiple Databricks workspaces to connect to Ranger. | |
DATABRICKS_ENABLE | If set to 'true', Privacera Manager creates the Databricks cluster init script ranger_enable.sh at ~/privacera/privacera-manager/output/databricks/ranger_enable.sh. | "true" "false" |
DATABRICKS_MANAGE_INIT_SCRIPT | If set to 'true', Privacera Manager uploads the init script (ranger_enable.sh) to the identified Databricks host. If set to 'false', upload the files located at ~/privacera/privacera-manager/output/databricks to the DBFS location manually. | "true" "false" |
| Use the Java agent to assign a string of extra JVM options to pass to the Spark driver. | -javaagent:/databricks/jars/privacera-agent.jar |
| Property to map the logged-in user to the Ranger user for row-filter policies. It is mapped to the corresponding Databricks cluster-level property. | current_user() |
DATABRICKS_SPARK_PRIVACERA_VIEW_LEVEL_MASKING_ROWFILTER_EXTENSION_ENABLE | Property to enable masking, row-filter, and data_admin access on views. This is a Privacera Manager (PM) property mapped to the corresponding Databricks cluster-level property. | false |
| Configure a Databricks cluster policy. Add the following JSON in the text area: [{"Note":"First spark conf","key":"spark.hadoop.first.spark.test","value":"test1"},{"Note":"Second spark conf","key":"spark.hadoop.first.spark.test","value":"test2"}] | |
DATABRICKS_POST_PLUGIN_COMMAND_LIST | Note: This property is not part of the default YAML file, but can be added if required. Use this property if you want to run a specific set of commands in the Databricks init script. | The following example is added to the cluster init script to allow Athena JDBC via the data access server: DATABRICKS_POST_PLUGIN_COMMAND_LIST: - sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8181 -j ACCEPT - sudo curl -k -u user:password {{PORTAL_URL}}/api/dataserver/cert?type=dataserver_jks -o /etc/ssl/certs/dataserver.jks - sudo chmod 755 /etc/ssl/certs/dataserver.jks |
| This property allows you to blacklist APIs for added security. This is a Privacera Manager (PM) property mapped to the corresponding Databricks cluster-level property. | |
Managing init script
Automatic upload
If DATABRICKS_ENABLE and DATABRICKS_MANAGE_INIT_SCRIPT are both set to 'true', the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.
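To confirm the automatic upload succeeded, you can list the target DBFS path; this assumes the Databricks CLI is configured with a 'privacera' profile, as described under Manual upload below:
dbfs ls dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ --profile privacera
# ranger_enable.sh should appear in the listing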
Manual upload
If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'false', the init script must be uploaded manually to your Databricks host.
Note
To avoid the manual steps below, set DATABRICKS_MANAGE_INIT_SCRIPT=true and follow the instructions outlined in Automatic upload.
Open a terminal and connect to your Databricks account using your Databricks login credentials or token.
Connect using login credentials:
If you're using login credentials, run the following command:
databricks configure --profile privacera
Enter the Databricks URL:
Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
Enter the username and password:
Username: email-id@example.com
Password:
Connect using Databricks token:
If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.
If you're using a token, run the following command:
databricks configure --token --profile privacera
Enter the Databricks URL:
Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
Enter the token:
Token:
To check if the connection to your Databricks account is established, run the following command:
dbfs ls dbfs:/ --profile privacera
If you are connected, you should see a list of files in the output.
Upload the files manually to Databricks:
Copy the following files to DBFS. They are available on the PM host at ~/privacera/privacera-manager/output/databricks:
ranger_enable.sh
privacera_spark_plugin.conf
privacera_spark_plugin_job.conf
privacera_custom_conf.zip
Run the following commands. You can get the value of <DEPLOYMENT_ENV_NAME> from the file ~/privacera/privacera-manager/config/vars.privacera.yml.
export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
dbfs cp ranger_enable.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
dbfs cp privacera_spark_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
dbfs cp privacera_spark_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
dbfs cp privacera_custom_conf.zip dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
Verify the files have been uploaded:
dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
The init script will be uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.
Configure Databricks Cluster
Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.
Open the Cluster dialog and enter Edit mode.
In the Configuration tab, select Advanced Options > Spark.
Add the following content to the Spark Config edit box. For more information on the Spark config properties, see Spark Properties for Databricks Cluster.
New Properties
spark.databricks.cluster.profile serverless
spark.databricks.isv.product privacera
spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
spark.databricks.repl.allowedLanguages sql,python,r
Old Properties
spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages sql,python,r
spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
spark.databricks.isv.product privacera
spark.databricks.pyspark.enableProcessIsolation true
Note
From the Privacera 5.0.6.1 release onwards, it is recommended to replace the Old Properties with the New Properties; the Old Properties will, however, continue to work.
For Databricks versions below 7.3, use only the Old Properties, since those versions are in extended support.
In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and set the init script path. For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml:
dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh
In the Table Access Control section, uncheck both the Enable table access control and only allow Python and SQL commands checkbox and the Enable credential passthrough for user-level data access and only allow Python and SQL commands checkbox.
Save (Confirm) this configuration.
Start (or Restart) the selected Databricks Cluster.
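If you prefer to script these cluster changes rather than edit them in the console, the legacy Databricks CLI can apply them. The sketch below is an illustration only: the cluster ID is a hypothetical placeholder, and clusters edit requires the full cluster specification, so review cluster.json before applying it.
# Fetch the current cluster spec as a starting point (hypothetical cluster ID).
databricks clusters get --cluster-id 0123-456789-abcdefgh --profile privacera > cluster.json
# Edit cluster.json: add the spark_conf entries shown above plus the init script, e.g.
#   "init_scripts": [{"dbfs": {"destination": "dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh"}}]
# Then apply the edited spec; the cluster restarts with the new configuration.
databricks clusters edit --json-file cluster.json --profile privacera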
Related Information
For further reading, see:
To enable view-level access control (via data_admin) as well as view-level row filtering and column masking, add the property DATABRICKS_SPARK_PRIVACERA_VIEW_LEVEL_MASKING_ROWFILTER_EXTENSION_ENABLE: "true" in custom-vars. Search for this property in Spark Plugin Properties for more information. To learn how to use the property, see Apply View-level Access Control.
By default, certain Python packages are blocked on the Databricks cluster for security compliance. If you still wish to use these packages, see Whitelisting Py4j Packages.
If you want to enable JWT-based user authentication for your Databricks clusters, see JWT for Databricks.
If you want PM to add cluster policies in Databricks, see Configure Databricks Cluster Policy.
If you want to add additional Spark properties for your Databricks cluster, see Spark Properties for Databricks Cluster.
Validation
To help you evaluate the use of Privacera with Databricks, Privacera provides a set of Privacera Manager 'demo' notebooks. These can be downloaded from the Privacera S3 repository using either your favorite browser or a command-line 'wget'. Use the notebook/SQL sequence that matches your cluster.
Download using your browser (click on the correct file for your cluster below):
https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql
If AWS S3 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql
If ADLS Gen2 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql
Or, if you are working from a Linux command line, use the 'wget' command to download:
wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql -O PrivaceraSparkPlugin.sql
wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql -O PrivaceraSparkPluginS3.sql
wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql -O PrivaceraSparkPluginADLS.sql
Import the Databricks notebook:
Log in to the Databricks console.
Select Workspace > Users > Your User.
From the drop-down menu, select Import and choose the downloaded file.
Follow the suggested steps in the text of the notebook to exercise and validate Privacera with Databricks.
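Alternatively, the notebook can be imported from the command line with the legacy Databricks CLI; the target workspace path below is a hypothetical example:
databricks workspace import --language SQL --format SOURCE PrivaceraSparkPlugin.sql /Users/email-id@example.com/PrivaceraSparkPlugin --profile privacera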
Databricks Spark Object-level Access Control Plugin [OLAC] [Scala]
Prerequisites
Ensure the following prerequisites are met:
Dataserver should be installed and confirmed working:
For AWS, configure AWS S3 Dataserver
For Azure, configure Azure Dataserver
The Databricks Spark plugin should be configured; see the Configuration section above.
Configuration
Run the following commands.
cd ~/privacera/privacera-manager/
cp config/sample-vars/vars.databricks.scala.yml config/custom-vars/
vi config/custom-vars/vars.databricks.scala.yml
Edit the following properties. For property details and description, refer to the Configuration Properties below.
DATASERVER_DATABRICKS_ALLOWED_URLS: "<PLEASE_UPDATE>"
DATASERVER_AWS_STS_ROLE: "<PLEASE_CHANGE>"
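For reference, a filled-in sketch using the example values from the Configuration properties table below (your workspace URL and role ARN will differ):
DATASERVER_DATABRICKS_ALLOWED_URLS: "https://xxx-7xxxfaxx-xxxx.cloud.databricks.com"
DATASERVER_AWS_STS_ROLE: "arn:aws:iam::111111111111:role/assume-role"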
Run the following commands.
cd ~/privacera/privacera-manager
./privacera-manager.sh update
Configuration properties
Property | Description | Example |
---|---|---|
| Set the property to enable/disable Databricks Scala (OLAC). It is found under the Databricks Signed URL Configuration For Scala Clusters section. | |
DATASERVER_DATABRICKS_ALLOWED_URLS | Add a URL or comma-separated URLs. Privacera Dataserver serves only the URLs mentioned in this property. | https://xxx-7xxxfaxx-xxxx.cloud.databricks.com |
DATASERVER_AWS_STS_ROLE | Add the instance profile ARN of the AWS role that can access Delta files in Databricks. | arn:aws:iam::111111111111:role/assume-role |
DATABRICKS_MANAGE_INIT_SCRIPT | Set the init script. If enabled, Privacera Manager uploads the init script (ranger_enable_scala.sh) to the identified Databricks host. If disabled, Privacera Manager takes no action regarding the init script on the Databricks File System. | |
| Configure a Databricks cluster policy. Add the following JSON in the text area: [{"Note":"First spark conf","key":"spark.hadoop.first.spark.test","value":"test1"},{"Note":"Second spark conf","key":"spark.hadoop.first.spark.test","value":"test2"}] | |
Managing init script
If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'true', the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.
If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'false', the init script must be uploaded manually to your Databricks host.
Open a terminal and connect to your Databricks account using your Databricks login credentials or token.
Connect using login credentials:
If you're using login credentials, run the following command:
databricks configure --profile privacera
Enter the Databricks URL:
Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
Enter the username and password:
Username: email-id@yourdomain.com
Password:
Connect using Databricks token:
If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.
If you're using a token, run the following command:
databricks configure --token --profile privacera
Enter the Databricks URL.
Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
Enter the token.
Token:
To check if the connection to your Databricks account is established, run the following command.
dbfs ls dbfs:/ --profile privacera
If you are connected, you should see a list of files in the output.
Upload the files manually to Databricks:
Copy the following files to DBFS. They are available on the PM host at ~/privacera/privacera-manager/output/databricks:
ranger_enable_scala.sh
privacera_spark_scala_plugin.conf
privacera_spark_scala_plugin_job.conf
Run the following commands. You can get the value of <DEPLOYMENT_ENV_NAME> from the file ~/privacera/privacera-manager/config/vars.privacera.yml.
export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
dbfs cp ranger_enable_scala.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
dbfs cp privacera_spark_scala_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
dbfs cp privacera_spark_scala_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
Verify the files have been uploaded:
dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
The init script is uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.
Configure Databricks cluster
Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.
Open the Cluster dialog and enter Edit mode.
In the Configuration tab, in Edit mode, Open Advanced Options (at the bottom of the dialog) and then the Spark tab.
Add the following content to the Spark Config edit box. For more information on the Spark config properties, see Spark Properties for Databricks Cluster.
New Properties
spark.databricks.isv.product privacera
spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
spark.databricks.repl.allowedLanguages sql,python,r,scala
spark.databricks.delta.formatCheck.enabled false
Old Properties
spark.databricks.cluster.profile serverless
spark.databricks.delta.formatCheck.enabled false
spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
spark.databricks.isv.product privacera
spark.databricks.repl.allowedLanguages sql,python,r,scala
Note
From the Privacera 5.0.6.1 release onwards, it is recommended to replace the Old Properties with the New Properties; the Old Properties will, however, continue to work.
For Databricks versions below 7.3, use only the Old Properties, since those versions are in extended support.
(Optional) To use regional endpoint for S3 access, add the following content to the Spark Config edit box.
spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
spark.hadoop.fs.s3.endpoint https://s3.<region>.amazonaws.com
spark.hadoop.fs.s3n.endpoint https://s3.<region>.amazonaws.com
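For example, with us-east-1 as a hypothetical region:
spark.hadoop.fs.s3a.endpoint https://s3.us-east-1.amazonaws.com
spark.hadoop.fs.s3.endpoint https://s3.us-east-1.amazonaws.com
spark.hadoop.fs.s3n.endpoint https://s3.us-east-1.amazonaws.com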
In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and set the init script path. For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml:
dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh
Save (Confirm) this configuration.
Start (or Restart) the selected Databricks Cluster.
Related information
For further reading, see:
If you want to enable JWT-based user authentication for your Databricks clusters, see JWT for Databricks.
If you want PM to add cluster policies in Databricks, see Configure Databricks Cluster Policy.
If you want to add additional Spark properties for your Databricks cluster, see Spark Properties for Databricks Cluster.