Databricks Spark Object-level Access Control Plugin [OLAC] [Scala]#
Prerequisites#
Ensure the following prerequisites are met:
- Dataserver should be installed and confirmed working:
  - For AWS, configure the AWS S3 Dataserver.
  - For Azure, configure the Azure Dataserver.
- Configure the Databricks Spark Plugin. See Databricks Spark Plugin - Configuration.
Configuration#
- Run the following commands.

  ```bash
  cd ~/privacera/privacera-manager/
  cp config/sample-vars/vars.databricks.scala.yml config/custom-vars/
  vi config/custom-vars/vars.databricks.scala.yml
  ```
- Edit the following properties. For property details and descriptions, refer to the Configuration Properties table below. A filled-in example follows these steps.

  ```yaml
  DATASERVER_DATABRICKS_ALLOWED_URLS: "<PLEASE_UPDATE>"
  DATASERVER_AWS_STS_ROLE: "<PLEASE_CHANGE>"
  ```
- Run the following commands.

  ```bash
  cd ~/privacera/privacera-manager
  ./privacera-manager.sh update
  ```
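For reference, here is a minimal sketch of config/custom-vars/vars.databricks.scala.yml once filled in. The workspace URL and role ARN are the illustrative placeholders from the Configuration Properties table below, not real values.

```yaml
# Illustrative values only -- substitute your own workspace URL and role ARN.
DATASERVER_DATABRICKS_ALLOWED_URLS: "https://xxx-7xxxfaxx-xxxx.cloud.databricks.com"
DATASERVER_AWS_STS_ROLE: "arn:aws:iam::111111111111:role/assume-role"
```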
Configuration Properties#
Property | Description | Example |
---|---|---|
DATABRICKS_SCALA_ENABLE | Set this property to enable/disable Databricks Scala. It is found under the Databricks Signed URL Configuration For Scala Clusters section. | |
DATASERVER_DATABRICKS_ALLOWED_URLS | A URL or comma-separated list of URLs. Privacera Dataserver serves only the URLs listed in this property. | https://xxx-7xxxfaxx-xxxx.cloud.databricks.com |
DATASERVER_AWS_STS_ROLE | The instance profile ARN of the AWS role that can access Delta files in Databricks. | arn:aws:iam::111111111111:role/assume-role |
DATABRICKS_MANAGE_INIT_SCRIPT | Manages the init script. If enabled, Privacera Manager uploads the init script (ranger_enable_scala.sh) to the identified Databricks host. | |
DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF | Configures the Databricks cluster policy; add the required JSON in the text area. An illustrative sketch of the policy format follows this table. | |
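The exact JSON for DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF depends on your deployment and is not reproduced here. As a rough, hypothetical sketch only: the keys below follow the general Databricks cluster-policy format, and the values reuse the Spark properties shown later on this page. This is not Privacera's shipped policy.

```json
{
  "spark_conf.spark.databricks.isv.product": { "type": "fixed", "value": "privacera" },
  "spark_conf.spark.driver.extraJavaOptions": { "type": "fixed", "value": "-javaagent:/databricks/jars/privacera-agent.jar" },
  "spark_conf.spark.executor.extraJavaOptions": { "type": "fixed", "value": "-javaagent:/databricks/jars/privacera-agent.jar" }
}
```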
Managing Init Script#
If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "true", the init script is uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME set in vars.privacera.yml.
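To confirm the automatic upload, you can list the target folder with the Databricks CLI, assuming you have a privacera profile configured as described in the manual steps below:

```bash
# Replace <DEPLOYMENT_ENV_NAME> with the value from vars.privacera.yml.
dbfs ls dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ --profile privacera
```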
If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "false", the init script must be uploaded manually to your Databricks host, as follows:
- Open a terminal and connect to your Databricks account using your Databricks login credentials or token.

  - Connect using login credentials:

    - If you're using login credentials, run the following command.

      ```bash
      databricks configure --profile privacera
      ```

    - Enter the Databricks URL.

      ```
      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
      ```

    - Enter the username and password.

      ```
      Username: email-id@yourdomain.com
      Password:
      ```
  - Connect using a Databricks token:

    - If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.

    - If you're using a token, run the following command.

      ```bash
      databricks configure --token --profile privacera
      ```

    - Enter the Databricks URL.

      ```
      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
      ```

    - Enter the token.

      ```
      Token:
      ```
- To check whether the connection to your Databricks account is established, run the following command.

  ```bash
  dbfs ls dbfs:/ --profile privacera
  ```

  If you are connected to your account, the output lists the files in the DBFS root.
- Upload the files manually to Databricks.

  - Copy the following files to DBFS. They are available on the PM host at ~/privacera/privacera-manager/output/databricks:

    - ranger_enable_scala.sh
    - privacera_spark_scala_plugin.conf
    - privacera_spark_scala_plugin_job.conf

  - Run the following commands. You can get the value of <DEPLOYMENT_ENV_NAME> from the file ~/privacera/privacera-manager/config/vars.privacera.yml.

    ```bash
    export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
    dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
    dbfs cp ranger_enable_scala.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    dbfs cp privacera_spark_scala_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    dbfs cp privacera_spark_scala_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    ```

  - Verify that the files have been uploaded.

    ```bash
    dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    ```

  The init script is now at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME set in vars.privacera.yml.
Configure Databricks Cluster#
- Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.
- Open the cluster dialog and enter Edit mode.
- In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and then open the Spark tab.
- Add the following content to the Spark Config edit box. For more information on the Spark config properties, click here.

  New Properties:

  ```
  spark.databricks.isv.product privacera
  spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
  spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
  spark.databricks.repl.allowedLanguages sql,python,r,scala
  spark.databricks.delta.formatCheck.enabled false
  ```

  Old Properties:

  ```
  spark.databricks.cluster.profile serverless
  spark.databricks.delta.formatCheck.enabled false
  spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
  spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
  spark.databricks.isv.product privacera
  spark.databricks.repl.allowedLanguages sql,python,r,scala
  ```
  Note

  - From the Privacera 5.0.6.1 release onwards, it is recommended to replace the Old Properties with the New Properties. However, the Old Properties will also continue to work.
  - For Databricks versions < 7.3, only the Old Properties should be used, since those versions are in extended support.
(Optional) To use regional endpoint for S3 access, add the following content to the Spark Config edit box.
spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com spark.hadoop.fs.s3.endpoint https://s3.<region>.amazonaws.com spark.hadoop.fs.s3n.endpoint https://s3.<region>.amazonaws.com
- In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and set the init script path. For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml.

  ```
  dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh
  ```
- Save (Confirm) this configuration.

- Start (or restart) the selected Databricks cluster.
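Once the cluster restarts, you can confirm from the CLI that it came back up; a quick check, assuming the privacera profile configured earlier:

```bash
# Lists cluster IDs, names, and states; the target cluster should show RUNNING.
databricks clusters list --profile privacera
```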
Related Information
For further reading, see:
- If you want to enable JWT-based user authentication for your Databricks clusters, see JWT for Databricks.
- If you want PM to add cluster policies in Databricks, see Configure Databricks Cluster Policy.
- If you want to add additional Spark properties for your Databricks cluster, see Spark Properties for Databricks Cluster.