Databricks User Guide#
Spark Fine-grained Access Control (FGAC)#
Enable View-level Access Control#
- Edit the Spark Config of your existing Privacera-enabled Databricks cluster.
- Add the property below.
  spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
- Save and restart the Databricks cluster.
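After the restart, you can optionally confirm that the property was picked up. This is a quick check from a notebook, not part of the official steps; it relies on Spark SQL's SET command reading back a configuration value:

-- Returns the configured value if the cluster picked up the property.
SET spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable;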
Apply View-level Access Control#
To run CREATE VIEW with the Spark plug-in, you need the DATA_ADMIN permission: the source table on which you are going to create a view requires DATA_ADMIN access in the Ranger policy.
Use Case
- Let's take a use case where we have an 'employee_db' database and two tables inside it with the data below.
  -- Requires create privilege on the database (enabled by default).
  create database if not exists employee_db;
- Create two tables.
  -- Requires create privilege at the table level.
  create table if not exists employee_db.employee_data (id int, userid string, country string);
  create table if not exists employee_db.country_region (country string, region string);
- Insert test data.
  -- Requires update privilege at the table level.
  insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU');
  insert into employee_db.employee_data values (1,'james','US'), (2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'), (5,'sally','DE'), (6,'emily','DE');

  -- Requires select privilege at the column level.
  select * from employee_db.country_region;
  select * from employee_db.employee_data;
- Now try to create a view on top of the two tables created above; it will fail with the error shown below.
  create view employee_db.employee_region(userid, region) as
  select e.userid, cr.region
  from employee_db.employee_data e, employee_db.country_region cr
  where e.country = cr.country;

  Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
- Create a Ranger policy that grants the DATA_ADMIN permission on the source table (employee_db.employee_data), then execute the same query again; this time it will pass, as shown below.
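For reference, this is the same CREATE VIEW statement as above; once the DATA_ADMIN policy is in place, it completes without the HiveAccessControlException:

create view employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;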
Note
Granting the DATA_ADMIN privilege on a resource implicitly grants the SELECT privilege on the same resource.
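As an illustration of this note, a user holding only DATA_ADMIN on employee_db.employee_data can also read the table directly, without a separate SELECT grant:

-- Allowed through the SELECT privilege implied by DATA_ADMIN.
select * from employee_db.employee_data;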
Alter View#
-- Requires Alter permission on the view.
ALTER VIEW employee_db.employee_region AS
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;
Rename View#
-- Requires Alter permission on the view.
ALTER VIEW employee_db.employee_region RENAME TO employee_db.employee_region_renamed;
Drop View#
-- Requires Drop permission on the view.
DROP VIEW employee_db.employee_region_renamed;
Row Level Filter#
create view if not exists employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;

select * from employee_db.employee_region;
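With a row-level filter policy defined in Ranger on this view, the select above returns only the rows permitted for the querying user. As a minimal sketch, assuming a hypothetical policy whose filter expression is region = 'EU' for the current user, the query behaves as if it were written as:

-- Hypothetical: assumes a Ranger row-filter policy on employee_db.employee_region
-- with the filter expression "region = 'EU'" for the current user.
select * from employee_db.employee_region where region = 'EU';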
Column Masking#
select * from employee_db.employee_region;
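When a masking policy is defined on a column of this view, the same select returns masked values for that column. As a minimal sketch, assuming a hypothetical Ranger masking policy that hash-masks the userid column, the query behaves roughly as if it were:

-- Hypothetical: assumes a Ranger masking policy of type "Hash" on
-- employee_db.employee_region.userid.
select hash(userid) as userid, region from employee_db.employee_region;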
Whitelisting for Py4J Security Manager#
Certain Python methods are blacklisted on Databricks clusters to enhance cluster security. When you try to access such a method, you might receive the following error:
Error
py4j.security.Py4JSecurityException: … is not whitelisted
If you still want to access the Python classes or methods, you can add them to a whitelisting file. To whitelist classes or methods, do the following:
- Create a file containing a list of all the packages, class constructors, or methods that should be whitelisted.
- To whitelist a complete Java package (including all of its classes), add the package name ending with .*
  org.apache.spark.api.python.*
- To whitelist the constructors of a given class, add the fully qualified class name.
  org.apache.spark.api.python.PythonRDD
- To whitelist specific methods of a given class, add the fully qualified class name followed by the method name.
  org.apache.spark.api.python.PythonRDD.runJobToPythonFile
  org.apache.spark.api.python.SerDeUtil.pythonToJava
- Once you have added all the required packages, classes, and methods, the file will contain a list of entries as shown below.
  org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
  org.apache.spark.api.java.JavaRDD.rdd
  org.apache.spark.rdd.RDD.isBarrier
  org.apache.spark.api.python.*
- Upload the file to a DBFS location that can be referenced from the Spark Config section. Suppose the whitelist.txt file contains the classes/methods to be whitelisted; run the following command to upload it to Databricks.
  dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
- Add the following property to the Spark Config, referencing the DBFS file location.
  spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
- Restart your cluster.