EMR User Guide#

  1. Create the bucket ${SECURE_BUCKET_NAME} that you want to protect.

  2. Download the sample data from the link below and upload it to your bucket at s3://${SECURE_BUCKET_NAME}/sample_data/customer_data.

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv

  3. Make sure the cluster does not have direct access to the ${SECURE_BUCKET_NAME} bucket.

  4. To verify, SSH to the master node and try to list the bucket:

        ssh -i ${KEY_FILE} hadoop@${EMR_PUBLIC_DNS}
        aws s3 ls  s3://${SECURE_BUCKET_NAME}
    

    Result: Fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
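
The verification in step 4 can also be scripted. The following is a minimal sketch, assuming the AWS CLI is installed on the master node and exits with a non-zero status when the listing is rejected. The function name check_no_direct_access is hypothetical, and a non-zero status can also come from other failures (for example, a network problem), so treat the result as a hint rather than proof.

```shell
# Sketch only: check_no_direct_access is a hypothetical helper, not part of Privacera or the AWS CLI.
# It reports whether `aws s3 ls` on the protected bucket succeeds from this machine.
check_no_direct_access() {
  bucket="$1"
  if aws s3 ls "s3://${bucket}" >/dev/null 2>&1; then
    # The listing worked, so the cluster role still has direct access.
    echo "UNEXPECTED: direct access to s3://${bucket} is allowed"
    return 1
  fi
  # The listing failed; a 403 from S3 is the expected cause, but any failure lands here.
  echo "OK: aws s3 ls failed for s3://${bucket}"
}
```

On the master node you might call it as `check_no_direct_access "${SECURE_BUCKET_NAME}"`.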

HIVE#

  1. Sample data setup.

    Run as an admin user who has permission on the URL in Ranger and permission to create tables and databases.

    beeline -u "jdbc:hive2://`hostname -f`:10000/default;principal=hive/`hostname -f`@${REALM}"

  2. Create the table using admin/superuser.

    create database if not exists customer;
    use customer;
    CREATE EXTERNAL TABLE if not exists `customer_data_s3`(
    `id` string,
    `global_id` string,
    `name` string,
    `ssn` string,
    `email_address` string,
    `address` string)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
    STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
        's3a://${SECURE_BUCKET_NAME}/sample_data/customer_data';
    
  3. Exit from beeline.

  4. Switch to ${TEST_USER}, run kinit, and try the sample policy.

    beeline  -u "jdbc:hive2://`hostname -f`:10000/default;principal=hive/`hostname -f`@${REALM}"
    #Check ranger audit for hive service
    Select * from customer.customer_data_s3 limit 10;
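
The connection string in steps 1 and 4 embeds the fully qualified host name twice, once as the server address and once inside the Kerberos principal. The following is a small sketch of a helper that builds that string; the function name hive_jdbc_url is hypothetical, and the port 10000 and the default database are taken from the commands above.

```shell
# Sketch only: hive_jdbc_url is a hypothetical helper that assembles the
# Kerberized HiveServer2 JDBC URL used in the beeline commands above.
hive_jdbc_url() {
  host="$1"   # fully qualified host name of HiveServer2, e.g. the output of `hostname -f`
  realm="$2"  # Kerberos realm, e.g. the value of ${REALM}
  printf 'jdbc:hive2://%s:10000/default;principal=hive/%s@%s\n' "$host" "$host" "$realm"
}
```

A possible invocation is `beeline -u "$(hive_jdbc_url "$(hostname -f)" "${REALM}")"`.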
    

Data_Admin Access#

To run CREATE VIEW through the Hive plug-in, you need the DATA_ADMIN permission in Ranger, which was introduced in the latest (February) release of the Privacera platform.

The source tables on which you create the view require DATA_ADMIN access in the Ranger policy.

Use Case

Consider a use case with an 'employee_db' database that contains two tables with the data below:

#Requires create privilege on the database enabled by default;
create database if not exists employee_db;

  1. Create two tables.

    #Requires create privilege on the table level;
    
    create table if not exists employee_db.employee_data(id int,userid string,country string);
    create table if not exists employee_db.country_region(country string,region string);
    

  2. Insert test data.

    #Requires update privilege on the table level;
    
    insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU'); 
    
    insert into employee_db.employee_data values (1,'james','US'),(2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'),(5,'sally','DE'), (6,'emily','DE');
    

    select * from employee_db.country_region; 
    #Requires select privilege on the column level;
    
    select * from employee_db.employee_data; 
    #Requires select privilege on the column level;
    
  3. Try to create a view on top of the two tables created above. The statement fails with the error shown below:

    create view employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;
    
    Error: Error while compiling statement: 
    FAILED: HiveAccessControlException 
    Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
    

  4. Create a view policy on employee_db.employee_region, as shown in the image above.

    After you create the policy, run the same query again. It now succeeds.

    Note

    Granting the DATA_ADMIN privilege on a resource implicitly grants the Select privilege on the same resource.

Alter View#

create view if not exists employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;
#Requires Create permission on the view;
ALTER VIEW employee_db.employee_region AS  select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;

Rename View#

#Requires Alter permission on the view;
ALTER VIEW  employee_db.employee_region RENAME to employee_db.employee_region_renamed;

Drop View#

#Requires Drop permission on the view;
DROP VIEW employee_db.employee_region_renamed;

Row Level Filter#

select * from employee_db.employee_region;

Column Masking#

select * from employee_db.employee_region;

PrestoDB#

  1. SSH to the master node of the EMR cluster.

  2. Start the Presto shell. Presto, Spark Thrift, and Hive all use the same metastore.

    presto-cli --catalog hive
    
    OR
    
    /usr/lib/presto/bin/presto-cli-0.210-executable --server localhost:8889 --catalog hive --schema default
    
  3. Sample use case to try out.

    CREATE SCHEMA customer WITH (location = 's3a://${SECURE_BUCKET_NAME}/presto_data/customer/');
    use customer;
    CREATE TABLE cust_data (
        EMP_SSN varchar,
        CC varchar,
        FIRST_NAME varchar,
        LAST_NAME varchar,
        ADDRESS varchar,
        ZIPCODE varchar,
        EMAIL varchar,
        US_PHONE_FORMATTED varchar
    );
    insert into cust_data values ('12345', '6789', 'Will','Smith', 'US', '400098','ws@gmail.com', '010-564-333');
    select * from cust_data;
    
  4. Full Table Access.

    #Add policy in ranger to access everything in the table
    SELECT * FROM cust_data;
    
  5. Restricted Column Access.

    #Column-level permission on the table. If the user doesn't have permission on the "first_name" column:
    #This query is denied in the audit
    select first_name from cust_data;
    #This query is allowed in the audit
    select last_name, address, zipcode, email from cust_data;
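
The column-level check in step 5 can be repeated from the command line instead of the interactive shell. The following is a minimal sketch, assuming presto-cli accepts the --execute option and exits with a non-zero status when the query fails. The helper names and the PRESTO_CLI override variable are hypothetical; the server address, catalog, and schema are taken from the commands above. A failure can have causes other than a Ranger denial, so confirm the outcome in the Ranger audit.

```shell
# Sketch only: run_presto_query and check_column_access are hypothetical helpers.
# PRESTO_CLI can be set to another executable; it defaults to presto-cli.
run_presto_query() {
  "${PRESTO_CLI:-presto-cli}" --server localhost:8889 --catalog hive --schema customer --execute "$1"
}

# Runs a one-row select on a single column of cust_data and reports the outcome.
check_column_access() {
  if run_presto_query "select $1 from cust_data limit 1" >/dev/null 2>&1; then
    echo "allowed: $1"
  else
    # A Ranger denial is one possible cause; connection errors also end up here.
    echo "failed: $1"
  fi
}
```

For example, `check_column_access first_name` and `check_column_access last_name` exercise the two cases from step 5.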
    

Presto SQL - Use Case#

  1. Start PrestoSQL shell.

    presto-cli --catalog hive
    
  2. Create the schema using admin/superuser.

    CREATE SCHEMA customer WITH (location = 's3a://${SECURE_BUCKET_NAME}/presto_data/schema/customer');
    use customer;
    
  3. Create the table using admin/superuser

    use customer;
    
    CREATE TABLE customer_data(
    id varchar,
    name varchar,
    ssn varchar,
    email_address varchar,
    address varchar)
    WITH (
        format = 'textfile',
        external_location = 's3a://${SECURE_BUCKET_NAME}/presto_data/table/customer_data'
    );
    
  4. Exit from presto-cli, switch to ${TEST_USER}, run kinit, and try the sample policy.

    presto-cli --catalog hive
    use customer;
    select * from customer_data limit 10;
    

Data_Admin Access#

To run CREATE VIEW in Presto SQL, you need the DATA_ADMIN permission in the Presto Ranger policy.

The source tables on which you create the view require DATA_ADMIN access in the Ranger policy.

Use Case

Consider a use case with an 'employee_db' schema that contains two tables with the data below:

#Requires create privilege on the database enabled by default;
create schema if not exists employee_db;

  1. Create two tables.

    #Requires create privilege on the table level;
    
    CREATE TABLE IF NOT EXISTS employee_db.employee_data(id int, userid varchar, country varchar);
    
    CREATE TABLE IF NOT EXISTS employee_db.country_region(country varchar, region varchar);
    

  2. Insert test data.

    #Requires update privilege on the table level;
    
    insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU');
    
    insert into employee_db.employee_data values (1,'james','US'),(2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'),(5,'sally','DE'), (6,'emily','DE');
    

  3. Try to create a view on top of the two tables created above. The query fails with the error shown below:

    Query 20210223_051227_00005_nyxtw failed: Access Denied: Cannot create view tbl_view_5
    

    You need the 'Create View' permission, as shown below:

  4. After you grant the 'Create View' permission, the query fails with the following error message:

     Query 20210223_050930_00004_nyxtw failed: Access Denied: User [emily] does not have [DATA_ADMIN] privilege on [hive/employee_db/employee_data]
    

    Grant the 'Data_Admin' permission on both tables, as shown in the image below, and run the create view query again. It now succeeds.

    Note

    Granting the DATA_ADMIN privilege on a resource implicitly grants the Select privilege on the same resource.

Alter View#

Create View

presto:customer> create view tbl_view_1 as SELECT * FROM tbl_1;
CREATE VIEW
presto:customer> select * from tbl_view_1;
c0 |   c1   |    c2     |          c3           |           c4
----+--------+-----------+-----------------------+------------------------
2  | James  | 892821225 | james@walt.com        | 4578 Extension xxx
1  | Dennis | 619821225 | thomasashley@walt.com | 9478 Anthony Extension
3  | Sally  | 092341225 | sally@walt.com        | 5678 Extension xyxx
(3 rows)


Query 20210303_142252_00006_g76nu, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
1.86 [3 rows, 169B] [1 rows/s, 91B/s]

Alter View

presto:customer> CREATE OR REPLACE VIEW tbl_view_1 as select * from tbl_3;
CREATE VIEW
presto:customer> select * from tbl_view_1;
slno | name | mobile |  email  | address
------+------+--------+---------+---------
1    | emily |   1234 | s@s.com | in
(1 row)

Query 20210303_142341_00009_g76nu, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.91 [1 rows, 0B] [1 rows/s, 0B/s]

Rename View#

presto:customer> alter view tbl_view_1 rename to tbl_view_2;
RENAME VIEW
presto:customer>

Drop View#

presto:customer> drop view tbl_view_1;
DROP VIEW
presto:customer>

Row Level Filter#

Without a row-level filter policy, the query returns all rows:

presto:employee_db> select * from tbl_1;

id |   userid    | country
----+-------------+---------
1 | james       | US
2 | john        | US
3 | mark        | UK
4 | sally-sales | UK
5 | sally       | DE
6 | emily       | DE
(6 rows)

Query 20210309_060602_00022_5amn7, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
4.11 [6 rows, 0B] [1 rows/s, 0B/s]

With a row-level filter policy applied, the same query returns only the rows the user is allowed to see:

presto:employee_db>
presto:employee_db> select * from tbl_1;
id | userid | country
----+--------+---------
1 | james  | US
2 | john   | US
(2 rows)

Query 20210309_061202_00024_5amn7, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.45 [6 rows, 0B] [13 rows/s, 0B/s]

Column Masking#

Without a masking policy, the query returns the original column values:

presto:employee_db> select * from tbl_1;
id |   userid    | country
----+-------------+---------
1 | james       | US
2 | john        | US
3 | mark        | UK
4 | sally-sales | UK
5 | sally       | DE
6 | emily       | DE
(6 rows)

Query 20210309_062000_00027_5amn7, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.30 [6 rows, 0B] [20 rows/s, 0B/s]

With a masking policy applied to the country column, the same query returns NULL in that column:

presto:employee_db>
presto:employee_db> select * from tbl_1;
id |   userid    | country
----+-------------+---------
1 | james       | NULL
2 | john        | NULL
3 | mark        | NULL
4 | sally-sales | NULL
5 | sally       | NULL
6 | emily       | NULL
(6 rows)

Query 20210309_061856_00026_5amn7, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.32 [6 rows, 0B] [18 rows/s, 0B/s]

HUE#

  1. SSH to master node.

  2. Edit /etc/hue/conf/hue.ini and modify the JDBC Presto parameters:

    In hue.ini:

    options = '{"url": "jdbc:presto://ip-10-230-0-234.ca-central-1.compute.internal:8889/hive/default", "driver": "com.facebook.presto.jdbc.PrestoDriver", "user":"","password":""}'
    

    Set "user" to an empty string ("") so that Hue uses the credentials of the logged-in Hue user.

  3. Restart Hue.

    sudo stop hue
    sudo start hue
    
  4. Log in to the Hue console at <master-node>:8888.

  5. Set the Admin username and password.

  6. Add more Hue users through the Admin console.

  7. Log out, and then log in to the Hue console as a newly created user.

  8. Access the tables through Hive/Presto.

  9. In Privacera Ranger, check that the audited username matches the user logged in to Hue.

LIVY#

  1. Setup Livy and Zeppelin.

    Use SSH with port forwarding, or open port 8890, to access Zeppelin from a web browser.

    ssh -i ${KEY_FILE} -L 8890:localhost:8890 hadoop@${EMR_PUBLIC_DNS}
    
  2. Go to the Zeppelin web UI (http://localhost:8890).

  3. Enable user-based login (https://zeppelin.apache.org/docs/0.6.2/security/shiroauthentication.html).

    sudo su
    cp /etc/zeppelin/conf/zeppelin-site.xml.template /etc/zeppelin/conf/zeppelin-site.xml
    chown zeppelin:zeppelin /etc/zeppelin/conf/zeppelin-site.xml
    
    vi /etc/zeppelin/conf/zeppelin-site.xml
    
    #Set the following property to false, if it exists
    #This property was removed in Zeppelin 0.9.0 (https://issues.apache.org/jira/browse/ZEPPELIN-4489)
    zeppelin.anonymous.allowed=false
    
    cp /etc/zeppelin/conf/shiro.ini.template /etc/zeppelin/conf/shiro.ini
    
    vi /etc/zeppelin/conf/shiro.ini
    
    #Add required users in [users] as below  --
    [users]
    hadoop = hadoop123, admin
    
    chown zeppelin:zeppelin /etc/zeppelin/conf/shiro.ini
    
  4. Check the Livy port in the Livy configuration file.

    vi /etc/livy/conf/livy.conf
    
    livy.server.port=8998
    
  5. Stop and restart Zeppelin.

    sudo stop zeppelin
    
    sudo start zeppelin
    
  6. Go to <master-node>:8890 and log in with the username and password that you created in step 3.

  7. Go to Settings > Interpreter > Livy > Edit and perform the following steps:

    • Keep only Scope with per user.

    • Set the properties below.

      • livy.spark.driver.cores=1

      • livy.spark.driver.memory=1g

      • livy.spark.executor.cores=1

      • livy.spark.executor.instances=2

      • livy.spark.executor.memory=1g

      • livy.spark.driver.extraClassPath=/opt/privacera/plugin/privacera-spark-plugin/spark-plugin/*:{copy spark.driver.extraClassPath from /etc/spark/conf/spark-defaults.conf}

  8. Save and restart.

  9. Run the sample Livy Spark code.

    • Go to Zeppelin web UI (http://localhost:8890).

    • Create a new notebook and run the following code.

      %livy.spark
      
      val df = spark.read.csv("s3://${SECURE_BUCKET_NAME}/sample_data/customer_data/customer_data_without_header.csv");
      df.show()
      
    • Check audit for the above executed command in Privacera Access Manager using the below steps:

      • On the Privacera Portal home page, expand Access Management and click Audit in the left menu.

      • The Audit page will be displayed with Ranger Audit details.
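
Step 4 above opens livy.conf in an editor to check the port. As an alternative, the port can be read non-interactively. The following is a minimal sketch, assuming the property is written as livy.server.port = <number> on its own line. The function name livy_port is hypothetical, and the fallback of 8998 is Livy's default port.

```shell
# Sketch only: livy_port is a hypothetical helper that reads livy.server.port
# from a Livy configuration file and falls back to 8998 when it is not set.
livy_port() {
  conf="${1:-/etc/livy/conf/livy.conf}"
  # Commented-out lines start with '#', so they do not match the pattern.
  port=$(sed -n 's/^[[:space:]]*livy\.server\.port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" 2>/dev/null | tail -n 1)
  echo "${port:-8998}"
}
```

Running `livy_port` on the master node prints the port that the Zeppelin Livy interpreter should point to.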

Spark Object-Level Access Control (OLAC)#

Submit Spark Applications#

You can submit an application packaged as a compiled Java or Spark JAR. You can deploy the JAR in client (local) mode or in cluster mode. Follow the steps for your deployment mode.

Client mode

  1. SSH to the master node.

  2. Run the following command:

    spark-submit \
    --master yarn \
    --driver-memory 512m \
    --executor-memory 512m \
    --class <class-to-run> <your-jar> <arg1> <arg2>

Cluster mode

  1. SSH to the master node.

  2. Run the following command:

    spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 512m \
    --executor-memory 512m \
    --driver-class-path "/opt/privacera/plugin/privacera-spark-plugin/spark-plugin/*:<copy spark.driver.extraClassPath from /etc/spark/conf/spark-defaults.conf>" \
    --class <class-to-run> <your-jar> <arg1> <arg2>
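
The cluster-mode command asks you to copy the existing spark.driver.extraClassPath value by hand. The following is a minimal sketch of how that value could be read and prefixed with the plugin path, assuming spark-defaults.conf separates the key and value with whitespace and the value contains no spaces. The function name driver_class_path is hypothetical.

```shell
# Sketch only: driver_class_path is a hypothetical helper.
# It prints the Privacera plugin path followed by the existing
# spark.driver.extraClassPath value from spark-defaults.conf, if any.
PLUGIN_CP='/opt/privacera/plugin/privacera-spark-plugin/spark-plugin/*'

driver_class_path() {
  conf="${1:-/etc/spark/conf/spark-defaults.conf}"
  # Assumes "key<whitespace>value" lines; takes the last matching entry.
  existing=$(awk '$1 == "spark.driver.extraClassPath" { print $2 }' "$conf" | tail -n 1)
  if [ -n "$existing" ]; then
    printf '%s:%s\n' "$PLUGIN_CP" "$existing"
  else
    printf '%s\n' "$PLUGIN_CP"
  fi
}
```

The result can then be passed to spark-submit as `--driver-class-path "$(driver_class_path)"`.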

Spark Fine-grained Access Control (FGAC)#

View-level Access#

To enable the view-level access control, do the following:

  1. SSH to the master node of EMR cluster.

  2. Edit the spark-defaults.conf file.

    sudo vim /etc/spark/conf/spark-defaults.conf
    
  3. Add the following property.

    spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
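
Steps 2 and 3 can also be applied without an editor. The following is a minimal sketch that appends the property only when it is not already present, assuming spark-defaults.conf uses whitespace-separated key and value pairs. The function name ensure_spark_property is hypothetical, and writing to /etc/spark/conf usually requires root privileges.

```shell
# Sketch only: ensure_spark_property is a hypothetical helper that appends
# "key value" to a Spark defaults file unless the key is already defined.
ensure_spark_property() {
  conf="$1"
  key="$2"
  value="$3"
  # awk exits 0 when a line whose first field equals the key is found.
  if awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$conf"; then
    echo "already set: $key"
  else
    printf '%s %s\n' "$key" "$value" >> "$conf"
    echo "added: $key"
  fi
}
```

For example, from a root shell: `ensure_spark_property /etc/spark/conf/spark-defaults.conf spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true`.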
    

To learn how to use view-level access control in Spark, click here.


Last update: August 24, 2021