PrivaceraCloud Documentation

EMR Native Ranger Integration with PrivaceraCloud

AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plug-ins for Apache Spark and Hive. Connecting EMR's plug-ins to PrivaceraCloud's Ranger-based data access governance has the following advantages:

  • Enterprises can sync their existing policies with EMR.

  • Organizations can extend Apache Ranger’s open source capabilities to take advantage of Privacera’s centralized enterprise-ready solution.

Prerequisite

Connect the Elastic MapReduce from Amazon and EMRFS S3 applications in your PrivaceraCloud portal.

Configuration
Certificate setup in Secrets Manager

AWS EMR native Ranger integration requires mutual TLS between the Ranger plug-ins and the Privacera Ranger Admin. The TLS certificates must be stored in AWS Secrets Manager and referenced from an EMR security configuration. Perform the following steps to configure them:

Create two secrets in AWS Secrets Manager:

  1. Ranger Admin Public Cert

    1. Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.

    2. Select Other type of secrets as the secret type, and then open the Plaintext tab.

    3. In your PrivaceraCloud account, navigate to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Admin Public Cert > Download Certificate.

    4. Paste the contents of the downloaded certificate into the Plaintext tab.

    5. Select an encryption key according to your requirements.

    6. Click Next, and enter the secret name, for example, ranger-admin-pub-cert.

    7. Click Next. The Configure automatic rotation page is displayed; no action is required.

      Click Next.

    8. Review the secret details and click Store.

      The secret is stored successfully.

  2. Ranger Client KeyPair

    1. Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.

    2. Select Other type of secrets as the secret type, and then open the Plaintext tab.

    3. In your PrivaceraCloud account, navigate to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Client KeyPair > Download Certificate.

    4. Paste the contents of the downloaded file into the Plaintext tab.

    5. Select an encryption key according to your requirements.

    6. Click Next, and enter the secret name, for example, ranger-plugin-key-cert.

    7. Click Next. The Configure automatic rotation page is displayed; no action is required.

      Click Next.

    8. Review the secret details and click Store.

      The secret is stored successfully.
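If you prefer the AWS CLI to the console, the two secrets can also be created from the downloaded files. The sketch below is not an official Privacera procedure: it assumes the AWS CLI is configured with permission to create secrets, and that the downloads were saved as ranger-admin-pub-cert.pem and ranger-plugin-key-cert.pem (hypothetical file names). Because the client secret is meant to hold a key pair, the sketch first checks that the key pair file contains both a private key block and a certificate block.

```shell
#!/usr/bin/env bash
# Sketch: store the two PrivaceraCloud PEM downloads as plaintext secrets.
# Assumes the AWS CLI is installed and configured; file names are hypothetical.

# Succeeds if the file contains a PEM certificate block.
has_pem_cert() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Succeeds if the file contains a PEM private key block (PKCS#8, RSA, EC, ...).
has_pem_key() {
  grep -Eq -- '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$1"
}

# Creates a secret whose plaintext value is the whole PEM file.
# The file:// prefix makes the AWS CLI read the parameter value from the file.
create_pem_secret() {
  local name="$1" pem_file="$2"
  aws secretsmanager create-secret \
    --name "$name" \
    --secret-string "file://$pem_file"
}

# Checks both files, then creates ranger-admin-pub-cert and ranger-plugin-key-cert.
store_ranger_secrets() {
  local admin_pem="$1" client_pem="$2"
  if ! has_pem_cert "$admin_pem"; then
    echo "no certificate found in $admin_pem" >&2
    return 1
  fi
  if ! has_pem_cert "$client_pem" || ! has_pem_key "$client_pem"; then
    echo "$client_pem should contain both a private key and a certificate" >&2
    return 1
  fi
  create_pem_secret ranger-admin-pub-cert "$admin_pem" &&
    create_pem_secret ranger-plugin-key-cert "$client_pem"
}
```

Usage: `store_ranger_secrets ranger-admin-pub-cert.pem ranger-plugin-key-cert.pem`. To use a customer managed encryption key, as in step 5, add `--kms-key-id <key-id>` to the `create-secret` call.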

IAM roles setup
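Manually set up IAM roles

Create the following three IAM roles:

  1. EMR_RS_DATA_ACCESS_ROLE: the IAM role for Apache Ranger. If audits are sent to CloudWatch, it must be allowed to create log streams in, and call PutLogEvents on, the audit log group.

  2. EMR_RS_USER_ACCESS_ROLE: the IAM role for other AWS services.

  3. EMR_RS_INSTANCE_ROLE: the EC2 instance profile for the cluster nodes.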
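The permission policies for these roles are not listed here, so the sketch below only covers the part that is standard for any EC2 instance profile: creating EMR_RS_INSTANCE_ROLE with a trust policy that lets EC2 assume it, and wrapping it in an instance profile. Giving the instance profile the same name as the role is an assumption, and the data access and user access roles, plus the permissions of all three roles, still need to be configured for your environment.

```shell
#!/usr/bin/env bash
# Sketch: create the EC2 instance role and its instance profile with the AWS CLI.
# Assumption: the role and the instance profile share the name EMR_RS_INSTANCE_ROLE.
# Permission policies are intentionally not attached here.

# Writes a trust policy that lets EC2 instances assume the role.
write_ec2_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Creates the role, an instance profile with the same name, and links the two.
create_instance_role() {
  local name="${1:-EMR_RS_INSTANCE_ROLE}" policy_file
  policy_file="$(mktemp)"
  write_ec2_trust_policy "$policy_file"
  aws iam create-role --role-name "$name" \
    --assume-role-policy-document "file://$policy_file" &&
    aws iam create-instance-profile --instance-profile-name "$name" &&
    aws iam add-role-to-instance-profile \
      --instance-profile-name "$name" --role-name "$name"
  local status=$?
  rm -f "$policy_file"   # the trust policy is only needed for create-role
  return "$status"
}
```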

Create security configurations
Manually set up security configurations
  1. Log in to the AWS Console and navigate to EMR Console > Security Configuration (in the left panel) > Create New Security Configuration.

  2. Enter the security configuration name, for example, EMR_NATIVE_WITH_PLCOUD.

  3. In the Authentication section, select the Enable Kerberos authentication checkbox and enter the Kerberos environment details.

  4. In the Authorization section, select Enable integration with Apache Ranger for fine-grained access control, and enter the following details:

    1. IAM role for Apache Ranger: EMR_RS_DATA_ACCESS_ROLE (created during IAM roles setup).

    2. IAM role for other AWS services: EMR_RS_USER_ACCESS_ROLE (created during IAM roles setup).

    3. Ranger Policy Manager: In your PrivaceraCloud account, go to Settings > ApiKey > AWS EMR Native Ranger > Ranger Admin mTLS URL, click Copy URL, and paste the URL here.

    4. Admin PEM secret: Select ranger-admin-pub-cert from the drop-down.

    5. EMRFS client PEM secret: Select ranger-plugin-key-cert from the drop-down.

    6. EMRFS policy repository: privacera_emrfs_s3

    7. Spark configurations: Select this option if you want to enable the Spark application.

    8. Spark client PEM secret: Select ranger-plugin-key-cert from the drop-down.

    9. Spark policy repository: privacera_hive

    10. Hive configurations: Select this option if you want to enable the Hive application.

    11. Hive client PEM secret: Select ranger-plugin-key-cert from the drop-down.

    12. Hive policy repository: privacera_hive

    13. CloudWatch Log Group: Select a CloudWatch log group for pushing audits, if required. Note: EMR_RS_DATA_ACCESS_ROLE must have permission to create log streams in, and call PutLogEvents on, this log group (configured during IAM roles setup).
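The same security configuration can also be created with the AWS CLI. The sketch below writes it as JSON using the field names of AWS's Ranger security configuration format, with one client secret shared by the EMRFS S3, Spark, and Hive plug-ins, as in the console steps above. Unlike the console drop-downs, the CLI expects secret and role ARNs; the admin URL, all ARNs, and the 24-hour ticket lifetime for the cluster-dedicated KDC are placeholders or assumptions.

```shell
#!/usr/bin/env bash
# Sketch: build the Ranger security configuration and register it with EMR.
# All ARNs and the admin URL are caller-supplied placeholders.

# Prints the security configuration JSON to stdout.
# Args: admin_url data_role_arn user_role_arn admin_secret_arn client_secret_arn [log_group_arn]
security_configuration_json() {
  local admin_url="$1" data_role="$2" user_role="$3"
  local admin_secret="$4" client_secret="$5" log_group="$6"
  local audit=""
  # The CloudWatch audit destination is optional, like the console field.
  if [ -n "$log_group" ]; then
    audit=$(printf ',\n      "AuditConfiguration": { "Destinations": { "AmazonCloudWatchLogs": { "CloudWatchLogGroup": "%s" } } }' "$log_group")
  fi
  # TicketLifetimeInHours=24 is an assumed value for the cluster-dedicated KDC.
  cat <<EOF
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": { "TicketLifetimeInHours": 24 }
    }
  },
  "AuthorizationConfiguration": {
    "RangerConfiguration": {
      "AdminServerURL": "$admin_url",
      "RoleForRangerPluginsARN": "$data_role",
      "RoleForOtherAWSServicesARN": "$user_role",
      "AdminServerSecretARN": "$admin_secret",
      "RangerPluginConfigurations": [
        { "App": "EMRFS-S3", "ClientSecretARN": "$client_secret", "PolicyRepositoryName": "privacera_emrfs_s3" },
        { "App": "Spark", "ClientSecretARN": "$client_secret", "PolicyRepositoryName": "privacera_hive" },
        { "App": "Hive", "ClientSecretARN": "$client_secret", "PolicyRepositoryName": "privacera_hive" }
      ]$audit
    }
  }
}
EOF
}

# Registers the configuration with EMR.
# Args: name, then the same arguments as security_configuration_json.
create_security_configuration() {
  local name="$1"
  shift
  aws emr create-security-configuration --name "$name" \
    --security-configuration "$(security_configuration_json "$@")"
}
```

Usage: `create_security_configuration EMR_NATIVE_WITH_PLCOUD <mtls-url> <data-role-arn> <user-role-arn> <admin-secret-arn> <client-secret-arn> [<log-group-arn>]`.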

Create EMR cluster
Manually set up the EMR cluster
  1. Log in to the AWS Console, navigate to the EMR service, and click Create Cluster.

  2. Click the Go to advanced options link.

  3. Under the Software Configuration:

  4. Select Release Version.

  5. Select additional applications as required for your environment.

    If you select the Hive or Spark application, you must also select HCatalog.

  6. Under Edit software settings, select Enter configuration and add the following text if you want to use an external Hive metastore.

    The AWS Glue metastore is not supported.

    [
      {
        "Classification": "hive-site",
        "Properties": {
          "javax.jdo.option.ConnectionUserName": "${user-name}",
          "javax.jdo.option.ConnectionDriverName": "${jdbc-driver}",
          "javax.jdo.option.ConnectionURL": "${jdbc-url}",
          "javax.jdo.option.ConnectionPassword": "${jdbc-password}"
        }
      }
    ]
     
  7. Click Next.

  8. Under Hardware settings, select the Networking, Node, and Instance values appropriate for your environment.

  9. Under General cluster settings:

    If you want to enable audit logging for your applications in the Privacera Portal, also perform the Ranger audit steps below for the master and worker nodes. They add two bootstrap actions that install the Ranger audit configuration on the master and worker nodes.

  10. Enter the Cluster name.

  11. Select the Logging, Debugging, and Termination protection checkboxes as required for your environment.

  12. Configure Ranger audit logging for the master node:

  13. Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.

    The Add Bootstrap Action dialog appears.

  14. In this dialog, enter the name Configure Ranger Audits for Master.

  15. In the Optional arguments field, add the following arguments, replacing <ranger-audit-setup-script-url> with your own script URL.

    To get the script URL, go to PrivaceraCloud Portal > Access Manager > Settings > ApiKey > Info icon > Ranger Audit Setup Script > Copy URL.

    instance.isMaster=true "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
                
  16. Click Add.

  17. Configure Ranger audit logging for the worker nodes:

  18. Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.

    The Add Bootstrap Action dialog appears.

  19. In this dialog, enter the name Configure Ranger Audits for Workers.

  20. In the Optional arguments field, add the following arguments, replacing <ranger-audit-setup-script-url> with your own script URL.

    To get the script URL, go to PrivaceraCloud Portal > Access Manager > Settings > ApiKey > Info icon > Ranger Audit Setup Script > Copy URL.

    instance.isMaster=false "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
    
  21. Click Add.

  22. Under Security Options:

  23. Enter or select the security options appropriate for your environment.

  24. Under the Permissions section:

  25. EMR role: Select the default EMR service role, EMR_DefaultRole.

    EC2 instance profile: Select EMR_RS_INSTANCE_ROLE (created during IAM roles setup).

  26. Expand Security Configuration and select the configuration you created earlier, for example, EMR_NATIVE_WITH_PLCOUD.

    Set the Realm and enter a KDC admin password.

  27. Click Create cluster.
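The console steps above can also be scripted with the AWS CLI. The sketch below writes the two Run if bootstrap actions as JSON and launches the cluster with the security configuration, roles, and Kerberos realm from the earlier sections. The release label, instance type and count, and the file names hive-site.json (the configuration from step 6, saved to a file) and bootstrap-actions.json are illustrative assumptions; the run-if path is the one shown in AWS's bootstrap action examples and should be checked for your region.

```shell
#!/usr/bin/env bash
# Sketch only: release label, instance type/count, and file names are assumptions.

# AWS-provided "Run if" bootstrap action (the console's "Run if").
RUN_IF="s3://elasticmapreduce/bootstrap-actions/run-if"

# Writes the master and worker bootstrap actions as CLI-compatible JSON.
# Args: ranger_audit_setup_script_url output_file
write_bootstrap_actions() {
  local script_url="$1" out="$2"
  local cmd="wget $script_url; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
  cat > "$out" <<EOF
[
  { "Name": "Configure Ranger Audits for Master", "Path": "$RUN_IF",
    "Args": ["instance.isMaster=true", "$cmd"] },
  { "Name": "Configure Ranger Audits for Workers", "Path": "$RUN_IF",
    "Args": ["instance.isMaster=false", "$cmd"] }
]
EOF
}

# Launches a Kerberized cluster. Expects hive-site.json and bootstrap-actions.json
# in the current directory. Args: cluster_name realm kdc_admin_password
create_emr_cluster() {
  local cluster_name="$1" realm="$2" kdc_password="$3"
  aws emr create-cluster \
    --name "$cluster_name" \
    --release-label emr-6.5.0 \
    --applications Name=Hive Name=Spark Name=HCatalog \
    --instance-type m5.xlarge --instance-count 3 \
    --service-role EMR_DefaultRole \
    --ec2-attributes InstanceProfile=EMR_RS_INSTANCE_ROLE \
    --security-configuration EMR_NATIVE_WITH_PLCOUD \
    --kerberos-attributes "Realm=$realm,KdcAdminPassword=$kdc_password" \
    --configurations file://hive-site.json \
    --bootstrap-actions file://bootstrap-actions.json
}
```

Usage: `write_bootstrap_actions <ranger-audit-setup-script-url> bootstrap-actions.json`, then `create_emr_cluster my-cluster EC2.INTERNAL <kdc-admin-password>`. Drop `--configurations` if you are not using an external Hive metastore.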

Application usage

In your PrivaceraCloud account, expand Settings and click Applications. For more information, see Elastic MapReduce from Amazon and EMRFS S3.

Spark

Spark SQL use case:

  1. SSH to the EMR master node.

  2. Run kinit as your user.

  3. Start the Spark SQL shell with spark-sql.

  4. Run SQL queries with Spark.

Policies are evaluated against the privacera_hive policy repository. Audits can be seen under Access Manager > Audits.
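For scripted checks, the Spark SQL steps can be wrapped in a small function. This is a sketch: the principal (for example analyst@EC2.INTERNAL) and the query are placeholders, and `spark-sql -e` runs a single statement and exits instead of opening the interactive shell.

```shell
#!/usr/bin/env bash
# Sketch: run one Spark SQL statement as a Kerberos user on the EMR master node.
# The principal and query are placeholders.

run_spark_sql() {
  local principal="$1" query="$2"
  # Obtain a Kerberos ticket; kinit prompts for the user's password.
  kinit "$principal" || return 1
  # -e runs the statement non-interactively and exits.
  spark-sql -e "$query"
}
```

Usage: `run_spark_sql analyst@EC2.INTERNAL "SHOW DATABASES"`. The resulting access decision should then appear under Access Manager > Audits.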

Spark Shell use case:

  1. SSH to the EMR master node.

  2. Run kinit as your user.

  3. Start the Spark shell with spark-shell.

  4. Run Scala code with Spark.

Policies are evaluated against the privacera_emrfs_s3 policy repository for any S3 access. Audits can be seen under Access Manager > Audits.
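Because S3 access from spark-shell goes through the privacera_emrfs_s3 repository, a quick way to exercise those policies is to read an S3 path non-interactively. In this sketch the principal and the S3 path are placeholders; spark-shell executes the Scala statement it reads from standard input and exits when the input ends.

```shell
#!/usr/bin/env bash
# Sketch: read an S3 path from spark-shell so the EMRFS S3 policies are evaluated.
# The principal and S3 path are placeholders.

read_s3_with_spark_shell() {
  local principal="$1" s3_path="$2"
  kinit "$principal" || return 1
  # spark-shell executes the Scala read from stdin and exits at end of input.
  spark-shell <<EOF
spark.read.text("$s3_path").show(10, false)
EOF
}
```

Usage: `read_s3_with_spark_shell analyst@EC2.INTERNAL s3://example-bucket/data/`.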

Hive
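  1. SSH to the EMR master node.

  2. Run kinit as your user.

  3. Log in to the Beeline shell with the following command. If you set a realm other than EC2.INTERNAL when creating the cluster, use that realm in the principal.

    beeline -u "jdbc:hive2://`hostname -f`:10000/default;principal=hive/`hostname -f`@EC2.INTERNAL"

  4. Run Hive queries.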

Policies are evaluated against the privacera_hive policy repository. Audits can be seen under Access Manager > Audits.
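The Beeline command can be parameterized on the Kerberos realm so that it matches the realm set when the cluster was created. In this sketch the principal, realm, and query are placeholders; the JDBC URL has the same shape as the command above, and `hostname -f` returns the master node's fully qualified name, as in the original command.

```shell
#!/usr/bin/env bash
# Sketch: run a Hive query through Beeline on a Kerberized EMR master node.
# The principal, realm, and query are placeholders.

# Builds the HiveServer2 JDBC URL; the service principal is hive/<host>@<realm>.
hive_jdbc_url() {
  local host="$1" realm="$2"
  printf 'jdbc:hive2://%s:10000/default;principal=hive/%s@%s\n' "$host" "$host" "$realm"
}

run_hive_query() {
  local principal="$1" realm="$2" query="$3"
  kinit "$principal" || return 1
  # -e runs the query and exits instead of opening the interactive shell.
  beeline -u "$(hive_jdbc_url "$(hostname -f)" "$realm")" -e "$query"
}
```

Usage: `run_hive_query analyst@EC2.INTERNAL EC2.INTERNAL "SHOW TABLES"`.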


Last update: February 18, 2022