EMR with Native Apache Ranger

AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plugins for Apache Spark and Hive. Connecting EMR’s native Ranger with Privacera’s Ranger-based data access governance gives the following key advantages:

  • Companies can sync their existing policies with their EMR solution.
  • Companies can extend Apache Ranger’s open source capabilities with Privacera’s centralized, enterprise-ready solution.

Note

Supported EMR version: 5.32 and above in EMR 5.x series.

Prerequisites

The following AWS secrets are required to store the Ranger Admin and Ranger plugin certificates:

  • ranger-admin-pub-cert

  • ranger-plugin-private-keypair

To create the two secrets in AWS Secrets Manager, do the following:

  1. Log in to the AWS console, navigate to Secrets Manager, and then click Store a new secret.

  2. Select Other type of secrets as the secret type and then go to the Plaintext tab. Keep the default value unchanged. The actual value for this secret will be obtained after the installation is done.

  3. Select the encryption key as per your requirement.

  4. Click Next.

  5. Under Secret name, type a name for the secret, for example ranger-admin-pub-cert or ranger-plugin-private-keypair. Repeat these steps for the second secret.

  6. Click Next. The Configure automatic rotation page is displayed.

  7. Click Next.

  8. On the Review page, check your secret settings and then click Store to save your changes.

    The Secret is stored successfully.
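
As an alternative to the console steps above, the two placeholder secrets could be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with permission to create secrets. The function name and the placeholder string are illustrative and not part of Privacera Manager.

```shell
# Sketch: create the two placeholder secrets with the AWS CLI.
# Assumption: the AWS CLI is configured for the target account.
# The real values are uploaded later, after the certificates are generated.
create_placeholder_secrets() {
  region="$1"
  for name in ranger-admin-pub-cert ranger-plugin-private-keypair; do
    # No KMS key is passed, so the account's default Secrets Manager key is used.
    aws --region "$region" secretsmanager create-secret \
      --name "$name" \
      --secret-string "placeholder"
  done
}

# Example invocation (replace the region with your own):
# create_placeholder_secrets us-east-1
```

Each create-secret call returns the ARN of the new secret, which is needed later in the configuration.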

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.emr.native.ranger.yml config/custom-vars/
    vi config/custom-vars/vars.emr.native.ranger.yml
    
  3. Edit the following properties.

    Property | Description | Example
    EMR_NATIVE_ENABLE | Property to enable EMR native Ranger integration. | EMR_NATIVE_ENABLE: "true"
    Properties for EMR Specifications
    EMR_NATIVE_CLUSTER_NAME | Name of the EMR Cluster. | EMR_NATIVE_CLUSTER_NAME: "Privacera-EMR-Native-Ranger"
    EMR_NATIVE_AWS_REGION | AWS Region where the cluster will reside. | EMR_NATIVE_AWS_REGION: "{{AWS_REGION}}"
    EMR_NATIVE_AWS_ACCT_ID | AWS Account ID where the EMR Cluster and its resources will reside. | EMR_NATIVE_AWS_ACCT_ID: "587946681758"
    EMR_NATIVE_SUBNET_ID | Subnet ID where the EMR Cluster nodes will reside. | EMR_NATIVE_SUBNET_ID: ""
    EMR_NATIVE_KEYPAIR | An existing EC2 key pair used to SSH into the cluster nodes. | EMR_NATIVE_KEYPAIR: "privacera-test-pair"
    EMR_NATIVE_EC2_MARKET_TYPE | Market Type for the EMR Cluster nodes. For example, SPOT or ON_DEMAND. | EMR_NATIVE_EC2_MARKET_TYPE: "SPOT"
    EMR_NATIVE_EC2_INSTANCE_TYPE | Instance Type for the EMR Cluster nodes. | EMR_NATIVE_EC2_INSTANCE_TYPE: "m5.2xlarge"
    EMR_NATIVE_MASTER_NODE_COUNT | Node count for Master. | EMR_NATIVE_MASTER_NODE_COUNT: "1"
    EMR_NATIVE_CORE_NODE_COUNT | Node count for Core. | EMR_NATIVE_CORE_NODE_COUNT: "1"
    EMR_NATIVE_VERSION | EMR version. EMR native Ranger integration is supported from 5.32 and above. | EMR_NATIVE_VERSION: "emr-5.32.0"
    EMR_NATIVE_TERMINATION_PROTECT | To enable termination protection. | EMR_NATIVE_TERMINATION_PROTECT: "true"
    EMR_NATIVE_LOGS_PATH | S3 location for EMR logs storage. | EMR_NATIVE_LOGS_PATH: "s3://privacera-emr/logs"
    Properties to configure EMR Security Group
    EMR_NATIVE_CREATE_SG | Set this to true if you don't have existing security groups and want Privacera Manager to add the security group creation steps to the EMR CloudFormation template. | EMR_NATIVE_CREATE_SG: "false"
    If EMR_NATIVE_CREATE_SG is false, fill the following properties with existing security group IDs:
    EMR_NATIVE_MASTER_SG_ID | Security Group ID for EMR Master Node Group. | EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"
    EMR_NATIVE_SLAVE_SG_ID | Security Group ID for EMR Slave Node Group. | EMR_NATIVE_SLAVE_SG_ID: "sg-xxxxxxx"
    EMR_NATIVE_SERVICE_ACCESS_SG_ID | Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network. | EMR_NATIVE_SERVICE_ACCESS_SG_ID: "sg-xxxxxxx"
    If EMR_NATIVE_CREATE_SG is true, fill the following properties to name the new security groups that will be added to emr-template.json:
    EMR_NATIVE_SG_VPC_ID | VPC ID in which you want to create the EMR Cluster. | EMR_NATIVE_SG_VPC_ID: "vpc-xxxxxxxxxxx"
    EMR_NATIVE_MASTER_SG_NAME | Security Group Name for EMR Master Node Group. | EMR_NATIVE_MASTER_SG_NAME: "priv-master-sg"
    EMR_NATIVE_SLAVE_SG_NAME | Security Group Name for EMR Slave Node Group. | EMR_NATIVE_SLAVE_SG_NAME: "priv-slave-sg"
    EMR_NATIVE_SERVICE_ACCESS_SG_NAME | Security Group Name for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network. | EMR_NATIVE_SERVICE_ACCESS_SG_NAME: "priv-private-sg"
    EMR_NATIVE_SECURITY_CONFIG | Name of the security configuration created for EMR. This can be an existing configuration, or Privacera Manager can generate a template from which a new configuration can be created. The new template will be available at ~/privacera/privacera-manager/output/emr/emr-native-sec-config-template.json after you run the Privacera Manager update command. | EMR_NATIVE_SECURITY_CONFIG: ""
    Properties for EMR Hive Metastore
    EMR_NATIVE_HIVE_METASTORE | Metastore type. For example, internal or hive (for an external Hive metastore). | EMR_NATIVE_HIVE_METASTORE: "hive"
    EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH | S3 location for the Hive metastore warehouse. | EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH: "s3://hive-warehouse"
    Fill the following properties if EMR_NATIVE_HIVE_METASTORE is hive:
    EMR_NATIVE_METASTORE_CONNECTION_URL | JDBC Connection URL for connecting to Hive Metastore. | EMR_NATIVE_METASTORE_CONNECTION_URL: "jdbc:mysql://:3306/?createDatabaseIfNotExist=true"
    EMR_NATIVE_METASTORE_CONNECTION_DRIVER | JDBC Driver Name. | EMR_NATIVE_METASTORE_CONNECTION_DRIVER: "org.mariadb.jdbc.Driver"
    EMR_NATIVE_METASTORE_CONNECTION_USERNAME | JDBC UserName. | EMR_NATIVE_METASTORE_CONNECTION_USERNAME: "hive"
    EMR_NATIVE_METASTORE_CONNECTION_PASSWORD | JDBC Password. | EMR_NATIVE_METASTORE_CONNECTION_PASSWORD: "StRong@PassWord"
    Properties of Kerberos Server
    EMR_NATIVE_KDC_ADMIN_PASSWORD | The password used within the cluster for the kadmin service. | EMR_NATIVE_KDC_ADMIN_PASSWORD: ""
    EMR_NATIVE_CROSS_REALM_PASSWORD | The cross-realm trust principal password, which must be identical across realms. | EMR_NATIVE_CROSS_REALM_PASSWORD: ""
    EMR_NATIVE_KERB_TICKET_LIFETIME | The period for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire. | EMR_NATIVE_KERB_TICKET_LIFETIME: 24
    EMR_NATIVE_KERB_REALM | The Kerberos realm name for the other realm in the trust relationship. | EMR_NATIVE_KERB_REALM: ""
    EMR_NATIVE_KERB_DOMAIN | The domain name of the other realm in the trust relationship. | EMR_NATIVE_KERB_DOMAIN: ""
    EMR_NATIVE_KERB_ADMIN_SERVER | The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used. | EMR_NATIVE_KERB_ADMIN_SERVER: ""
    EMR_NATIVE_KERB_KDC_SERVER | The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used. | EMR_NATIVE_KERB_KDC_SERVER: ""
    Properties of Certificates Secrets
    EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN | Full ARN of the AWS secret (stored in AWS Secrets Manager) for the Ranger plugin key pair. This is the secret created in the Prerequisites step above. | EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-plugin-key-pair-ixZbO2"
    EMR_NATIVE_RANGER_ADMIN_SECRET_ARN | Full ARN of the AWS secret (stored in AWS Secrets Manager) for the Ranger admin public certificate. This is the secret created in the Prerequisites step above. | EMR_NATIVE_RANGER_ADMIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-admin-public-cert-ixfCO5"
    Properties of EMR application
    EMR_NATIVE_APP_SPARK_ENABLE | Installs Spark application with EMR native Ranger plugin, if set to true. | EMR_NATIVE_APP_SPARK_ENABLE: "true"
    EMR_NATIVE_APP_HIVE_ENABLE | Installs Hive application with EMR native Ranger plugin, if set to true. | EMR_NATIVE_APP_HIVE_ENABLE: "true"
    EMR_NATIVE_APP_ZEPPELIN_ENABLE | Installs Zeppelin application, if set to true. | EMR_NATIVE_APP_ZEPPELIN_ENABLE: "true"
    EMR_NATIVE_APP_LIVY_ENABLE | Installs Livy application, if set to true. | EMR_NATIVE_APP_LIVY_ENABLE: "true"
    Properties of IAM Role Configuration
    EMR_NATIVE_DEFAULT_ROLE | Default role attached to the EMR cluster for performing cluster-related activities. This should be an existing role. | EMR_NATIVE_DEFAULT_ROLE: "EMR_DefaultRole"
    EMR_NATIVE_INSTANCE_ROLE | The IAM role that will be attached to each node in the EMR Cluster. This should have only minimal permissions for basic EMR functionality. | EMR_NATIVE_INSTANCE_ROLE: "restricted_instance_role"
    EMR_NATIVE_DATA_ACCESS_ROLE | This role provides credentials for trusted execution engines, such as Apache Hive and the AWS EMR Record Server (AWS EMR components), to access AWS S3 data. Use this role only to access AWS S3 data, including any KMS keys, if you are using S3 SSE-KMS. | EMR_NATIVE_DATA_ACCESS_ROLE: "emr_native_data_access_role"
    EMR_NATIVE_USER_ACCESS_ROLE | This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to AWS S3 data, unless the data should be accessible by all users. | EMR_NATIVE_USER_ACCESS_ROLE: "emr_native_user_access_role"
    Properties to send EMR Ranger Engines Audits to Solr
    EMR_NATIVE_ENABLE_SOLR_AUDITS | Enable audits to Solr. | EMR_NATIVE_ENABLE_SOLR_AUDITS: "true"
    AUDITSERVER_AUTH_TYPE | The EMR native Ranger audit framework does not support basic authentication, so it must be disabled. If this property already exists in vars.auditserver.yml, change it there. | AUDITSERVER_AUTH_TYPE: "none"
    AUDITSERVER_SSL_ENABLE | In case of self-signed SSL, EMR native Ranger does not support SSL for Solr audits, so AuditServer SSL should be disabled. | AUDITSERVER_SSL_ENABLE: "false"
    EMR_NATIVE_CLOUDWATCH_GROUPNAME | A CloudWatch log group to which Ranger audits are pushed. This should be an existing group. | EMR_NATIVE_CLOUDWATCH_GROUPNAME: "emr_privacera_native_logs"
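
The values for EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN and EMR_NATIVE_RANGER_ADMIN_SECRET_ARN can be copied from the Secrets Manager console. As a sketch of one alternative, assuming the AWS CLI is configured and the secrets use the names from the Prerequisites section, the ARNs could be looked up as follows. The function name is illustrative.

```shell
# Sketch: print the full ARN of each secret created in the Prerequisites section.
# Assumption: the secrets are named as in that section and exist in the given region.
print_secret_arns() {
  region="$1"
  for name in ranger-plugin-private-keypair ranger-admin-pub-cert; do
    arn="$(aws --region "$region" secretsmanager describe-secret \
      --secret-id "$name" --query ARN --output text)"
    printf '%s: %s\n' "$name" "$arn"
  done
}

# Example invocation:
# print_secret_arns us-east-1
```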

    Note

    You can also add custom properties that are not included by default. See EMR.
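
Before running the update, it can help to review which properties are still set to an empty string. The helper below is a hypothetical convenience, not a Privacera Manager feature. It only lists lines of the form KEY: "" and does not decide which of them are required for your setup.

```shell
# Sketch: list properties in a vars file whose value is still an empty string.
# Matches uncommented lines such as:  EMR_NATIVE_SUBNET_ID: ""
list_empty_properties() {
  grep -E '^[[:space:]]*[A-Z0-9_]+:[[:space:]]*""[[:space:]]*$' "$1"
}

# Example invocation:
# list_empty_properties ~/privacera/privacera-manager/config/custom-vars/vars.emr.native.ranger.yml
```

Some properties, such as the Kerberos cross-realm settings, may legitimately stay empty depending on your environment.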

  4. Run the following commands.

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
    
  5. Once the update is done, all the CloudFormation JSON template files will be available in the ~/privacera/privacera-manager/output/emr-native-ranger directory.
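
A quick way to confirm that the update produced the expected files is to check the output directory. This is a minimal sketch; the file names are the ones referenced in the later steps of this page, and some of them may be generated only for certain settings, so treat a "missing" line as a prompt to investigate rather than as a definite error.

```shell
# Sketch: check that the files used in later steps exist in the output directory.
# Returns 0 if all are present, 1 otherwise.
check_emr_native_outputs() {
  dir="${1:-$HOME/privacera/privacera-manager/output/emr-native-ranger}"
  missing=0
  for f in emr-native-create-certs.sh \
           emr-native-role-creation-template.json \
           emr-native-sec-config-template.json \
           emr-native-template.json; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation (uses the default output directory):
# check_emr_native_outputs
```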

  6. Run the following command on the AWS instance where Privacera is installed.

    cd ~/privacera/privacera-manager/output/emr-native-ranger
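
From this directory, the generated templates can optionally be checked for syntax errors before any stack is created. The sketch below uses the AWS CLI validate-template command and assumes the CLI is configured. The function name is illustrative, and validation only checks template syntax, not whether the referenced resources exist.

```shell
# Sketch: validate one or more CloudFormation templates in the current directory.
# Usage: validate_emr_templates <region> <template-file>...
validate_emr_templates() {
  region="$1"
  shift
  status=0
  for template in "$@"; do
    if aws --region "$region" cloudformation validate-template \
         --template-body "file://$template" >/dev/null; then
      echo "valid:   $template"
    else
      echo "invalid: $template"
      status=1
    fi
  done
  return "$status"
}

# Example invocation:
# validate_emr_templates us-east-1 emr-native-template.json
```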
    
  7. Create the certificates that need to be added to AWS Secrets Manager.

    You will get multiple prompts to enter the keystore password. Use the property value of RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD set in ~/privacera/privacera-manager/config/custom-vars/vars.ssl.yml for each prompt.

    1. Run the following command.

      ./emr-native-create-certs.sh
      

      This creates the following two files. Use their contents to update the two secrets that were created in the Prerequisites section above:

      • ranger-admin-pub-cert.pem
      • ranger-plugin-keypair.pem

    2. Display the contents of the ranger-admin-pub-cert.pem file.

      cat ranger-admin-pub-cert.pem
      

      Select the file contents and then right-click in the terminal to copy the contents.

    3. Log in to the AWS console, navigate to Secrets Manager, and then click ranger-admin-pub-cert.

    4. Navigate to the Secret value section and then go to Retrieve Secret Value > Edit > Plaintext.

    5. Replace the secret value with the new value that you copied in step 2.

    6. Similarly, follow steps 2-5 above to display the contents of the ranger-plugin-keypair.pem file and use them to replace the value of the ranger-plugin-private-keypair secret in AWS Secrets Manager.
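
The copy-and-paste steps above could also be done with the AWS CLI. The following is a sketch under the assumption that the CLI is configured and that the secrets use the names from the Prerequisites section. The function name is illustrative. Passing the value with file:// reads it directly from the PEM file, so the key material does not appear in the shell history.

```shell
# Sketch: upload the generated PEM files as the new secret values.
# Usage: update_ranger_secrets <region> [directory-containing-the-pem-files]
update_ranger_secrets() {
  region="$1"
  cert_dir="${2:-.}"   # defaults to the current directory
  aws --region "$region" secretsmanager put-secret-value \
    --secret-id ranger-admin-pub-cert \
    --secret-string "file://$cert_dir/ranger-admin-pub-cert.pem"
  aws --region "$region" secretsmanager put-secret-value \
    --secret-id ranger-plugin-private-keypair \
    --secret-string "file://$cert_dir/ranger-plugin-keypair.pem"
}

# Example invocation from the emr-native-ranger output directory:
# update_ranger_secrets us-east-1
```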

  8. (Optional) Create IAM roles using the emr-native-role-creation-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-role-creation --template-body file://emr-native-role-creation-template.json --capabilities CAPABILITY_NAMED_IAM
    

    Note

    To give the Apache Hive and Apache Spark services access to data, navigate to IAM Management in your AWS console and add the required S3 policies to the role set in EMR_NATIVE_DATA_ACCESS_ROLE.

  9. (Optional) Create Security Configurations using the emr-native-sec-config-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-security-config-creation  --template-body file://emr-native-sec-config-template.json
    
  10. Create the EMR cluster using the emr-native-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-creation  --template-body file://emr-native-template.json
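
The create-stack command returns as soon as the request is accepted, not when the cluster is ready. As a sketch, assuming the AWS CLI is configured and the stack name from the command above is used, the following waits for creation to finish and then prints the stack status. The function name is illustrative.

```shell
# Sketch: wait for the EMR stack to finish creating, then print its status.
# Usage: wait_for_emr_stack <region> [stack-name]
wait_for_emr_stack() {
  region="$1"
  stack="${2:-privacera-emr-native-creation}"
  # Blocks until the stack reaches CREATE_COMPLETE, or fails or times out.
  aws --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$stack"
  # Runs whether or not the wait succeeded, so a failed stack still shows its status.
  aws --region "$region" cloudformation describe-stacks \
    --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text
}

# Example invocation:
# wait_for_emr_stack us-east-1
```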
    

Last update: August 24, 2021