Privacera Platform master publication

Privacera plugin in Spark standalone

This section describes how to use Privacera Manager to generate the setup script and the Spark custom configuration for SSL/TLS, which you then use to install the Privacera plugin in an open-source Spark environment.

The steps below apply only to Spark 3.x.

Prerequisites

Ensure the following prerequisites are met:

  • A working Spark environment.

  • Privacera services must be up and running.
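
Because the steps apply only to Spark 3.x, it can help to confirm the Spark version before you start. The following is a minimal sketch, not part of the Privacera tooling: is_spark3 and spark_version_from_banner are hypothetical helper names, and the parsing assumes the usual "version X.Y.Z" text in the banner printed by spark-submit --version.

```shell
# Hypothetical helper: succeeds only for a Spark 3.x version string.
is_spark3() {
  case "$1" in
    3.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: read banner text on stdin and print the first
# "version X.Y.Z" number found (assumed banner format).
spark_version_from_banner() {
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example usage (the banner is written to stderr, hence 2>&1):
#   v=$("$SPARK_HOME/bin/spark-submit" --version 2>&1 | spark_version_from_banner)
#   is_spark3 "$v" || echo "Spark $v is not a 3.x release" >&2
```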

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
  3. Edit the following properties. For details about each property, see Configuration properties below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update completes, the setup scripts (privacera_setup.sh, standalone_spark_FGAC.sh, standalone_spark_OLAC.sh) and the Spark custom configuration for SSL (spark_custom_conf.zip) are generated in ~/privacera/privacera-manager/output/spark-standalone.

  5. Enable either FGAC or OLAC in your Spark environment.

    Enable FGAC

    To enable fine-grained access control (FGAC), do the following:

    1. Copy standalone_spark_FGAC.sh and spark_custom_conf.zip to your Spark environment. Both files must be in the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_FGAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_FGAC.sh

    Enable OLAC

    To enable object-level access control (OLAC), do the following:

    1. Copy standalone_spark_OLAC.sh and spark_custom_conf.zip to your Spark environment. Both files must be in the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_OLAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_OLAC.sh
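
Both the FGAC and the OLAC procedures require the setup script and spark_custom_conf.zip to sit in the same folder. The following sketch checks that before you run the script. check_bundle is a hypothetical helper, not something Privacera Manager generates; it assumes only the file names given in the steps above.

```shell
# Hypothetical helper: verify that the setup script for the chosen mode
# (FGAC or OLAC) and spark_custom_conf.zip are in the same folder.
check_bundle() {
  dir=$1
  mode=$2
  case "$mode" in
    FGAC|OLAC) ;;
    *) echo "mode must be FGAC or OLAC" >&2; return 2 ;;
  esac
  for f in "standalone_spark_${mode}.sh" spark_custom_conf.zip; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
}

# Example usage from the folder that holds the copied files:
#   check_bundle . FGAC && chmod +x standalone_spark_FGAC.sh && ./standalone_spark_FGAC.sh
```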
      
Configuration properties

SPARK_STANDALONE_ENABLE

  Description: Enables generation of the setup script and configuration files for installing the Spark standalone plugin.

  Example: true

SPARK_ENV_TYPE

  Description: The environment type. It can be any user-defined value. For example, set it to local for an environment that runs locally, or to prod for a production environment.

  Example: local

SPARK_HOME

  Description: Home path of your Spark installation.

  Example: ~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME

  Description: Home directory of the user that runs your Spark installation.

  Example: /home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED

  Description: Enables or disables the fallback behavior to the privacera_files and privacera_hive services. The fallback determines whether the user is allowed or denied access to the resource files. To enable the fallback, set to true; to disable it, set to false.

  Example: true
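
The sample file ships with <PLEASE_CHANGE> placeholders, so a quick scan before running the update can catch a property that was left unset. This is a minimal sketch: list_unset_properties is a hypothetical helper, and it assumes each property is written on one line as KEY: "value".

```shell
# Hypothetical helper: print the name of every property in the given
# vars file whose value still contains the <PLEASE_CHANGE> placeholder.
# Commented-out lines (starting with #) are ignored.
list_unset_properties() {
  sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*:.*<PLEASE_CHANGE>.*/\1/p' "$1"
}

# Example usage:
#   list_unset_properties ~/privacera/privacera-manager/config/custom-vars/vars.spark-standalone.yml
```

If the helper prints nothing, no placeholder remains in the file.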

Validations

To verify that the Privacera plugin was installed successfully, do the following:

  1. Create an S3 bucket ${S3_BUCKET} for sample testing.

  2. Download the sample data with the following command and upload it to s3://${S3_BUCKET}/customer_data.

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
    
  3. (Optional) Add the AWS JARs to Spark. Download the JARs that match the Spark and Hadoop versions in your environment.

    cd <SPARK_HOME>/jars
    

    For Spark 3.1.1 with Hadoop 3.2:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
    
  4. Run the following command.

    cd <SPARK_HOME>/bin
    
  5. Start spark-shell to execute Scala commands.

    ./spark-shell
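
Steps 1 and 2 above do not show how to create the bucket or upload the file. One way to do it is with the AWS CLI, sketched below under the assumption that the CLI is installed and configured with credentials that may create buckets. stage_sample_data and run are hypothetical helpers; by default they only print the commands, and they execute them only when DRY_RUN=0.

```shell
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Hypothetical helper: create the bucket and upload the sample file
# (already downloaded into the current directory) under customer_data/.
stage_sample_data() {
  bucket=$1
  file=customer_data_without_header.csv
  run aws s3 mb "s3://${bucket}"
  run aws s3 cp "$file" "s3://${bucket}/customer_data/${file}"
}

# Example usage:
#   stage_sample_data "$S3_BUCKET"             # preview the commands
#   DRY_RUN=0 stage_sample_data "$S3_BUCKET"   # actually run them
```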
    
Validations with JWT token
  1. Run the following command.

    cd <SPARK_HOME>/bin
    
  2. Set the JWT_TOKEN.

    JWT_TOKEN="<JWT_TOKEN>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
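
To see which claims a token carries (for example its issuer, or the keys that hold the user and groups), you can decode its payload locally. The sketch below is an illustration only: jwt_payload is a hypothetical helper, it assumes a base64 command that accepts -d (as in GNU coreutils), and it does not verify the token's signature.

```shell
# Hypothetical helper: print the JSON payload of a JWT without verifying it.
jwt_payload() {
  # Take the second dot-separated segment and convert base64url to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Example usage:
#   jwt_payload "$JWT_TOKEN"
```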
Validations with JWT token and public key
  1. If the JWT token was generated with a private/public key pair, create a local file that contains the public key.

  2. Set the following variables according to the payload of the JWT token.

    JWT_TOKEN="<JWT_TOKEN>"
    # The following variables are optional. Set each one only if the token contains it; otherwise, leave it empty.
    JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
    JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
    JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
    JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
    JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
      --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
      --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
      --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
      --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
      --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
      --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
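
Since several of these variables are optional, a wrapper can assemble the command from whichever ones are set. The sketch below passes an optional --conf only when its variable is non-empty. It assumes that omitting an option behaves the same as passing it with an empty value, which this guide does not confirm. start_spark_shell_with_jwt and SPARK_SHELL are hypothetical names.

```shell
# Hypothetical wrapper: start spark-shell with the JWT settings, adding
# each optional --conf only when its variable is non-empty.
start_spark_shell_with_jwt() {
  # Required settings, collected in the positional parameters.
  set -- \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN:-}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
  if [ -n "${JWT_TOKEN_PUBLIC_KEY_FILE:-}" ]; then
    set -- "$@" --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}"
  fi
  if [ -n "${JWT_TOKEN_ISSUER:-}" ]; then
    set -- "$@" --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}"
  fi
  if [ -n "${JWT_TOKEN_PARSER_TYPE:-}" ]; then
    set -- "$@" --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}"
  fi
  if [ -n "${JWT_TOKEN_USER_KEY:-}" ]; then
    set -- "$@" --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}"
  fi
  if [ -n "${JWT_TOKEN_GROUP_KEY:-}" ]; then
    set -- "$@" --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
  fi
  # SPARK_SHELL is a hypothetical override; it defaults to ./spark-shell.
  "${SPARK_SHELL:-./spark-shell}" "$@"
}

# Example usage, from <SPARK_HOME>/bin after setting the JWT_* variables:
#   start_spark_shell_with_jwt
```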
Use cases
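  1. Add a policy in Access Manager with read permission to ${S3_BUCKET}, then read the sample data in spark-shell. Replace ${S3_BUCKET} with your bucket name; Scala does not expand it inside a plain string literal.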

    val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
    val df=spark.read.csv(file_path)
    df.show(5)
    
  2. Add a policy in Access Manager with delete and write permission to ${S3_BUCKET}, then write the DataFrame back to the bucket.

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")
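
The ${S3_BUCKET} placeholder in the Scala snippets has to be replaced by hand. As an alternative, the shell can fill it in and write the read check to a file. This is a sketch only: write_read_check is a hypothetical helper, and the output path is whatever you pass in.

```shell
# Hypothetical helper: write the read check from use case 1 to a Scala
# file, substituting the real bucket name for the placeholder.
write_read_check() {
  bucket=$1
  out=$2
  cat > "$out" <<EOF
val file_path = "s3a://${bucket}/customer_data/customer_data_without_header.csv"
val df = spark.read.csv(file_path)
df.show(5)
EOF
}

# Example usage:
#   write_read_check "$S3_BUCKET" /tmp/read_check.scala
```

Inside spark-shell, the generated file can then be loaded with the Scala REPL command :load /tmp/read_check.scala.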