Privacera Platform master publication

Streamsets

This section describes how to install and configure the Privacera plugin for Streamsets, for use with Ranger and Privacera Encryption.

Prerequisites

Streamsets should already be up and running.

Privacera Encryption in Streamsets Data Collector (SDC)
Enable Encryption for SDC
  1. Copy the sample Streamsets crypto variables file into the custom-vars directory:

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.crypto.streamset.yml custom-vars/vars.crypto.streamset.yml
    
  2. Run the update.

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
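
The cp command in step 1 overwrites any existing custom-vars/vars.crypto.streamset.yml. If that file may already contain local changes, a guard along the following lines could be used before running the update. This is a minimal sketch: the function name copy_vars_if_missing and its arguments are illustrative and are not part of Privacera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Privacera Manager): copy a sample vars
# file into custom-vars only if no customized copy exists there yet.
copy_vars_if_missing() {
  local config_dir="$1"   # e.g. ~/privacera/privacera-manager/config
  local name="$2"         # e.g. vars.crypto.streamset.yml
  local src="${config_dir}/sample-vars/${name}"
  local dst="${config_dir}/custom-vars/${name}"

  if [ ! -f "$src" ]; then
    echo "missing sample: ${src}" >&2
    return 1
  fi
  if [ -f "$dst" ]; then
    echo "kept"      # existing custom copy is left untouched
    return 0
  fi
  mkdir -p "${config_dir}/custom-vars"
  cp "$src" "$dst"
  echo "copied"
}
```

For example, `copy_vars_if_missing ~/privacera/privacera-manager/config vars.crypto.streamset.yml` prints "copied" on a first run and "kept" on later runs.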
    
Configure Encryption for SDC
  1. Copy the Streamsets Privacera package.

    1. If Streamsets and Privacera Manager run on different systems, copy the following two files from ~/privacera/privacera-manager/output/streamset/ on the Privacera Manager host to the Streamsets host:

      • privacera-streamset.tar.gz

      • crypto-config

      If you have JCEKS enabled, also copy the following file from ~/privacera/privacera-manager/config/keystores/ on the Privacera Manager host:

      • cryptoprop.jceks

    2. If Streamsets and Privacera Manager run on the same system, do the following:

      cp ~/privacera/privacera-manager/output/streamset/privacera-streamset.tar.gz ~/privacera/downloads
      cp -r ~/privacera/privacera-manager/output/streamset/crypto-config ~/privacera/downloads/crypto-config
      

      If you have JCEKS enabled, do the following:

      cp ~/privacera/privacera-manager/config/keystores/cryptoprop.jceks ~/privacera/downloads/crypto-config/
      
  2. Extract the Streamsets Privacera package.

    cd ~/privacera/downloads
    mkdir streamsets
    tar xfz ~/privacera/downloads/privacera-streamset.tar.gz -C streamsets
    
  3. Switch to the root user so that you can modify the Streamsets installation directory.

    sudo su
    
  4. Set the Streamsets installation directory. Adjust the path to match your installation.

    export STREAMSET_HOME=/opt/streamset/streamsets-datacollector-3.13.0
    
  5. Copy the Privacera library into the Streamsets Data Collector user-libs directory:

    cp -r streamsets/privacera-streamset/ ${STREAMSET_HOME}/user-libs/
    
  6. Copy the configuration files.

    cp -r crypto-config ${STREAMSET_HOME}/../crypto-config
    
  7. Append the required permissions to the SDC security policy.

    cat << EOF >> ${STREAMSET_HOME}/etc/sdc-security.policy
    grant {
    permission java.io.FilePermission "/opt/privacera/-", "read";
    permission java.io.FilePermission "/opt/streamset/-", "read,write";
    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
    };
    EOF
  8. Stop Streamsets.

    kill -9 $(ps aux | grep 'sdc' | awk '{print $2}')
  9. Restart Streamsets.

    ulimit -n 32768
    nohup ${STREAMSET_HOME}/bin/streamsets dc &
    
  10. Check the logs to confirm that Streamsets is running.

    tail -f ${STREAMSET_HOME}/log/sdc.log
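
Because step 7 appends to sdc-security.policy, repeating the procedure adds a duplicate grant block each time. The sketch below shows one way to make that step safe to re-run. The function name append_privacera_grant and the marker comment are assumptions introduced for illustration; they are not part of Streamsets or Privacera.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Privacera grant block to an SDC security
# policy file only if it is not already present.
# The marker comment is an assumption used to detect a previous run.
append_privacera_grant() {
  local policy_file="$1"   # e.g. ${STREAMSET_HOME}/etc/sdc-security.policy
  local marker="// privacera-streamset grant"

  # Skip the append when the marker line is already in the file.
  if grep -qF "$marker" "$policy_file" 2>/dev/null; then
    echo "already present"
    return 0
  fi
  # Quoted delimiter: the block is written literally, with no expansion.
  cat << 'EOF' >> "$policy_file"
// privacera-streamset grant
grant {
  permission java.io.FilePermission "/opt/privacera/-", "read";
  permission java.io.FilePermission "/opt/streamset/-", "read,write";
  permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
};
EOF
  echo "appended"
}
```

Calling `append_privacera_grant "${STREAMSET_HOME}/etc/sdc-security.policy"` twice leaves a single grant block in the file.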
    
Verification
  1. Configure a sample pipeline to encrypt a local file by importing the sample pipeline into Streamsets. For more information, see Sample pipeline.

  2. Switch to the root user so that you can write to the Streamsets installation directory.

    sudo su
    
  3. Create data directories.

    DATA_DIR=/opt/streamset/
    cd ${DATA_DIR}
    mkdir -p customer_data/input 
    mkdir -p customer_data/output
    mkdir -p customer_data/input_error
    mkdir -p customer_data/output/encrypted_error
    
  4. Create a sample data file:

    cat << EOF > customer_data/input/customer_data_with_header.csv 
    id,name,ssn,email_address,amount
    1,Tamara,898453744,aphillips@vang.info,162454.67
    2,Richard,65511350,vreynolds@gmail.com,602.89
    3,Tanya,634090950,harringtonwilliam@diaz-king.com,48712.67
    4,Richard,829439881,martinvalerie@yahoo.com,5122.02
    5,Raymond,227804351,sarachavez@yahoo.com,97963.857
    6,Melissa,553465892,kevinwillis@gmail.com,36654.806
    7,Deborah,782539839,brittney24@yahoo.com,19.231
    8,Rodney,515337130,jenniferkelly@davis-bond.biz,65083.651
    9,Katherine,137057143,jperkins@gmail.com,4822.343
    10,David,432941241,wmccann@hotmail.com,4069.34
    EOF
  5. Create a metadata file that maps each input dataset column to a Privacera Encryption scheme. Columns with an empty scheme name are not mapped to a scheme:

    cat << EOF > customer_data/customer_data.meta
    COLUMN_NAME|SCHEME_NAME
    id|
    name|SYSTEM_PERSON_NAME
    ssn|SYSTEM_SSN
    email_address|SYSTEM_EMAIL
    amount|
    EOF

    To run the sample pipeline, make sure that the Privacera user exists in Ranger and has permissions on the KMS keys matching pmsk*.
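
Before running the pipeline, it can help to confirm that every column in the CSV header has a row in the metadata file. The following is a minimal sketch of such a check. The function name missing_meta_columns is hypothetical, and it assumes a comma-separated header line and the pipe-delimited COLUMN_NAME|SCHEME_NAME layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical check: print each CSV header column that has no row in the
# pipe-delimited meta file (COLUMN_NAME|SCHEME_NAME). Prints nothing when
# every column is covered.
missing_meta_columns() {
  local csv_file="$1"
  local meta_file="$2"
  local col

  # Take the header line, strip any carriage return, emit one column per line.
  head -n 1 "$csv_file" | tr -d '\r' | tr ',' '\n' | while IFS= read -r col; do
    # Skip the meta header row, then compare the first field exactly.
    if ! awk -F'|' -v c="$col" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$meta_file"; then
      echo "$col"
    fi
  done
}
```

For example, `missing_meta_columns customer_data/input/customer_data_with_header.csv customer_data/customer_data.meta` prints nothing for the sample files created in steps 4 and 5.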

Ranger Configuration: Add Permission for Keys
  1. Log in to the Ranger UI as an administrator and create the Privacera user. This user can then be granted permissions on keys.

    [Image: assets/create_privacera_user.jpg]
  2. Log in to Ranger with the keyadmin credentials and click privacera_kms.

    [Image: assets/privacera_kms.jpg]
  3. Create or update the policy for the Privacera user.

    [Image: assets/policy_privacera_user.jpg]
  4. Run the Streamsets pipeline preview and verify that the encrypted values appear on the right side of the table, as shown in the screenshot below.

    [Image: assets/preview_streamsets.jpg]