
TagSync Using Apache Ranger

Privacera Discovery lets you classify information in files as tags when you scan files in an application. You can then use these tags in access policies to control access for that application.

Apache Ranger needs this tag information when it applies a policy. This topic describes how to propagate tag details from Discovery to Apache Ranger.

Properties to enable TagSync

To enable TagSync, set the following properties in the Application Properties UI of the Privacera Portal. See General Process.

ranger.writer.enable=true
send.inherited.table.tags.to.ranger=true
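To double-check these two flags outside the UI, for example against a copy of the properties you keep under version control, a small check is enough. This is a minimal Python sketch that assumes the properties are available as plain `key=value` lines; it handles only that simple form (not every Java `.properties` feature, such as `:` separators or line continuations), and the function names are hypothetical.

```python
from typing import Dict

# The two flags from this section that must both be "true" for TagSync.
REQUIRED_FLAGS = ("ranger.writer.enable", "send.inherited.table.tags.to.ranger")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an "=" separator
            props[key.strip()] = value.strip()
    return props


def tagsync_enabled(text: str) -> bool:
    """Return True only if every required flag is set to true (case-insensitive)."""
    props = parse_properties(text)
    return all(props.get(flag, "").lower() == "true" for flag in REQUIRED_FLAGS)
```

For example, `tagsync_enabled("ranger.writer.enable=true")` returns `False`, because `send.inherited.table.tags.to.ranger` is missing.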

Properties to add based on service type

In addition to the properties above, add the following properties for your service type in the Application Properties UI. These properties are used to verify TagSync in Apache Ranger with the Ranger tag utility script.

For example:

service_name=privacera_s3
cluster_name=privacera

The service name depends on the application for which you want to apply TagSync. The following list shows the service_name value to set for each application when validating TagSync:

service_name=privacera_s3
cluster_name=privacera

service_name=privacera_redshift
cluster_name=privacera

service_name=privacera_postgres
cluster_name=privacera

service_name=privacera_snowflake
cluster_name=privacera

service_name=privacera_dynamodb
cluster_name=privacera

service_name=privacera_mssql
cluster_name=privacera

service_name=privacera_hive
cluster_name=privacera
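Each application in the list above uses the same two-line pattern. As a minimal sketch, the Python snippet below generates the property lines for one application; the application keys in `SERVICE_NAMES` are hypothetical labels chosen for illustration, while the service_name values and the cluster name privacera come from the list above.

```python
from typing import Dict

# Hypothetical application labels mapped to the service_name values listed above.
SERVICE_NAMES: Dict[str, str] = {
    "s3": "privacera_s3",
    "redshift": "privacera_redshift",
    "postgres": "privacera_postgres",
    "snowflake": "privacera_snowflake",
    "dynamodb": "privacera_dynamodb",
    "mssql": "privacera_mssql",
    "hive": "privacera_hive",
}


def tagsync_properties(application: str, cluster_name: str = "privacera") -> str:
    """Return the service_name/cluster_name property lines for one application."""
    try:
        service_name = SERVICE_NAMES[application]
    except KeyError:
        raise ValueError(f"unknown application: {application!r}") from None
    # Use lowercase keys, matching the property names in the list above.
    return f"service_name={service_name}\ncluster_name={cluster_name}"
```

For example, `tagsync_properties("postgres")` produces the `service_name=privacera_postgres` and `cluster_name=privacera` lines to paste into the Application Properties UI.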

TagSync Validation Scenarios

You can validate TagSync in the following scenarios:

  1. Auto Scanning
  2. Meta Tagging
  3. Post-processing Tags
  4. Re-evaluate
  5. Add/Edit Tag
  6. Add Resource
  7. Tag Status Change
  8. Removal of Tag
  9. Removal of Resource
  10. Rescan of Same File

Note: Tags with the status Allowed or Rejected are not synced to Apache Ranger.

Auto Scanning

On the Classifications page, scanned files are classified with system-classified tags. After classification, all System Classified and Manually Accepted tags are synced to Apache Ranger.

Parent-Child Level TagSync in Ranger:

Tags are synced for parent and child resources according to the following criteria, depending on whether the resource belongs to a database application or a file system:

DB Applications

If the resource is a database:

  • On the UI, the database is classified as (Database), tag1, tag2, etc.
  • In Apache Ranger, a child-level entry is created as follows:
    • (Database): tag1, tag2, etc.

If the resource is a table:

  • On the UI, the table is classified as (Database, table), tag1, tag2, etc.
  • In Apache Ranger, a child-level entry is created as follows:
    • (Database, table): tag1, tag2, etc.

If the resource is a column:

  • On the UI, the column is classified as (Database, table, column), tag1, tag2, etc.
  • In Apache Ranger, only the column-level tags are synced:
    • (Database, table, column): tag1, tag2, etc.

File System

  • For a folder or file, all tag levels are allowed.
  • For a field, only tags at the same level are allowed.
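The database rules above can be summarized as: the Ranger entry is keyed by the resource's own path, and its tags are synced at that level. The following Python sketch illustrates this under stated assumptions: the resource is a hypothetical tuple of one to three parts, and the element names `database`, `table`, and `column` are assumed to match the resource names of your Ranger service definition, which may differ per service.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed Ranger resource element names; check your service definition.
LEVELS = ("database", "table", "column")


def ranger_entry(resource: Sequence[str], tags: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map a classified DB resource to the entry synced to Ranger at its own level."""
    if not 1 <= len(resource) <= len(LEVELS):
        raise ValueError("resource must be (database[, table[, column]])")
    # zip stops at the shorter input, so a table resource yields database and table only.
    elements = dict(zip(LEVELS, resource))
    return elements, list(tags)


def describe(resource: Sequence[str], tags: List[str]) -> str:
    """Render the entry in the '(Database, table): tag1, tag2' notation used above."""
    elements, synced = ranger_entry(resource, tags)
    return f"({', '.join(elements.values())}): {', '.join(synced)}"
```

For example, `describe(("sales", "orders"), ["tag1", "tag2"])` returns `(sales, orders): tag1, tag2`, mirroring the table case above.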

Meta Tagging

Meta tags apply at the table or file level and are synced to Apache Ranger at the same level. Only System Classified and Manually Classified meta tags are synced to Apache Ranger.

Post-processing Tags

System Classified and Manually Classified tags that are applied through post-processing rules are synced to Apache Ranger.

Re-evaluate

When you re-evaluate, System Classified and Manually Classified data zone tags are synced to Apache Ranger. Resources that are deleted through data zone policies are also removed from Apache Ranger.

Add/Edit Tag

You can manually add or edit tags on already-classified resources from the following pages:

  • Classifications: Navigate to Data Inventory >> Classifications
  • Resource Detail: Navigate to Data Inventory >> Classifications >> Click on any Resource >> Resource Detail
  • Data Explorer: Navigate to Data Inventory >> Data Explorer
  • Data Zone Dashboard: Navigate to Compliance Workflow >> Data Zone Dashboard

When you add a tag manually on any of these pages, its status is set to “Accepted:Manually classified” by default, and the tag is synced to Apache Ranger.

Add Resource

You can also manually add tags to resources that have not been classified before. When you add such a resource and tag it, the tag status is set to “Accepted:Manually classified” by default, and the tag is synced to Apache Ranger.

To add a resource, navigate to Data Inventory >> Classifications >> +Add Resource.

Tag Status Change

Changing a tag's status affects TagSync, because only System Classified and Manually Accepted tags are synced to Apache Ranger. The following scenarios describe the effect of a status change:

  • If the status changes from System Classified to Rejected or Allowed, the tag is removed from Apache Ranger.
  • If the status changes from Manually Accepted to Allowed or Rejected, the tag is removed from Apache Ranger.
  • If the status is reset to System Classified from Rejected or Allowed, the tag is synced to Apache Ranger.
  • If the status changes to Manually Classified from Rejected or Allowed, the tag is synced to Apache Ranger.
  • If the status changes from System Classified to Manually Classified, the tag already synced to Apache Ranger remains as it is.
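The scenarios above all follow from one rule: a tag is present in Apache Ranger only while its status is System Classified or Manually Accepted (“Accepted:Manually classified”). The following Python sketch encodes that rule; the enum and function names are hypothetical and are not part of a Privacera API.

```python
from enum import Enum


class TagStatus(Enum):
    # Hypothetical names for the statuses described in this section.
    SYSTEM_CLASSIFIED = "System Classified"
    MANUALLY_CLASSIFIED = "Accepted:Manually classified"  # also called Manually Accepted
    ALLOWED = "Allowed"
    REJECTED = "Rejected"


# Only these statuses are synced to Apache Ranger.
SYNCED_STATUSES = {TagStatus.SYSTEM_CLASSIFIED, TagStatus.MANUALLY_CLASSIFIED}


def is_synced(status: TagStatus) -> bool:
    """Return True if a tag with this status is present in Ranger."""
    return status in SYNCED_STATUSES


def ranger_action(old: TagStatus, new: TagStatus) -> str:
    """Return the effect of a status change on the tag in Apache Ranger."""
    before, after = is_synced(old), is_synced(new)
    if before and not after:
        return "remove"
    if not before and after:
        return "sync"
    return "unchanged"  # e.g. System Classified -> Manually Classified
```

With this rule, `ranger_action(TagStatus.REJECTED, TagStatus.SYSTEM_CLASSIFIED)` returns `"sync"`, matching the third scenario above.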

Removal of Tag

Users can remove manually added tags after rejecting them. If a user removes a tag from a resource using the Add/Edit option, the tag is removed from Apache Ranger as soon as the user rejects it.

Removal of Resource

If a resource was added manually and has only Manually Classified tags, the resource is removed from Apache Ranger as soon as the user rejects its last tag.

If the resource has System Classified tags and the user rejects the last tag, the resource is removed from Apache Ranger, because the last synced tag for that resource is removed.

Rescan of Same File

  • If a user rescans a resource whose tags are already synced to Apache Ranger, and no changes were made to the rules or data zone policies, the synced tags remain as they are.

  • If post-processing rules are disabled, post-processing tags are removed when the file is rescanned.

  • If a data zone tag is disabled or the resource is removed from the data zone, the data zone tag is removed from Apache Ranger.

  • If a meta tag rule or the meta tag itself is disabled, the meta tag is removed from Apache Ranger when the resource is rescanned.

  • If a tag status change was applied before the file is rescanned, TagSync reflects that status change.
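The rescan behavior above can be sketched as a filter over the tags currently synced for a resource. In this minimal Python sketch, the `SyncedTag` fields (`source`, `rule_enabled`, `status_synced`) are hypothetical and only model the conditions listed above; Discovery's actual implementation is not described in this topic.

```python
from dataclasses import dataclass
from typing import List

# Tag sources whose tags depend on a rule, data zone, or meta tag staying enabled.
RULE_DRIVEN_SOURCES = {"post_processing", "datazone", "meta"}


@dataclass(frozen=True)
class SyncedTag:
    name: str
    source: str = "scan"        # hypothetical: "scan", "post_processing", "datazone" or "meta"
    rule_enabled: bool = True   # False if the rule/tag was disabled or the resource left the data zone
    status_synced: bool = True  # False if the status changed to Allowed or Rejected before the rescan


def tags_after_rescan(tags: List[SyncedTag]) -> List[str]:
    """Return the names of tags that stay synced to Ranger after a rescan."""
    kept = []
    for tag in tags:
        if not tag.status_synced:
            continue  # a status change applied before the rescan removes the tag
        if tag.source in RULE_DRIVEN_SOURCES and not tag.rule_enabled:
            continue  # disabled post-processing rule, data zone tag, or meta tag rule
        kept.append(tag.name)
    return kept
```

If nothing was disabled and no status changed, every input tag is returned, which corresponds to the first bullet above.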

Validate TagSync in Apache Ranger

You can view the tags pushed to Apache Ranger with a curl command or with the Ranger tag utility script.

Curl Command

curl -i -L -k -u admin:${PRIVACERA_PASSWORD} -H "Content-type: application/json" -X GET https://${PRIVACERA_HOST}:6182/service/tags/resources/service/privacera_postgres

The curl command above returns the list of resources synced to Apache Ranger, but the response is not easy to read. We therefore recommend using the Ranger tag utility to check TagSync.
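If you prefer to script the same request, the following Python sketch uses only the standard library to call the endpoint from the curl command above. It assumes the host, port 6182, and endpoint path from that example; the function names are hypothetical, and the `resourceElements`/`values` field names read by `summarize_resources` are an assumption about the Ranger response shape, so check them against your Ranger version. Like `curl -k`, it can skip certificate verification, which you should only do on trusted test instances.

```python
import base64
import json
import ssl
import urllib.request
from typing import Any, List


def build_tag_resources_request(host: str, service_name: str, username: str,
                                password: str, port: int = 6182,
                                use_ssl: bool = True) -> urllib.request.Request:
    """Build the same GET request as the curl command (hypothetical helper)."""
    scheme = "https" if use_ssl else "http"
    url = f"{scheme}://{host}:{port}/service/tags/resources/service/{service_name}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_tag_resources(request: urllib.request.Request, verify_ssl: bool = False) -> Any:
    """Send the request and decode the JSON body; redirects are followed like curl -L."""
    context = ssl.create_default_context()
    if not verify_ssl:
        # Equivalent of curl -k: accept self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def summarize_resources(resources: List[dict]) -> List[str]:
    """One line per resource; assumes each item has 'id' and 'resourceElements'."""
    lines = []
    for resource in resources:
        elements = resource.get("resourceElements", {})
        values = {name: element.get("values", []) for name, element in elements.items()}
        lines.append(f"{resource.get('id')}: {values}")
    return lines
```

For example, `summarize_resources(fetch_tag_resources(build_tag_resources_request(host, "privacera_postgres", "admin", password)))` would return one line per synced resource, provided the response matches the assumed shape.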

Ranger Tag Utility

The Ranger tag utility is a Python script that calls the Ranger tag API methods and returns the responses in a human-readable format.

  • Run the following command to download the script:

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/ranger_tag_utility.py -O ranger_tag_utility.py
    
  • After downloading the file to your local system, run the following command to view the TagSync response.

    SSL Instance

    python3 ranger_tag_utility.py \
        --operation list_tags \
        --host ${PRIVACERA_HOST} \
        --port 6182 \
        --username ${RANGER_USERNAME} \
        --password ${RANGER_PASSWORD} \
        --servicename privacera_redshift \
        --ssl True \
        --verifyssl False
    

    Non-SSL Instance

    python3 ranger_tag_utility.py \
        --operation list_tags \
        --host ${PRIVACERA_HOST} \
        --port 6080 \
        --username ${RANGER_USERNAME} \
        --password ${RANGER_PASSWORD} \
        --servicename privacera_maprfs \
        --ssl False \
        --verifyssl False
    
  • (Optional) Change the service name to match your application.

    Output

    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.avro'] => tags :: ['SSN', 'PERSON_NAME', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test.snappy.avro'] => tags :: ['US_ADDRESS', 'SSN', 'US_PHONE_NUMBER', 'AU_BAN', 'PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/test1.avro'] => tags :: ['SSN', 'US_PHONE_NUMBER', 'PERSON_NAME', 'US_ADDRESS', 'AU_BAN', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
    Received Tag Data for path : ['/testdir/sample_files/file_format/avro/twitter.snappy.avro'] => tags :: ['PERSON_NAME', 'TEST_DATAZONE', 'POST_PROCESS']
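If you need to check this output programmatically, for example to confirm that every scanned file carries a given tag, you can parse the lines shown above. The Python sketch below assumes the exact line format of this sample output (`Received Tag Data for path : [...] => tags :: [...]`); the function names are hypothetical, and the format may differ in other versions of the utility.

```python
import ast
import re
from typing import Dict, List

# Matches one line of the sample output: a path list, then a tag list.
LINE_PATTERN = re.compile(r"^Received Tag Data for path : (\[.*?\]) => tags :: (\[.*\])$")


def parse_tag_output(text: str) -> Dict[str, List[str]]:
    """Map each path in the utility output to its list of synced tags."""
    result: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, or other log output
        # The lists are printed as Python literals, so literal_eval parses them safely.
        paths = ast.literal_eval(match.group(1))
        tags = ast.literal_eval(match.group(2))
        for path in paths:
            result[path] = list(tags)
    return result


def files_missing_tag(parsed: Dict[str, List[str]], tag: str) -> List[str]:
    """Return the paths whose synced tags do not include the given tag."""
    return [path for path, tags in parsed.items() if tag not in tags]
```

Applied to the output above, `files_missing_tag(parse_tag_output(output), "SSN")` would list the two `twitter` files, since their lines do not include `SSN`.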