
Discovery

This topic describes the AWS configuration required to install Privacera Discovery in a Docker or Kubernetes (EKS) environment.

IAM Policies

To use the Privacera Discovery service, attach the following IAM policies to the Privacera_PM_Role role so that it can access the required AWS services.

Policy to create AWS resources

This policy is required only during installation or when Discovery is updated through Privacera Manager. It grants Privacera Manager permission to create AWS resources such as DynamoDB tables, Kinesis streams, SQS queues, and S3 buckets using Terraform.

  • ${AWS_REGION}: AWS region where the resources will be created.

  • ${ACCOUNT_ID}: AWS account where the installation is being done.

{
"Version":"2012-10-17",
"Statement":[
    {
        "Sid":"CreateDynamodb",
        "Effect":"Allow",
        "Action":[
            "dynamodb:CreateTable",
            "dynamodb:DescribeTable",
            "dynamodb:ListTables",
            "dynamodb:TagResource",
            "dynamodb:UntagResource",
            "dynamodb:UpdateTable",
            "dynamodb:UpdateTableReplicaAutoScaling",
            "dynamodb:UpdateTimeToLive",
            "dynamodb:DescribeTimeToLive",
            "dynamodb:ListTagsOfResource",
            "dynamodb:DescribeContinuousBackups"
        ],
        "Resource":"arn:aws:dynamodb:${AWS_REGION}:*:table/privacera*"
    },
    {
        "Sid":"CreateKinesis",
        "Effect":"Allow",
        "Action":[
            "kinesis:CreateStream",
            "kinesis:ListStreams",
            "kinesis:UpdateShardCount"
        ],
        "Resource":"arn:aws:kinesis:${AWS_REGION}:*:stream/privacera*"
    },
    {
        "Sid":"CreateS3Bucket",
        "Effect":"Allow",
        "Action":[
            "s3:CreateBucket",
            "s3:ListAllMyBuckets",
            "s3:GetBucketLocation"
        ],
        "Resource":[
            "arn:aws:s3:::*"
        ]
    },
    {
        "Sid":"CreateSQSMessages",
        "Effect":"Allow",
        "Action":[
            "sqs:CreateQueue",
            "sqs:ListQueues"
        ],
        "Resource":[
            "arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:privacera*"
        ]
    }
]
}
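
Before the policy is attached, each `${...}` placeholder has to be replaced with a real value. The following is a minimal Python sketch of one way to do that and to confirm that the result is still valid JSON. It is not part of Privacera Manager; the function name `render_policy`, the one-statement template, and the sample region and account ID are illustrative assumptions.

```python
import json
import re

# Matches ${NAME} placeholders. Hyphens are allowed because some
# placeholders in the policies on this page contain them.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_-]+)\}")


def render_policy(template, values):
    """Substitute ${NAME} placeholders and parse the result as JSON.

    Raises KeyError if a placeholder has no value, so a forgotten
    variable is caught before the policy is sent to AWS.
    """
    def replace(match):
        return values[match.group(1)]

    rendered = PLACEHOLDER.sub(replace, template)
    return json.loads(rendered)


# Hypothetical one-statement fragment in the same shape as the policy above.
TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateSQSMessages",
      "Effect": "Allow",
      "Action": ["sqs:CreateQueue", "sqs:ListQueues"],
      "Resource": ["arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:privacera*"]
    }
  ]
}
"""

# Sample values only; use your own region and account ID.
policy = render_policy(
    TEMPLATE,
    {"AWS_REGION": "us-east-1", "ACCOUNT_ID": "123456789012"},
)
print(json.dumps(policy, indent=2))
```

Failing loudly on a missing value is a deliberate choice here: a placeholder left unreplaced in a resource ARN would otherwise produce a policy that parses but matches nothing.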

Policy to access AWS services

This policy is required after Discovery is installed so that Discovery scans can access resources. It grants permission to store Privacera's configuration and metadata in DynamoDB, Kinesis, SQS, and S3.

  • ${PRIVACERA_BUCKET}: S3 bucket used by Privacera to store its configuration files.

  • ${ACCOUNT_ID}: AWS account where the installation is being done.

  • ${AWS_REGION}: AWS region where the resources will be created.

{
"Version":"2012-10-17",
"Statement":[
    {
        "Sid":"Dynamodb",
        "Effect":"Allow",
        "Action":[
            "dynamodb:BatchGet*",
            "dynamodb:DescribeStream",
            "dynamodb:DescribeTable",
            "dynamodb:Get*",
            "dynamodb:Query",
            "dynamodb:Scan",
            "dynamodb:BatchWrite*",
            "dynamodb:Update*",
            "dynamodb:PutItem"
        ],
        "Resource":"arn:aws:dynamodb:${AWS_REGION}:*:table/privacera*"
    },
    {
        "Sid":"Kinesis",
        "Effect":"Allow",
        "Action":[
            "kinesis:Get*",
            "kinesis:AddTagsToStream",
            "kinesis:DecreaseStreamRetentionPeriod",
            "kinesis:DescribeLimits",
            "kinesis:DescribeStream",
            "kinesis:DescribeStreamConsumer",
            "kinesis:DescribeStreamSummary",
            "kinesis:GetShardIterator",
            "kinesis:IncreaseStreamRetentionPeriod",
            "kinesis:ListShards",
            "kinesis:ListStreamConsumers",
            "kinesis:ListStreams",
            "kinesis:ListTagsForStream",
            "kinesis:MergeShards",
            "kinesis:PutRecord",
            "kinesis:PutRecords",
            "kinesis:GetRecords",
            "kinesis:RegisterStreamConsumer"
        ],
        "Resource":"arn:aws:kinesis:${AWS_REGION}:*:stream/privacera*"
    },
    {
        "Sid":"S3BucketRead",
        "Effect":"Allow",
        "Action":[
            "s3:List*",
            "s3:Get*"
        ],
        "Resource":[
            "arn:aws:s3:::${BUCKET-TO-BE-SCANNED}",
            "arn:aws:s3:::${CONFIGURATION-BUCKET}",
            "arn:aws:s3:::${CUSTOMER_LANDING_BUCKET}",
            "arn:aws:s3:::${CUSTOMER_REALTIMESCAN_BUCKET}",
            "arn:aws:s3:::${CUSTOMER_QUARANTINE_BUCKET}",
            "arn:aws:s3:::${CUSTOMER_TRANSFER_BUCKET}",
            "arn:aws:s3:::${CUSTOMER_ARCHIVE_BUCKET}"
        ]
    },
    {
        "Sid":"S3ObjectAll",
        "Effect":"Allow",
        "Action":[
            "s3:PutObject",
            "s3:PutObjectAcl",
            "s3:GetObject",
            "s3:GetObjectAcl",
            "s3:DeleteObject",
            "s3:GetObjectVersion",
            "s3:DeleteObjectVersion"
        ],
        "Resource":[
            "arn:aws:s3:::${BUCKET-TO-BE-SCANNED}/*",
            "arn:aws:s3:::${CONFIGURATION-BUCKET}/*",
            "arn:aws:s3:::${CUSTOMER_LANDING_BUCKET}/*",
            "arn:aws:s3:::${CUSTOMER_REALTIMESCAN_BUCKET}/*",
            "arn:aws:s3:::${CUSTOMER_QUARANTINE_BUCKET}/*",
            "arn:aws:s3:::${CUSTOMER_TRANSFER_BUCKET}/*",
            "arn:aws:s3:::${CUSTOMER_ARCHIVE_BUCKET}/*"
        ]
    },
    {
        "Sid":"S3GlobalRead",
        "Effect":"Allow",
        "Action":[
            "s3:ListAllMyBuckets"
        ],
        "Resource":[
            "arn:aws:s3:::*"
        ]
    },
    {
        "Sid":"ManageSQSMessages",
        "Effect":"Allow",
        "Action":[
            "sqs:DeleteMessage",
            "sqs:ReceiveMessage"
        ],
        "Resource":[
            "arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:privacera*"
        ]
    }
]
}
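
The S3 statements above name each bucket in two forms: a bucket ARN for bucket-level actions such as `s3:List*`, and the same ARN with a `/*` suffix for object-level actions such as `s3:PutObject`. As a rough illustration, the Python sketch below builds both lists from a set of bucket names. The helper name `s3_resource_arns` and the bucket names are made up for the example.

```python
def s3_resource_arns(bucket_names):
    """Return (bucket_arns, object_arns) for the given bucket names.

    bucket_arns suit bucket-level actions (for example s3:ListBucket);
    object_arns suit object-level actions (for example s3:GetObject).
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return bucket_arns, object_arns


# Hypothetical bucket names; replace them with your own.
buckets = ["example-scan-bucket", "example-config-bucket"]
bucket_arns, object_arns = s3_resource_arns(buckets)

for arn in bucket_arns + object_arns:
    print(arn)
```

Generating both lists from one source keeps the two `Resource` arrays in step when a bucket is added or removed.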

CLI Configuration

  1. SSH to the instance where Privacera is installed.

  2. Configure your environment.

    • For a Kubernetes environment, set the Kubernetes cluster name. For more information, see Discovery (Kubernetes Mode).

    • For a Docker environment, you can skip this step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.aws.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.aws.yml
    
  4. Edit the following properties. For property details and descriptions, refer to Configuration Properties below.

    DISCOVERY_BUCKET_NAME: "<PLEASE_CHANGE>"
    

    To configure a bucket, add the property as follows, where bucket-1 is the name of the bucket:

    DISCOVERY_BUCKET_NAME: "bucket-1"
    

    To configure a bucket containing a folder, add the property as follows:

    DISCOVERY_BUCKET_NAME: "bucket-1/folder1"
    
  5. Uncomment or add the following variable to enable autoscaling of executor pods:

    DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_ENABLED: "true"
    
  6. (Optional) If you want to customize Discovery configuration further, you can add custom Discovery properties. For more information, refer to Discovery Custom Properties.

    For example, the default username and password for the Discovery service are padmin/padmin. To change them, refer to Add Custom Properties.

  7. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
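
As a hedged illustration of the property edits in step 4, the Python sketch below reads simple `KEY: "value"` lines like the ones shown there, reports any value still set to `<PLEASE_CHANGE>`, and splits a `bucket/folder` value into its two parts. It is a hypothetical pre-flight check, not something Privacera Manager provides, and it deliberately handles only flat single-line properties rather than full YAML.

```python
def parse_vars(text):
    """Parse flat `KEY: "value"` lines; comments and blank lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        props[key.strip()] = value.strip().strip('"')
    return props


def unset_placeholders(props):
    """Return the names of properties still set to the sample placeholder."""
    return sorted(k for k, v in props.items() if v == "<PLEASE_CHANGE>")


def split_bucket(value):
    """Split 'bucket/folder' into (bucket, folder); folder may be empty."""
    bucket, _, folder = value.partition("/")
    return bucket, folder


# Sample content in the style of vars.discovery.aws.yml (values are examples).
SAMPLE = '''
DISCOVERY_BUCKET_NAME: "bucket-1/folder1"
# DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_ENABLED: "true"
'''

props = parse_vars(SAMPLE)
print(unset_placeholders(props))
print(split_bucket(props["DISCOVERY_BUCKET_NAME"]))
```

To run the same check against a real file, pass the contents of `config/custom-vars/vars.discovery.aws.yml` to `parse_vars` before running the update command.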
    

Configuration Properties

| Property | Description | Example |
|---|---|---|
| DISCOVERY_BUCKET_NAME | Set the bucket name where Discovery will store its metadata files. | container1 |
| [Properties of Topic and Table names](../pm-ig/customize_topic_and_tables_names.md) | Topic and table names are assigned by default in Privacera Discovery. To customize any topic or table name, refer to the link. | |

Enable Realtime Scan

An AWS SQS queue is required to enable realtime scan on an S3 bucket.

After you run the Privacera Manager update command, an SQS queue named privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}} is created automatically, where {{DEPLOYMENT_ENV_NAME}} is the environment name you set in the vars.privacera.yml file. The queue appears in the list of queues in your AWS SQS account.

To use an existing SQS queue instead, add the DISCOVERY_BUCKET_SQS_NAME property to the vars.discovery.aws.yml file and set it to your queue name.
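
The queue naming rule described above can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not Privacera's code; the function name `realtime_scan_queue_name` and the sample environment and queue names are hypothetical.

```python
def realtime_scan_queue_name(deployment_env_name, custom_queue_name=None):
    """Return the SQS queue name expected for realtime scan.

    If a custom name is given (the DISCOVERY_BUCKET_SQS_NAME property),
    it is used as-is; otherwise the default name is derived from the
    environment name set in vars.privacera.yml.
    """
    if custom_queue_name:
        return custom_queue_name
    return f"privacera_bucket_sqs_{deployment_env_name}"


# Sample values only.
print(realtime_scan_queue_name("dev"))
print(realtime_scan_queue_name("dev", custom_queue_name="my-existing-queue"))
```

Knowing the expected name in advance makes it easier to find the queue in the SQS console after the update completes.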

If you want to enable realtime scan on the bucket, click here.