Google sink to Pub/Sub
Overview
This topic covers how to use a Sink-based approach to read real-time audit logs for real-time scanning in Pkafka for Discovery, instead of using the Cloud Logging API. The key advantages of the Sink-based approach are:
All the logs are synchronized to a Sink.
The Sink exports the logs to a destination Pub/Sub topic.
Pkafka subscribes to the Pub/Sub topic, reads the audit data from it, and passes it on to the Privacera topic, which triggers a real-time scan.
Summary of configuration steps
You need to create the following resources in the Google Cloud Console:
A destination to write logs from the Sink. The following destinations are available:
a. Cloud Storage
b. Pub/Sub topic
c. BigQuery
In this document, a Pub/Sub topic is used as the destination for the Sink.
Create Pub/Sub topic
Log in to the Google Cloud Console and navigate to the Pub/Sub Topics page.
Click + CREATE TOPIC.
In the Create a topic dialog, enter the following details:
Enter a unique topic name in the Topic ID field. For example, DiscoverySinkTopic.
Select the Add a default subscription checkbox.
Click CREATE TOPIC.
If required, you can create a subscription later, after creating the topic, by navigating to Topic > Create Subscription > Create a simple subscription.
Note down the subscription name, as it will be used as a property value in Discovery.
If you created a default subscription, or created a new subscription, you need to change the following properties (these settings can also be scripted, as sketched after these steps):
Acknowledgement deadline: Set to 600 seconds.
Retry policy: Select Retry after exponential backoff delay and enter the following values:
Minimum backoff (seconds): 10
Maximum backoff (seconds): 600
Click Update.
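If you prefer to script the topic and subscription setup instead of using the console, the following is a minimal sketch using the google-cloud-pubsub Python client. It assumes the DiscoverySinkTopic name from the example above; the project ID and subscription ID shown are hypothetical placeholders.
# Minimal sketch using the google-cloud-pubsub client library.
# The project ID and subscription ID below are hypothetical placeholders.
from google.cloud import pubsub_v1

project_id = "google_sample_project"        # placeholder project ID
topic_id = "DiscoverySinkTopic"             # topic name from the example above
subscription_id = "DiscoverySinkTopic-sub"  # placeholder subscription ID

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

# Create the topic that the Sink will publish audit logs to.
publisher.create_topic(request={"name": topic_path})

# Create the subscription that Pkafka will read from, applying the
# acknowledgement deadline and retry policy values described above.
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "ack_deadline_seconds": 600,
        "retry_policy": {
            "minimum_backoff": {"seconds": 10},
            "maximum_backoff": {"seconds": 600},
        },
    }
)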
Notice
You can configure the GCS lineage time using custom properties that are not readily apparent by default. See the Properties Table.
Create a Sink
Log in to the Google Cloud Console and navigate to the Logs Router page. You can also perform this action from the Logs Explorer page by navigating to Actions > Create Sink.
Click CREATE SINK.
Enter the Sink details:
a. Sink name (Required): Enter the identifier for the Sink.
b. Sink description (Optional): Describe the purpose or use case for the Sink.
c. Click NEXT.
Now, enter the Sink destination:
a. Select the Sink service: select the service where you want your logs routed. The following services and destinations are available:
Cloud Logging logs bucket: Select or create a Logs Bucket.
BigQuery: Select or create the particular dataset to receive the exported logs. You also have the option to use partitioned tables.
Cloud Storage: Select or create the particular Cloud Storage bucket to receive the exported logs.
Pub/Sub: Select or create the particular topic to receive the exported logs.
Splunk: Select the Pub/Sub topic for your Splunk service.
Other project: Enter the Google Cloud service and destination in the following format:
SERVICE.googleapis.com/projects/PROJECT_ID/DESTINATION/DESTINATION_ID
For example, if your export destination is a Pub/Sub topic, then the Sink destination will be as follows:
pubsub.googleapis.com/projects/google_sample_project/topics/sink_new
Choose which logs to include in the Sink:
Build an inclusion filter: Enter a filter to select the logs that you want to be routed to the Sink's destination. For example:
(resource.type="gcs_bucket" AND resource.labels.bucket_name="bucket-to-be-scanned" AND (protoPayload.methodName="storage.objects.create" OR protoPayload.methodName="storage.objects.delete" OR protoPayload.methodName="storage.objects.get")) OR resource.type="bigquery_resource"
In the above filter, add all of the bucket names that you want to scan as resources in Discovery:
bucket_name="bucket-to-be-scanned" AND
In the case of multiple buckets, you will need to specify them as an "OR" condition, for example:
(resource.type="gcs_bucket" AND (resource.labels.bucket_name="bucket_1" OR resource.labels.bucket_name="bucket_2" OR resource.labels.bucket_name="bucket_3"))
In the above example, three buckets are identified to be scanned: bucket_1, bucket_2, and bucket_3.
Click DONE. A scripted equivalent of creating the Sink is sketched below.
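The same Sink can also be created programmatically. The following is a minimal sketch using the google-cloud-logging Python client, assuming a hypothetical Sink name DiscoverySink, placeholder project and topic names, and the single-bucket inclusion filter shown above.
# Minimal sketch using the google-cloud-logging client library.
# The project ID, Sink name, and topic name below are hypothetical placeholders.
from google.cloud import logging as cloud_logging

project_id = "google_sample_project"  # placeholder project ID
sink_name = "DiscoverySink"           # placeholder Sink name

# Inclusion filter: the single-bucket example from the steps above.
inclusion_filter = (
    '(resource.type="gcs_bucket" '
    'AND resource.labels.bucket_name="bucket-to-be-scanned" '
    'AND (protoPayload.methodName="storage.objects.create" '
    'OR protoPayload.methodName="storage.objects.delete" '
    'OR protoPayload.methodName="storage.objects.get")) '
    'OR resource.type="bigquery_resource"'
)

# Destination in the SERVICE.googleapis.com/projects/PROJECT_ID/... format.
destination = f"pubsub.googleapis.com/projects/{project_id}/topics/DiscoverySinkTopic"

client = cloud_logging.Client(project=project_id)
sink = client.sink(sink_name, filter_=inclusion_filter, destination=destination)
sink.create()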
Cross Project
For cross-project scanning of GCS and GBQ resources, you need to create a Sink in the other project and add the destination as a Pub/Sub topic of the first project.
You can follow the same steps as mentioned above for creating the Sink, navigating to Destination > Select as Other project, and enter the Pub/Sub topic name in the following format:
'pubsub.googleapis.com/projects/google_sample_project/topics/sink_new'
To access the Sink created in another project, you need to add the Sink writer identity service account in the IAM administration page of the project where you have the Pub/Sub topic and the VM instance present.
To get the Sink writer identity, perform the following steps (a sketch for fetching the writer identity programmatically follows these steps):
Go to the Logs Router page > select the Sink > select the dots icon > select Edit Sink Details > in the Writer Identity section, copy the service account.
Go to the IAM Administration page of the project where you have the Pub/Sub topic and the VM instance > select Add member > add the service account of the writer identity of the Sink created above.
Choose the Owner and Editor roles.
Click Save. Verify that the service account you added is present as a member on the IAM Administration page.
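If you prefer to look up the writer identity programmatically rather than through the console, the following is a minimal sketch using the google-cloud-logging Python client. The project ID and Sink name are hypothetical placeholders; the IAM binding itself is still added on the IAM Administration page as described above.
# Minimal sketch: look up the writer identity of a Sink created in another project.
# The project ID and Sink name below are hypothetical placeholders.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="other-project-id")  # project that owns the Sink
sink = client.sink("DiscoverySink")                        # placeholder Sink name
sink.reload()  # fetch the current Sink configuration, including the writer identity
print(sink.writer_identity)  # service account to add as a member in the target project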
Configure properties
Add the following properties to the vars.pkafka.gcp.yml file:
PKAFKA_USE_GCP_LOG_SINK_API: "true"
PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME: ""
For the PKAFKA_GCP_SINK_DESTINATION_PUBSUB_SUBSCRIPTION_NAME property, set the value to the name of the subscription created for the Pub/Sub topic. The Subscription ID can also be used as the value of this property. A sketch for verifying that audit events are reaching the subscription follows.
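Before wiring the subscription into Pkafka, you can optionally verify that audit events are reaching it. The following is a minimal sketch using the google-cloud-pubsub Python client; the project ID and subscription name are hypothetical placeholders.
# Minimal sketch: pull a few messages from the Sink subscription to verify
# that audit log entries are arriving. Names below are hypothetical placeholders.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    "google_sample_project", "DiscoverySinkTopic-sub"
)

response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 5}
)
for received in response.received_messages:
    print(received.message.data.decode("utf-8"))

# The messages are not acknowledged here, so Pkafka can still consume them later.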