
Discovery

This topic lists the custom properties that can be configured for the Discovery service and explains how to set them in PM CLI and PM UI.

Configuration

PM CLI

To use a custom property from the table, add it to the YML file in the custom-vars folder that matches your cloud environment:

  • vars.discovery.aws.yml
  • vars.discovery.azure.yml
  • vars.discovery.gcp.yml
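
As a minimal sketch of what such an entry could look like, the fragment below adds three properties from the Properties Table to the AWS file. The values are illustrative only, and quoting every value as a string is an assumption based on the memory examples later on this page; check the sample vars file shipped with your installation for the exact style.

```yaml
# custom-vars/vars.discovery.aws.yml (illustrative values, not recommendations)

# Turn on the Discovery service.
DISCOVERY_ENABLE: "true"

# Enable real-time scanning (the table lists false as the default).
DISCOVERY_REALTIME_ENABLE: "true"

# Scan up to 1000 rows per database table instead of the default 500.
DISCOVERY_SCAN_HIVE_MAX_ROWS: "1000"
```

The same key and value format would apply to the Azure and GCP files; only the file name changes.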

PM UI

To use a custom property in the UI, do the following:

  1. On the PM UI, do one of the following:

    If you're on the Setup Environment page, navigate to the Setup Discovery > Configure Discovery > Custom tab.

    Or

    If you're on the PM UI Dashboard, navigate to Discovery > Config tab > Custom tab.

  2. Click Add Custom Property, and enter a property name and its value.

  3. Select the property type: text or password.

  4. Click Add.

Properties Table

Property Description Values Default Value
DISCOVERY_ENABLE Set to true to enable Discovery. true,false
USE_DATABRICKS_SPARK Set to true to use Databricks Spark instead of Apache Spark. true,false
DISCOVERY_FS_PREFIX Set the prefix for accessing filesystem according to the cloud provider.
  • s3a:// (AWS)
  • StorageContainerName (Azure)
  • gs:// (GCP)
 
DISCOVERY_CLOUD_TYPE Set the cloud type used for the Discovery setup.
  • AWS
  • AZURE
  • GCP
 
DISCOVERY_REALTIME_ENABLE Set to true to enable real-time scan in Discovery. true,false false
DISCOVERY_MENU_ENABLE Set to true to enable Discovery menu on Privacera Portal. true,false false
DISCOVERY_STORE_SAMPLE_VALUES Whether sample values should be stored for a column or field. true,false false
DISCOVERY_MAX_SAMPLE_VALUES Maximum number of sample values stored for a column or field.
DISCOVERY_ENCRYPT_SAMPLE_VALUES Whether the sample values should be stored encrypted. true,false false
DISCOVERY_GEN_TERRAFORM_NOSQL_TABLES

Set to true to create the DynamoDB tables using Terraform during a Privacera Manager update.

Set to false to disable Terraform and create the resource manually.

  true
DISCOVERY_GEN_TERRAFORM_STREAMS

Set to true to create the Kinesis streams using Terraform during a Privacera Manager update.

Set to false to disable Terraform and create the resource manually.

  true
DISCOVERY_GEN_TERRAFORM_BUCKET

Set to true to create the S3 bucket using Terraform during a Privacera Manager update.

Set to false to disable Terraform and create the resource manually.

  true
DISCOVERY_AWS_CLOUD_ASSUME_ROLE Enables or disables granting Discovery access to AWS services to perform the scanning operation.   true
DISCOVERY_BUCKET_SQS_NAME Set this property to use a custom name for the SQS queue.   privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}}
DISCOVERY_GEN_TERRAFORM_SQS

Set to true to create the SQS resource using Terraform during a Privacera Manager update.

Set to false to disable Terraform and create the resource manually.

  true
DATABRICKS_DISCOVERY_SPARK_VERSION The version of Spark used in a Databricks cluster.
  • 6.4.x-scala2.11 (Spark 2.4)
  • 7.3.x-scala2.12 (Spark 3.0)
  • 7.4.x-scala2.12 (Spark 3.0)
  • 7.5.x-scala2.12 (Spark 3.0)
  • 7.6.x-scala2.12 (Spark 3.0)
7.3.x-scala2.12
DISCOVERY_K8S_SPARK_UI_PORT_EXTERNAL Property to change the default external port number for the Discovery Spark UI.   4040
DISCOVERY_SAMPLE_VALUES_MAX_LENGTH Maximum length of a sample value that is stored for a column or field.
DISCOVERY_CONSUMER_RECORD_HANDLER_THREAD_POOL_SIZE

Property to configure the thread pool size for handling consumer records.

This property determines how many data source applications the scheduler can handle, so its value should be greater than the number of data source applications registered in the installation.

  100
DISCOVERY_SCAN_HIVE_MAX_COLS Maximum number of columns in a database table or fields in a structured file to be scanned. This can be overridden by using the `record.max.fields` property at the data source level.   2000
DISCOVERY_SCAN_HIVE_MAX_ROWS Maximum number of rows of a database table to be scanned.   500
DISCOVERY_SCAN_MAX_LINES Maximum number of records of a structured file to be scanned.   500
DISCOVERY_CONTENT_MAX_CHARACTER Maximum number of bytes in a column cell or field cell to be scanned.   1000
DISCOVERY_TIKA_MAX_BYTES Maximum number of bytes of an unstructured file to be scanned.   102400
DISCOVERY_MAX_TAG_SNIPPET_SAMPLE_VALUES Maximum number of samples to be captured for display in a tag.   3
DISCOVERY_INIT_CONTAINER_COMMAND_LIST You can provide a list of commands to download custom jars to a desired location inside the Discovery container. For example:
DISCOVERY_INIT_CONTAINER_COMMAND_LIST:
  - wget https://privacera/public/custom-1.jar -O /opt/privacera/discovery/libs/custom-1.jar
  - wget https://privacera/public/custom-2.jar -O /opt/privacera/discovery/libs/custom-2.jar
 
DISCOVERY_SCAN_PARQUET_ORC_FROM_ARCHIVE_ENABLE Property to enable/disable the scanning of ORC/Parquet files within a ZIP file. true, false false
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_LOOPBACK_TIME_MS Loopback time, in milliseconds, for GCS lineage. - 3000
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_CUTOFF_TIME_MS Cutoff time, in milliseconds, to wait for a GCS log event for lineage. - 300000
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_CUTOFF_TIME_CHECK_INTERVAL_MS Fixed interval, in milliseconds, at which to check for real-time files with delayed GCS lineage pending. - 30000
DISCOVERY_CONTENT_SCAN_THREAD_POOL_SIZE If you are scanning more than two data sources that belong to different projects, set this property to the number of projects that Discovery will scan. - 2
Memory Variables
DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB Minimum Java Heap memory in MB used by Discovery Driver. For example, DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MIN_MEMORY Minimum Java Heap memory used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB.  For example, DISCOVERY_DRIVER_HEAP_MIN_MEMORY: "1g"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB Maximum Java Heap memory in MB used by Discovery Driver. For example, DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY Maximum Java Heap memory used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB.  For example, DISCOVERY_DRIVER_HEAP_MAX_MEMORY: "1g"
DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB Minimum amount of Kubernetes memory in MB to be requested by Discovery Driver. For example, DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB: "1024"
DISCOVERY_DRIVER_K8S_MEM_REQUESTS Minimum amount of Kubernetes memory to be used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB.  For example, DISCOVERY_DRIVER_K8S_MEM_REQUESTS: "1G"
DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB Maximum amount of Kubernetes memory in MB to be requested by Discovery Driver. For example, DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB: "1024"
DISCOVERY_DRIVER_K8S_MEM_LIMITS Maximum amount of Kubernetes memory to be used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB.  For example, DISCOVERY_DRIVER_K8S_MEM_LIMITS: "1G"
DISCOVERY_DRIVER_CPU_MIN Minimum amount of Kubernetes CPU to be requested by Discovery Driver.  For example, DISCOVERY_DRIVER_CPU_MIN: "0.5"
DISCOVERY_DRIVER_CPU_MAX Maximum amount of Kubernetes CPU to be used by Discovery Driver.  For example, DISCOVERY_DRIVER_CPU_MAX: "0.5"
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB Minimum Java Heap memory in MB used by Discovery Executor. For example, DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY Minimum Java Heap memory used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB. For example, DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY: "1g"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB Maximum Java Heap memory in MB used by Discovery Executor. For example, DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB: "1024"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY Maximum Java Heap memory used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB. For example, DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY: "1g"
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB Minimum amount of Kubernetes memory in MB to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB: "1024"
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS Minimum amount of Kubernetes memory to be used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB. For example, DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS: "1G"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB Maximum amount of Kubernetes memory in MB to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB: "1024"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS Maximum amount of Kubernetes memory to be used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB. For example, DISCOVERY_EXECUTOR_K8S_MEM_LIMITS: "1G"
DISCOVERY_EXECUTOR_CPU_MIN Minimum amount of Kubernetes CPU to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_CPU_MIN: "0.5"
DISCOVERY_EXECUTOR_CPU_MAX Maximum amount of Kubernetes CPU to be used by Discovery Executor. For example, DISCOVERY_EXECUTOR_CPU_MAX: "0.5"
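
The memory variables above could be combined in a custom-vars file roughly as follows. This is only a sketch: the sizes are illustrative, not recommendations from this page, and keeping the Kubernetes memory limit above the maximum Java heap is a general practice for JVMs running in containers rather than a rule stated here. The driver block uses the `_MB` variants and the executor block uses the unit-suffixed variants, which override the `_MB` ones when both are set.

```yaml
# Discovery Driver: sizes given in MB through the *_MB properties.
DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB: "2048"
DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB: "2048"
DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB: "3072"   # assumed headroom above the max heap
DISCOVERY_DRIVER_CPU_MIN: "0.5"
DISCOVERY_DRIVER_CPU_MAX: "1"

# Discovery Executor: unit-suffixed properties, which take precedence
# over the corresponding *_MB properties if both are present.
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY: "1g"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY: "2g"
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS: "2G"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS: "3G"      # assumed headroom above the max heap
DISCOVERY_EXECUTOR_CPU_MIN: "0.5"
DISCOVERY_EXECUTOR_CPU_MAX: "1"
```

Setting only one variant per setting (either `_MB` or unit-suffixed) avoids confusion about which value is in effect.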

Last update: September 23, 2021