Discovery

This topic lists the custom properties that you can configure for the Discovery service and describes how to set them using the Privacera Manager (PM) CLI.

PM CLI Configuration

To use a custom property from the properties table:

  1. Add the property to the YAML file in the custom-vars folder that matches your cloud environment:

    • vars.discovery.aws.yml
    • vars.discovery.azure.yml
    • vars.discovery.gcp.yml
  2. Run the following command:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
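
As a minimal sketch of step 1, the fragment below shows what a few entries might look like in vars.discovery.aws.yml. The property names and allowed values come from the properties table. The quoted-string style mirrors the memory variable examples in the table and is an assumption, and the chosen values are illustrative only.

```yaml
# custom-vars/vars.discovery.aws.yml
# Illustrative values only; quoting style is an assumption.

# Enable the Discovery service.
DISCOVERY_ENABLE: "true"

# Cloud type and the matching filesystem prefix for AWS.
DISCOVERY_CLOUD_TYPE: "AWS"
DISCOVERY_FS_PREFIX: "s3a://"

# Turn on real-time scanning (default: false).
DISCOVERY_REALTIME_ENABLE: "true"
```

After saving the file, run the update command from step 2 so that Privacera Manager applies the change.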
    

Properties Table

Property Description Values Default Value
DISCOVERY_IMAGE_NAME      
DISCOVERY_IMAGE_TAG      
DISCOVERY_ENABLE Set to true to enable Discovery. true,false
USE_DATABRICKS_SPARK Enable to use Databricks Spark instead of Apache Spark. true,false  
DISCOVERY_INSTALL      
DISCOVERY_FS_PREFIX

To access the filesystem of the cloud storage service, do the following:

  • For AWS and GCP, set the filesystem prefix: s3a:// for AWS and gs:// for GCP.
  • For Azure, set the container name. The container belongs to your Azure storage account and holds the blobs that contain the data to be scanned.
  • s3a://
  • StorageContainerName
  • gs://
 
DISCOVERY_CLOUD_TYPE Set the cloud type used for the Discovery setup.
  • AWS
  • AZURE
  • GCP
 
DISCOVERY_TRUSTSTORE_PASSWORD      
AUTO_START_DATABRICKS_JOB      
DISCOVERY_REALTIME_ENABLE Set to true to enable real-time scan in Discovery. true,false false
DISCOVERY_MENU_ENABLE Set to true to enable Discovery menu on Privacera Portal. true,false false
DISCOVERY_LOG_LEVEL      
DISCOVERY_FOLDER_TAGGER_ENABLE      
DISCOVERY_STORE_SAMPLE_VALUES Whether to store sample values for a column or field. true,false false
DISCOVERY_MAX_SAMPLE_VALUES Maximum number of sample values stored for a column or field.    
DISCOVERY_ENCRYPT_SAMPLE_VALUES Whether to encrypt the stored sample values. true,false false
DISCOVERY_STREAM_SUFFIX      
DISCOVERY_STREAM_TAGS      
DISCOVERY_TABLE_SUFFIX      
DISCOVERY_TABLE_TAGS      
DISCOVERY_BUCKET_NAME      
DISCOVERY_BUCKET_TAGS      
DISCOVERY_CREATE_NOSQL_TABLES      
DISCOVERY_GEN_TERRAFORM_NOSQL_TABLES

Set to true to create the DynamoDB tables using Terraform.

Set to false to disable Terraform and create the resources manually.

  true
DISCOVERY_CREATE_STREAMS      
DISCOVERY_GEN_TERRAFORM_STREAMS

Set to true to create the Kinesis streams using Terraform.

Set to false to disable Terraform and create the resources manually.

  true
DISCOVERY_CREATE_BUCKET      
DISCOVERY_GEN_TERRAFORM_BUCKET

Set to true to create the S3 bucket using Terraform.

Set to false to disable Terraform and create the resource manually.

  true
DISCOVERY_GEN_TERRAFORM_AZURE_ACCOUNT      
DISCOVERY_SPARK_DRIVER_MEMORY      
DISCOVERY_SPARK_EXECUTOR_MEMORY      
DISCOVERY_SPARK_DRIVER_CORES      
DISCOVERY_SPARK_EXECUTOR_CORES      
DISCOVERY_SPARK_EXECUTOR_INSTANCES      
DISCOVERY_CREATE_DEFAULT_APP_IN_PORTAL      
DISCOVERY_COSMOSDB_FILE_REPOSITORY_PATH      
DISCOVERY_COSMOSDB_DOCUMENT_SIZE_LIMIT      
DISCOVERY_COSMOSDB_OFFER_THROUGHPUT      
DISCOVERY_AWS_CLOUD_ASSUME_ROLE Enables or disables granting Discovery access to AWS services so that it can perform scanning operations.   true
DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN      
DISCOVERY_BUCKET_SQS_NAME Set this property to use a custom name for the SQS queue.   privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}}
DISCOVERY_SQS_TAGS      
DISCOVERY_CREATE_SQS      
DISCOVERY_GEN_TERRAFORM_SQS

Set to true to create the SQS resource using Terraform.

Set to false to disable Terraform and create the resource manually.

  true
DATABRICKS_INIT_DBFS_FOLDER      
DATABRICKS_DISCOVERY_CUST_CONF_ZIP_NAME      
DATABRICKS_DISCOVERY_INIT_SCRIPT_PATH      
PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL      
DATABRICKS_DISCOVERY_SPARK_VERSION The version of Spark used in a Databricks cluster.
  • 6.4.x-scala2.11 (Spark 2.4)
  • 7.3.x-scala2.12 (Spark 3.0)
  • 7.4.x-scala2.12 (Spark 3.0)
  • 7.5.x-scala2.12 (Spark 3.0)
  • 7.6.x-scala2.12 (Spark 3.0)
7.3.x-scala2.12
DISCOVERY_SPARK_TASK_SCHEDULER_ENABLE      
DISCOVERY_RANGER_REST_ENABLED      
DISCOVERY_K8S_IMAGE_NAME      
DISCOVERY_K8S_IMAGE_TAG      
DISCOVERY_K8S_IMAGE_PULL_POLICY      
DISCOVERY_K8S_PVC_NAME      
DISCOVERY_K8S_PVC_STORAGE_SIZE_MB      
DISCOVERY_K8S_PVC_STORAGE_SIZE      
DISCOVERY_K8S_STORAGE_PROVISIONER      
DISCOVERY_K8S_SC_NAME      
DISCOVERY_K8S_PV_ENCRYPTED      
DISCOVERY_K8S_PV_KEY      
DISCOVERY_K8S_LOADBALANCER_EXTERNAL      
DISCOVERY_K8S_ANNOTATION_LOADBALANCER_ANNOTATION      
DISCOVERY_K8S_SPARK_UI_PORT      
DISCOVERY_K8S_SPARK_UI_PORT_EXTERNAL Property to change the default port number for Discovery.   4040
DISCOVERY_K8S_SPARK_EVENT_LOG_ENABLED      
DISCOVERY_K8S_SPARK_DRIVER_PORT      
DISCOVERY_K8S_SPARK_BLOCKMANAGER_PORT      
DISCOVERY_K8S_SPARK_PORT_MAX_RETRIES      
DISCOVERY_K8S_SPARK_SERVICE_AC_NAME      
DISCOVERY_K8S_SPARK_DRIVER_MEMORY      
DISCOVERY_K8S_SPARK_EXECUTOR_MEMORY      
DISCOVERY_K8S_SPARK_DRIVER_CORES      
DISCOVERY_K8S_SPARK_EXECUTOR_CORES      
DISCOVERY_K8S_SPARK_EXECUTOR_INSTANCES      
DISCOVERY_K8S_SPARK_DRIVER_LIMIT_CORES      
DISCOVERY_K8S_SPARK_EXECUTOR_LIMIT_CORES      
DISCOVERY_K8S_SPARK_EXECUTOR_REQUEST_CORES      
DISCOVERY_K8S_SPARK_MASTER      
DISCOVERY_K8S_MEM_LIMITS      
DISCOVERY_K8S_MEM_REQUESTS      
DISCOVERY_K8S_CPU_LIMITS      
DISCOVERY_K8S_CPU_REQUESTS      
DISCOVERY_AZURE_APP_CLIENT_ID      
DISCOVERY_AZURE_STORAGE_ACCOUNT_NAME      
DISCOVERY_AZURE_URL_PREFIX      
DISCOVERY_AZURE_AUDIT_TYPE      
DISCOVERY_AZURE_LOCATION      
CREATE_AZURE_RESOURCES      
DISCOVERY_AZURE_RESOURCE_GROUP      
DISCOVERY_AZURE_APPLICATION_ID      
DISCOVERY_AZURE_TENANTID      
DISCOVERY_AZURE_APP_CLIENT_SECRET_BASE64      
DISCOVERY_AZURE_SUBSCRIPTION_ID      
DISCOVERY_AZURE_COSMOS_DB_ACCOUNT      
DISCOVERY_PORTAL_SERVICE_USERNAME      
DISCOVERY_PORTAL_SERVICE_PASSWORD      
DISCOVERY_CLOUD_MODE      
DISCOVERY_AWS_ENDPOINT_ENABLE      
DISCOVERY_KINESIS_ENDPOINT_URL      
DISCOVERY_DYNAMODB_ENDPOINT_URL      
DISCOVERY_SOLR_BASIC_AUTH_ENABLED      
DISCOVERY_SOLR_BASIC_AUTH_USER      
DISCOVERY_SOLR_BASIC_AUTH_PASSWORD      
PRIVACERA_DISCOVERY_SECRETS_FILE      
DISCOVERY_ENCRYPT_SECRETS      
PRIVACERA_DISCOVERY_SECRETS_KEYSTORE_PASSWORD      
DISCOVERY_ENCRYPT_PROPS_LIST      
DISCOVERY_PORTAL_SERVICE_PASSWORD      
PRIVACERA_DISCOVERY_DATASOURCE_PASSWORD      
RANGER_TAGSYNC_PASSWORD      
DISCOVERY_SOLR_BASIC_AUTH_PASSWORD      
PRIVACERA_DISCOVERY_DATASOURCE_PASSWORD      
DISCOVERY_FS_S3A_ACCCESS_KEY      
DISCOVERY_FS_S3A_SECRET_KEY      
DISCOVERY_CLUSTER_NAME      
DISCOVERY_AGENT_MODE      
DISCOVERY_LOGS_SOLR_ENABLE      
DISCOVERY_RANGER_HOOK_ENABLED      
DISCOVERY_SPARK_DOCKER_DRIVER_MEMORY      
DISCOVERY_SPARK_DOCKER_EXECUTOR_MEMORY      
DISCOVERY_SPARK_DOCKER_DRIVER_CORES      
DISCOVERY_SPARK_DOCKER_EXECUTOR_CORES      
DISCOVERY_SPARK_DOCKER_EXECUTOR_INSTANCES      
DISCOVERY_DOCKER_SPARK_MASTER      
DISCOVERY_OFFLINE_SCAN_DEBUG_ENABLED      
DISCOVERY_SCAN_BACKUP_CLEANER_INTERVAL_HR      
DISCOVERY_RTBF_POLICY_ENABLED      
DISCOVERY_WORKFLOW_POLICY_ENABLED      
DISCOVERY_WORKFLOW_EXPUNGE_POLICY_ENABLED      
DISCOVERY_DEIDENTIFICATION_POLICY_ENABLED      
DISCOVERY_CONTENT_SCANNING_ENABLED      
DISCOVERY_SCAN_OFFICE_MIME_TYPES_AS_ARCHIVE_ENABLED      
DISCOVERY_OFFLINE_SCAN_BACKUP_FOLDER      
DISCOVERY_DICT_BASE_PATH      
DISCOVERY_ML_BASE_PATH      
DISCOVERY_ML_TAG_ACTION_MODEL_PATH      
DISCOVERY_SCAN_REQUEST_FILES_DIR      
PARTIAL_MATCH_ENABLE      
DISCOVERY_COSMOSDB_URL      
DISCOVERY_COSMOSDB_KEY      
DISCOVERY_GEN_TERRAFORM_WITH_MSI_ROLE      
DISCOVERY_AZURE_HNS_ENALBED      
DISCOVERY_AZURE_ACCOUNT_REPLICATION_TYPE      
DISCOVERY_AZURE_ACCOUNT_KIND      
DISCOVERY_SAMPLE_VALUES_MAX_LENGTH Maximum length of a sample that is stored for a column or field    
DISCOVERY_S3_AUDITS_ENABLE      
DISCOVERY_ADLS_AUDITS_ENABLE      
DISCOVERY_GCS_AUDITS_ENABLE      
DISCOVERY_GBQ_AUDITS_ENABLE      
DISCOVERY_DEPLOYMENT_SUFFIX_ID      
DISCOVERY_SERVICE_USER      
DISCOVERY_VERSION_FILE_NAME      
DISCOVERY_HEARTBEAT_UPDATE_INTERVAL_SEC      
DISCOVERY_SCAN_BACKUP_CLEANER_THRESHOLD_HR      
DISCOVERY_LOOKUP_COPY_TO_HDFS_INTERVAL_SEC      
DISCOVERY_GENERATE_SRC_ALERT_INTERVAL_MIN      
DISCOVERY_LOOKUP_COPY_TO_HDFS_FROM_AGENT      
DISCOVERY_RETRY_ON_FAILURE_INTERVAL_SEC      
DISCOVERY_SCAN_DELAY_RETRY_INTERVAL      
DISCOVERY_SCAN_DELAY_RETRY_COUNT      
DISCOVERY_HOST      
DISCOVERY_KAFKA_HEARTBEAT_INTERVAL_MS      
DISCOVERY_KAFKA_REQUEST_TIMEOUT_MS      
DISCOVERY_KAFKA_SESSION_TIMEOUT_MS      
DISCOVERY_KAFKA_CONNECTIONS_MAX_IDLE_MS      
DISCOVERY_KAFKA_ENABLE_AUTO_COMMIT      
DISCOVERY_KAFKA_AUTO_OFFSET_RESET      
DISCOVERY_KERBEROS_ENABLE      
DISCOVERY_SOLR_KERBEROS_ENABLE      
DISCOVERY_HBASE_KERBEROS_ENABLE      
DISCOVERY_KAFKA_KERBEROS_ENABLE      
DISCOVERY_KERBEROS_RELOGIN_INTERVAL_SECS      
DISCOVERY_PORTAL_KERBEROS_ENABLE      
DISCOVERY_SCAN_WORKER_KAFKA_SEND_BUFFER_MEMORY      
DISCOVERY_SCAN_WORKER_KAFKA_SEND_LINGERMS      
DISCOVERY_SCAN_WORKER_KAFKA_SEND_BATCHSIZE      
DISCOVERY_SCAN_WORKER_KAFKA_SEND_RETRIES      
DISCOVERY_SOLR_COLLECTION      
DISCOVERY_SOLR_LINEAGE_COLLECTION      
DISCOVERY_SOLR_ALERT_COLLECTION      
DISCOVERY_SOLR_RESOURCE_COLLECTION      
DISCOVERY_SOLR_OFFLINE_SCAN_SUMMARY_COLLECTION      
DISCOVERY_SOLR_RESOURCE_META_INFO_COLLECTION      
DISCOVERY_SOLR_RESOURCE_AUDIT_COLLECTION      
DISCOVERY_SOLR_SPARK_EVENT_COLLECTION      
DISCOVERY_SOLR_OFFLINE_SCAN_CLEANUP_COLLECTION      
DISCOVERY_UNSTRUCTURED_VALUE_CHECKING_ENABLED      
DISCOVERY_NUM_TOKENS_FOR_UNSTRUCTURED_DATA_DETECTION      
DISCOVERY_SCAN_INCLUDE_PART_FILES_MAX_INDEX      
DISCOVERY_ACTIVE_SCAN_ENABLE      
DISCOVERY_SPARK_JOB_SCHEDULER_SLEEP_TIME_MS      
DISCOVERY_AMOUNT_ARRAYVALUES_EXTRACTED      
DISCOVERY_RECOVERY_SPARK_DEFAULT_POOL_NAME      
DISCOVERY_CONSUMER_RECORD_WAIT_TIMEOUT_MS      
DISCOVERY_CONSUMER_RECORD_BATCH_SIZE      
DISCOVERY_RECOVERY_RETRY_MAX      
DISCOVERY_GENERAL_CONSUMER_QUEUE_SIZE      
DISCOVERY_OFFLINE_CONSUMER_QUEUE_SIZE      
DISCOVERY_CONSUMER_RECORD_DB_PATHS      
DISCOVERY_CONSUMER_RECORD_HANDLER_THREAD_POOL_SIZE

Property to configure the thread pool size for handling the consumer records.

The property determines how many data source applications can be handled by the scheduler, so the property value should be more than the data source applications that are registered in an installation.

  100
DISCOVERY_SEND_CHILD_TO_EXCLUDE_RESOURCE_INFO_ENABLE      
DISCOVERY_DYNAMODB_WRITE_ITEM_MAX_SIZE      
DISCOVERY_DYNAMODB_WRITE_BATCH_SIZE      
DISCOVERY_DYNAMODB_READ_BATCH_SIZE      
DISCOVERY_DYNAMODB_CHILD_COLUMN_LIMIT      
DISCOVERY_AZURE_PAYLOAD_LIMIT      
DISCOVERY_METASTORE_PAYLOAD_TABLE      
DISCOVERY_METANAME_LEAF_ONLY      
DISCOVERY_SEND_SPARK_JOB_EVENT      
DISCOVERY_RESTART_ON_STUCK_JOBS      
DISCOVERY_START_SCRIPT      
DISCOVERY_DB_MAX_STATEMENTS      
DISCOVERY_DB_MAX_POOL_SIZE      
DISCOVERY_DB_ACQUIRE_INCREMENT      
DISCOVERY_DB_MIN_POOL_SIZE      
DISCOVERY_COSMOSDB_MAX_POOL_SIZE      
DISCOVERY_COSMOSDB_RETRY_INTERVAL_SEC      
DISCOVERY_COSMOSDB_MAX_RETRY      
DISCOVERY_COSMOSDB_DATABASE_NAME      
DISCOVERY_SAVE_ARCHIVE_FILES      
DISCOVERY_RTBF_USE_ENCRYPTION      
DISCOVERY_DATAZONE_MONITOR_OFF_PREMISE_SRC_ENABLE      
DISCOVERY_DATAZONE_RESOURCE_REEVALUATE_ENABLED      
DISCOVERY_SCAN_NEW_SCANNER_ENABLE      
DISCOVERY_RIGHT_TO_PRIVACY_THREAD_POOL_SIZE      
DISCOVERY_OFFLINE_SCAN_RETRY_COUNT      
DISCOVERY_OFFLINE_SCAN_AUTO_RETRY_ENABLE      
DISCOVERY_OFFLINE_FILE_AND_FOLDER_COUNTING_TASK_POLL_TIME_MS      
DISCOVERY_OFFLINE_FILE_AND_FOLDER_COUNTING_TASK_TIMEOUT_MS      
DISCOVERY_OFFLINE_SCAN_PARTITION_ENABLE      
DISCOVERY_MAX_DICT_WORD_TO_SENTENCE_RATIO      
DISCOVERY_APPLY_METANAME_DICT_TO_UNSTRUCT      
DISCOVERY_MAX_BYTES_FOR_WORKFLOW      
DISCOVERY_PRECORDS_PARQUET_VERSION      
DISCOVERY_UNSTRUCT_TAGS_FILENAME      
DISCOVERY_WORKFLOW_DUPLICATE_FILE_RETRY_MAX_ATTEMPTS      
DISCOVERY_WORKFLOW_EXPUNGE_SPARKDF_SINGLE_FILE      
DISCOVERY_WORKFLOW_EXPUNGE_SPARKDF_ENABLE      
DISCOVERY_CLOUD_USE_ASSUMEROLE      
DISCOVERY_GCP_CLOUD_OUTPUTWRITERS_ENABLE      
DISCOVERY_DROOLS_POOL_SIZE      
DISCOVERY_DROOLS_USE_POOL      
DISCOVERY_INVALID_HEADER_CHARS_PAT      
DISCOVERY_MAX_HEADER_LEN      
DISCOVERY_STRUCT_VALUE_FULL_MATCH_ENABLED      
DISCOVERY_CLASSIFIER_AUTO_CREATE_MANUAL_TAG      
DISCOVERY_HBASE_BACKUP_TTL_MS      
DISCOVERY_HBASE_BACKUP_TTL_ENABLE      
DISCOVERY_HBASE_CLIENT_SCANNER_TIMEOUT_MS      
DISCOVERY_EXCLUSION_CLEANER_SLEEP_MIN      
DISCOVERY_EXCLUSION_CLEANER_BATCH_SIZE      
DISCOVERY_EXCLUSION_CLEANER_ENABLE      
DISCOVERY_FOLDER_TAGGER_BATCH_SIZE      
DISCOVERY_FOLDER_TAGGER_BACKOFF_TIME_SEC      
DISCOVERY_FOLDER_TAGGER_SLEEP_TIME_MS      
DISCOVERY_CMD_SERVER_ENABLED      
DISCOVERY_CMD_SERVER_PORT      
DISCOVERY_RULE_ENGINE_ADJUST_SCORES      
DISCOVERY_NOUN_LIST_FILE      
DISCOVERY_SPARK_JOB_MAX_TIME_MS      
DISCOVERY_ClASSIFY_RECORD_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_ClASSIFY_RECORD_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_ATLAS_HOOK_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_ATLAS_HOOK_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_NAV_TO_PRIVACERA_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_NAV_TO_PRIVACERA_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_SCAN_DELAY_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_SCAN_DELAY_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_ADLS_AUDITS_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_ADLS_AUDITS_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_S3_AUDITS_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_S3_AUDITS_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_DYNAMODB_AUDITS_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_DYNAMODB_AUDITS_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_HIVE_AUDITS_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_HIVE_AUDITS_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_CONTENT_CLASSIFIER_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_CONTENT_ClASSIFIER_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_CONTENT_SCAN_WORKER_TOPIC_PARTITION      
DISCOVERY_CONTENT_SCAN_COLLECTOR_CYCLE_TIME_MS      
DISCOVERY_DEFAULT_SPARK_PARTITION_PERCENT      
DISCOVERY_USE_SPARK_PARTITION_CALC      
DISCOVERY_HIVE_PROXY_USER_FEATURE      
DISCOVERY_KERBEROS_LOGIN_RETRY_INTERVAL_MS      
DISCOVERY_KERBEROS_LOGIN_NUM_RETRIES      
DISCOVERY_LFS_USE_FILE_MONITOR      
DISCOVERY_LFS_USE_FILE_WATCHER      
DISCOVERY_OFFLINE_SCAN_CLEANUP_THREAD_POOL_SIZE      
DISCOVERY_OFFLINE_SCAN_THREAD_POOL_SIZE      
DISCOVERY_QUICK_SCAN_LIMIT      
DISCOVERY_QUICK_SCAN_ENABLE      
DISCOVERY_DO_HDFS_SCHEMA_MAPPING      
DISCOVERY_ALLOW_FUZZY_MATCH_TAGS      
DISCOVERY_EXEC_MIMETYPE_REMOVE_DEFAULTS      
DISCOVERY_DEV_TEST_MODE      
DISCOVERY_TRIGGER_FILE_PATH      
DISCOVERY_POST_PROCESS_DROOLS_RULES_FILENAME      
DISCOVERY_CLASSIFIER_RULES_UNSTRUCT_FILENAME      
DISCOVERY_CLASSIFIER_RULES_FILENAME      
DISCOVERY_CLASSIFIER_DROOLS_RULES_FILENAME      
DISCOVERY_CHAT_SCAN_SKIP_INVALID_JSON_OUTPUT      
DISCOVERY_UNSTRUCT_AS_SINGLE_LINE      
DISCOVERY_POST_PROCESS_DATA_KEYSCORE_THRESHOLD      
DISCOVERY_UNSTRUCTURED_DATA_KEYSCORE_THRESHOLD      
DISCOVERY_STRUCTURED_DATA_KEYSCORE_THRESHOLD      
DISCOVERY_USE_KEYSCORE_THRESHOLD      
DISCOVERY__ML_PYTHON_FILE      
DISCOVERY_ML_CONDA_ENV_PATH      
DISCOVERY_ML_NLP_ENABLED      
DISCOVERY_POST_PROCESS_RULE_ENGINE_ENABLED      
DISCOVERY_RULE_ENGINE_DO_FALLBACK      
DISCOVERY_RULE_DATABASE_ENABLED      
DISCOVERY_RULE_ENGINE_ENABLED      
DISCOVERY_RULE_ENGINE_DROOLS_ENABLED      
DISCOVERY_RESOURCE_META_SCAN_MAPPER_CHECK_TASK_ACTIVE_INTERVAL_TIME_MS      
DISCOVERY_RESOURCE_META_SCAN_MAPPER_TASK_POLL_TIME_MS      
DISCOVERY_RESOURCE_META_SCAN_MAPPER_TASK_TIMEOUT_MS      
DISCOVERY_SCHEMA_MAP_BASE_PATH      
DISCOVERY_OFFLINE_SCAN_KAFKA_ENABLE      
DISCOVERY_ML_ENABLE      
DISCOVERY_SAS_SUFFIXES      
DISCOVERY_ENABLE_SIMPLE_KAFKA_CONSUMER_FOR_AUDIT_PARSING      
DISCOVERY_ENABLE_KAFKA_CONSUMER_FOR_MAPR_AUDIT_PARSING      
DISCOVERY_ENABLE_KAFKA_CONSUMER_FOR_AUDIT_PARSING      
DISCOVERY_ZIP_LOOKUP_KEY      
DISCOVERY_GENERIC_ML_TYPE      
DISCOVERY_CORE_NLP_ML_TYPE      
DISCOVERY_PHONE_NUMBER_ML_TYPE      
DISCOVERY_GEO_LAT_LONG_ML_TYPE      
DISCOVERY_DOB_ML_TYPE      
DISCOVERY_VIN_ML_TYPE      
DISCOVERY_ITIN_ML_TYPE      
DISCOVERY_EIN_ML_TYPE      
DISCOVERY_SSN_ML_TYPE      
DISCOVERY_IMEI_ML_TYPE      
DISCOVERY_CC_ML_TYPE      
DISCOVERY_ZIP_ML_TYPE      
DISCOVERY_LFS_WATCHER_POLLTIME_MS      
DISCOVERY_LFS_CREATE_MAX_TIME_MS      
DISCOVERY_LFS_WATCHER_CACHE_SIZE      
DISCOVERY_LFS_WATCHER_ENABLE      
DISCOVERY_LFS_APP_TOPIC      
DISCOVERY_LFS_APP      
DISCOVERY_GOOGLE_BIGQUERY_PARSE_CTAS      
DISCOVERY_DYNAMODB_ENABLE      
DISCOVERY_FUZZY_SCORING_SENSE_CHECK_ENABLE      
DISCOVERY_FUZZY_SCORING_MIN_CUTOFF_SCORE      
DISCOVERY_ML_SRC_DETECT_MODEL_PATH      
DISCOVERY_ML_MODEL_PATH      
DISCOVERY_ML_CLASSIFY_TAG_ACTION_ENABLE      
DISCOVERY_ML_CLASSIFY_SRC_CODE_ENABLE      
DISCOVERY_ML_CLASSIFY_TAG_ENABLE      
DISCOVERY_ML_STORE_SCAN_RESULTS      
DISCOVERY_OUTPUTWRITERS_ENABLE      
DISCOVERY_DATABRICKS_SPARK_ENABLE      
DISCOVERY_KAFKA_PRODUCER_COMPRESSION_CODEC      
DISCOVERY_SET_REMOTE_USER      
DISCOVERY_STALE_DATA_RETRY_COUNT      
DISCOVERY_AUDITS_TO_SOLR_ENABLED      
DISCOVERY_ATLAS_HOOK_SIMPLE      
DISCOVERY_ATLAS_HOOK_ENABLED      
DISCOVERY_SPLUNK_ENABLE      
DISCOVERY_SPLUNK_PORT      
DISCOVERY_SPLUNK_ALERT_INDEX      
DISCOVERY_SPLUNK_SCHEME      
DISCOVERY_SPLUNK_HEC_SOURCE      
DISCOVERY_ANOMALY_SCHEDULAR_ENABLE      
DISCOVERY_MONITORING_SCHEDULAR_ENABLE      
DISCOVERY_METRICS_JVM      
DISCOVERY_METRICS_KAFKA_TOPIC      
DISCOVERY_METRICS_KAFKA_INTERVAL_SEC      
DISCOVERY_METRICS_ENABLE_KAFKA      
DISCOVERY_METRICS_GRAPHITE_INTERVAL_SEC      
DISCOVERY_METRICS_GRAPHITE_ENABLE      
DISCOVERY_METRICS_CONSOLE_INTERVAL_SEC      
DISCOVERY_METRICS_ENABLE_CONSOLE      
DISCOVERY_METRICS_CSV_INTERVAL_SEC      
DISCOVERY_METRICS_ENABLE_CSV      
DISCOVERY_METRICS_CSVPATH      
DISCOVERY_SOLR_LOGS_COLLECTION      
DISCOVERY_SOLR_METRICS_COLLECTION      
DISCOVERY_DB_CPDS_TEST_ONCHECKIN      
DISCOVERY_DB_CPDS_TEST_ONCHECKOUT      
DISCOVERY_DB_CPDS_IDLECONN_TEST_PERIOD_SEC      
DISCOVERY_DB_CPDS_TESTQUERY      
DISCOVERY_COMMON_EXCLUDE_RESOURCE_LIST      
DISCOVERY_CSV_USE_HEADER      
DISCOVERY_SCAN_MARK_LIMIT_BYTES      
DISCOVERY_SCAN_MIN_CSV_FIELDS      
DISCOVERY_SCAN_HIVE_MAX_COLS Maximum number of columns in a database table or fields in a structured file to be scanned. This can be overridden by using the `record.max.fields` property at the data source level.   2000
DISCOVERY_SCAN_HIVE_MAX_ROWS Maximum number of rows of a database table to be scanned.   500
DISCOVERY_SCAN_MAX_LINES Maximum number of records of a structured file to be scanned.   500
DISCOVERY_CONTENT_MAX_CHARACTER Maximum number of bytes in a column cell or field cell to be scanned.   1000
DISCOVERY_TIKA_MAX_BYTES Maximum number of bytes of an unstructured file to be scanned.   102400
DISCOVERY_MAX_TAG_SNIPPET_SAMPLE_VALUES Maximum number of samples to be captured for display in a tag.   3
DISCOVERY_QUICK_COUNT_THRESHOLD      
DISCOVERY_KAFKA_CLASSIFIEDINFO_MAX_POLL_RECORDS      
DISCOVERY_KAFKA_CLASSIFIEDINFO_SESSION_TIMEOUT_MS      
DISCOVERY_KAFKA_CLASSIFIEDINFO_REQUEST_TIMEOUT_MS      
DISCOVERY_META_SCANNING_ENABLE      
DISCOVERY_OFFLINE_SCAN_SUMMARY_SOLR_ENABLE      
DISCOVERY_METRICS_SOLR_ENABLE      
DISCOVERY_NON_NULL_REPORT_OUTPUT_PATH      
DISCOVERY_CLASSIFICATION_NON_NULL_COUNT_ENABLE      
DISCOVERY_KAFKA_TOPIC_ENCRYPTION      
DISCOVERY_KAFKA_TOPIC_DISCOVERY      
DISCOVERY_KAFKA_DISCOVERY      
DISCOVERY_KAFKA_DISCOVERY_REQUEST_TIMEOUT_MS      
DISCOVERY_KAFKA_DISCOVERY_BOOSTRAP_SERVERS      
DISCOVERY_KAFKA_DISCOVERY_USE_SSL      
DISCOVERY_KAFKA_DISCOVERY_USE_KERBEROS      
DISCOVERY_KAFKA_DISCOVERY_NAME      
DISCOVERY_KAFKA_DISCOVERY_GROUP_ID      
DISCOVERY_KAFKA_DISCOVERY_POLL_TIME_MS      
DISCOVERY_KAFKA_DISCOVERY_ENABLE      
DISCOVERY_IS_ATLAS_TAG_ENABLE      
DISCOVERY_ATLAS_HOOK_VERSION      
DISCOVERY_SCAN_RESOURCE_META_INFO_SOLR      
DISCOVERY_IS_ATLAS_ENABLE      
DISCOVERY_SPARK_STREAMING_RECEIVER_MAXRATE      
DISCOVERY_SPARK_STREAMING_CHECKPOINT      
DISCOVERY_SPARK_ENABLE_HIVE_SUPPORT      
DISCOVERY_SPARK_LOCAL_MASTER      
DISCOVERY_SPARK_APPLICATION_NAME      
DISCOVERY_PORTAL_API_SCORE_THRESHOLD      
DISCOVERY_PORTAL_API_APP_LIST      
DISCOVERY_PORTAL_API_SYSTEM_LIST      
DISCOVERY_KERBEROS_PRINCIPAL      
DISCOVERY_KAFKA_ALERT_REPLICATION      
DISCOVERY_KAFKA_GROUP_ID      
DISCOVERY_GRAPHITE_HOST      
DISCOVERY_KAFKA_CLASSFICATION_INFO_REPLICATION      
DISCOVERY_MONITORING_HDFS_INPUT_PATH      
DISCOVERY_KERBEROS_KEYTAB      
DISCOVERY_SCAN_WORKER_KAFKA_GROUP_ID      
DISCOVERY_SOLR_ALERTS_COLLECTION      
DISCOVERY_SOLR_CLASSIFICATION_COLLECTION      
DISCOVERY_GRAPHITE_PORT      
DISCOVERY_HIVE_METASTORE_USEJDBC      
DISCOVERY_INIT_CONTAINER_COMMAND_LIST You can provide a list of commands to download custom jars to a specified location inside the Discovery container. For example:
DISCOVERY_INIT_CONTAINER_COMMAND_LIST:
  - wget https://privacera/public/custom-1.jar -O /opt/privacera/discovery/libs/custom-1.jar
  - wget https://privacera/public/custom-2.jar -O /opt/privacera/discovery/libs/custom-2.jar
 
DISCOVERY_SCAN_PARQUET_ORC_FROM_ARCHIVE_ENABLE Property to enable/disable the scanning of ORC/Parquet files within a ZIP file. true, false false
DISCOVERY_SCAN_PARQUET_ORC_STREAM_FILE_SIZE_LIMIT Property to set the file size limit in megabytes (MB) on the ORC/Parquet files being scanned from the archive location. 5242880
DISCOVERY_SCAN_PARQUET_TEMP_FILE_FROM_ARCHIVE_ENABLE By default, Parquet files are stored in a temporary file within a zip file.
Set to true to scan the Parquet files from a temporary file.
Set to false to scan the Parquet files from a zip file stream.
true, false true
DISCOVERY_SCAN_ORC_TEMP_FILE_FROM_ARCHIVE_ENABLE By default, ORC files are stored in a temporary file within a zip file.
Set to true to scan the ORC files from a temporary file.
Set to false to scan the ORC files from a zip file stream.
true, false false
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_LOOPBACK_TIME_MS Loopback time, in milliseconds, for GCS lineage. - 3000
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_CUTOFF_TIME_MS Cutoff time, in milliseconds, to wait for a GCS log event for lineage. - 300000
DISCOVERY_GOOGLE_CLOUD_STORAGE_LINEAGE_CUTOFF_TIME_CHECK_INTERVAL_MS Fixed interval, in milliseconds, at which to check for delayed GCS lineage pending real-time files. - 30000
DISCOVERY_CONTENT_SCAN_THREAD_POOL_SIZE If you scan more than two data sources that belong to different projects, set this property to the number of projects that Discovery will scan. - 2
DISCOVERY_CONNECTION_TEST_INTERVAL_SEC The fixed interval, in seconds, at which all key Privacera internal components are checked. The status of each connection is sent to Portal. See Health Check. The allowable value is a non-zero integer number of seconds. A short duration is recommended, not exceeding 900 seconds (15 minutes). 60
DISCOVERY_TELEMETRY_UPDATE_TO_SOLR Set to true to send telemetry to Apache Solr.

Set to false to stop sending telemetry to Apache Solr.

The following telemetry is sent to Apache Solr:
  • Count of tags.
  • Count of resources scanned based on application and application type.
  • Scan amount based on application and application type.
  • Total compliance count and compliance count for individual policy.
true, false true
DISCOVERY_RTBF_SUMMARY_ENABLED Set this property to true to view the summary for RTP policy and Expunge policy on the UI for Auto Run jobs.
Set this property to false to hide the summary.
Although this property string contains "RTBF", the property relates to RTP.
true, false false
DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_ENABLED Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. true, false false
DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_SHUFFLE_TRACKING_ENABLED Enables shuffle file tracking for executors, which allows dynamic allocation without the need for an external shuffle service. This option will try to keep alive executors that are storing shuffle data for active jobs. true, false true
DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_EXECUTOR_IDLE_TIMEOUT If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. - 60s
DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. - 120s
DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_MAX_EXECUTORS Upper bound for the number of executors if dynamic allocation is enabled. - 4
DISCOVERY_K8S_SPARK_MEMORY_OVERHEAD_FACTOR Sets the memory overhead factor, which determines the memory allocated to non-JVM use, including off-heap memory allocations, non-JVM tasks, and various system processes. - 0.1
DISCOVERY_HBASE_RETRY_ON_FAILURE_COUNT Number of retries for the HBase connection. - 2
DISCOVERY_HBASE_WAIT_BETWEEN_RETRY_MS Wait time before retrying the HBase connection. - 100 ms (milliseconds)
Memory Variables
DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB Minimum Java Heap memory in MB used by Discovery Driver. For example, DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MIN_MEMORY Minimum Java Heap memory used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB.  For example, DISCOVERY_DRIVER_HEAP_MIN_MEMORY: "1g"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB Maximum Java Heap memory in MB used by Discovery Driver. For example, DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY Maximum Java Heap memory used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB.  For example, DISCOVERY_DRIVER_HEAP_MAX_MEMORY: "1g"
DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB Minimum amount of Kubernetes memory in MB to be requested by Discovery Driver. For example, DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB: "1024"
DISCOVERY_DRIVER_K8S_MEM_REQUESTS Minimum amount of Kubernetes memory to be used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_K8S_MEM_REQUESTS_MB.  For example, DISCOVERY_DRIVER_K8S_MEM_REQUESTS: "1G"
DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB Maximum amount of Kubernetes memory in MB to be requested by Discovery Driver. For example, DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB: "1024"
DISCOVERY_DRIVER_K8S_MEM_LIMITS Maximum amount of Kubernetes memory to be used by Discovery Driver. Setting this value will override DISCOVERY_DRIVER_K8S_MEM_LIMITS_MB.  For example, DISCOVERY_DRIVER_K8S_MEM_LIMITS: "1G"
DISCOVERY_DRIVER_CPU_MIN Minimum amount of Kubernetes CPU to be requested by Discovery Driver.  For example, DISCOVERY_DRIVER_CPU_MIN: "0.5"
DISCOVERY_DRIVER_CPU_MAX Maximum amount of Kubernetes CPU to be used by Discovery Driver.  For example, DISCOVERY_DRIVER_CPU_MAX: "0.5"
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB Minimum Java Heap memory in MB used by Discovery Executor. For example, DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY Minimum Java Heap memory used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY_MB. For example, DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY: "1g"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB Maximum Java Heap memory in MB used by Discovery Executor. For example, DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB: "1024"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY Maximum Java Heap memory used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY_MB. For example, DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY: "1g"
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB Minimum amount of Kubernetes memory in MB to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB: "1024"
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS Minimum amount of Kubernetes memory to be used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB. For example, DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS: "1G"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB Maximum amount of Kubernetes memory in MB to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB: "1024"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS Maximum amount of Kubernetes memory to be used by Discovery Executor. Setting this value will override DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB. For example, DISCOVERY_EXECUTOR_K8S_MEM_LIMITS: "1G"
DISCOVERY_EXECUTOR_CPU_MIN Minimum amount of Kubernetes CPU to be requested by Discovery Executor. For example, DISCOVERY_EXECUTOR_CPU_MIN: "0.5"
DISCOVERY_EXECUTOR_CPU_MAX Maximum amount of Kubernetes CPU to be used by Discovery Executor. For example, DISCOVERY_EXECUTOR_CPU_MAX: "0.5"
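
The memory variables above come in pairs: a form ending in _MB that takes a number of megabytes, and a unit-suffixed form that overrides it when both are set. As a hedged sketch, a custom-vars file might size the Discovery Driver and Executor as shown below. The property names and value formats follow the examples in the table; the specific sizes are illustrative assumptions, not tuning guidance.

```yaml
# Illustrative sizing only; adjust to your cluster.

# Driver: Java heap expressed in megabytes.
DISCOVERY_DRIVER_HEAP_MIN_MEMORY_MB: "1024"
DISCOVERY_DRIVER_HEAP_MAX_MEMORY_MB: "2048"

# Driver: Kubernetes memory request and limit with a unit suffix.
# Per the table, these override the corresponding *_MB variables.
DISCOVERY_DRIVER_K8S_MEM_REQUESTS: "2G"
DISCOVERY_DRIVER_K8S_MEM_LIMITS: "3G"

# Driver: Kubernetes CPU request and limit.
DISCOVERY_DRIVER_CPU_MIN: "0.5"
DISCOVERY_DRIVER_CPU_MAX: "1"

# Executor: Java heap with a unit suffix.
DISCOVERY_EXECUTOR_HEAP_MIN_MEMORY: "1g"
DISCOVERY_EXECUTOR_HEAP_MAX_MEMORY: "2g"

# Executor: Kubernetes memory request and limit in megabytes.
DISCOVERY_EXECUTOR_K8S_MEM_REQUESTS_MB: "2048"
DISCOVERY_EXECUTOR_K8S_MEM_LIMITS_MB: "3072"

# Executor: Kubernetes CPU request and limit.
DISCOVERY_EXECUTOR_CPU_MIN: "0.5"
DISCOVERY_EXECUTOR_CPU_MAX: "1"
```

Keeping the Kubernetes memory limit above the maximum Java heap leaves headroom for non-heap memory. This is a general practice for JVM workloads on Kubernetes rather than something this page prescribes.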