- PrivaceraCloud Release 4.5
Preview: Scan Generic Records with NER Model
Note
Contact Privacera Support to request enabling this feature.
For background, see Generic Models.
Based on Natural Language Processing (NLP), the Generic Named Entity Recognition (NER) model detects named entities such as person names, organizations, and locations. The model is intended only for unstructured text files or unstructured fields in structured files, because it relies on contextual information in the text surrounding the target tags.
If a structured file has fields with long sentences for which prediction is needed via NLP, you can set the UNSTRUCTURED_FIELD_IN_STRUCTURED_FILE_ENABLED parameter to true. However, setting this parameter to true might slow down classification. The time required for classification depends on the number of unstructured field records with five or more words.
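To get a rough sense of how many field values might fall into the "five or more words" category before enabling the parameter, a small script along the following lines could help. This is only a sketch: the helper name `count_long_text_values`, the sample data, and the use of whitespace splitting to count words are assumptions for illustration, not a description of how Privacera counts words internally.

```python
import csv
import io

MIN_WORDS = 5  # threshold mentioned in the text above


def count_long_text_values(rows, min_words=MIN_WORDS):
    """Count field values with at least `min_words` whitespace-separated words.

    Hypothetical helper: whitespace splitting is an assumed word definition.
    """
    return sum(
        1
        for row in rows
        for value in row.values()
        if value and len(value.split()) >= min_words
    )


# Illustrative in-memory CSV; replace with a real file handle as needed.
sample = io.StringIO(
    "id,name,notes\n"
    "1,Ana Ruiz,Customer called from Madrid about a billing issue\n"
    "2,Bo Li,No notes\n"
)
rows = list(csv.DictReader(sample))
long_values = count_long_text_values(rows)
print(long_values)
```

A higher count suggests that more records would be passed through NLP-based classification, and therefore a longer scan.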
Supported tags
Generic_NER_ML_MODEL supports the following tags:
PERSON_NAME
ORGANIZATION
LOCATION
ACCOUNT
ZipCode
Credit Card
EMAIL
US_DLICENSE
UK_US_PASSPORT
VIN
MEXICAN_CURP_NUMBER
MEXICAN_PASSPORT_NUMBER
SPAIN_SSN
SPAIN_PASSPORT
SPAIN_DRIVERS_LICENSE
SPAIN_DNI
CANADA_DRIVERS_LICENSE
CANADA_PASSPORT
CANADA_SIN
Tags
By default, the tags supported by Generic_NER_ML_MODEL are not present in the portal UI. For scans to detect and display these tags in the portal, add them explicitly under the Tags tab.
CANADA_DRIVERS_LICENSE
CANADA_PASSPORT
CANADA_SIN
MEXICAN_CURP_NUMBER
MEXICAN_PASSPORT_NUMBER
SPAIN_SSN
SPAIN_PASSPORT
SPAIN_DRIVERS_LICENSE
SPAIN_DNI
Parameter | Data Type | Default | Description |
---|---|---|---|
UNSTRUCTURED_FIELD_IN_STRUCTURED_FILE_ENABLED | Boolean | False | Setting this parameter to true enables scanning of unstructured fields or columns within structured files. |
NLP_WORD_PROXIMITY_LENGTH | Integer | 10 | This parameter sets the total number of words considered as contextual information around PII. |
NLP_LOG_LEVEL | String | INFO | This parameter sets the log level in the background process used for NLP. |
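The idea of a word-proximity window can be illustrated with a short sketch. The defaults below are copied from the table; everything else is an assumption made for illustration. In particular, the function `context_window` is hypothetical, and splitting the window evenly before and after the target word is only one possible reading of "total length of words". The actual behavior of the model may differ.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "UNSTRUCTURED_FIELD_IN_STRUCTURED_FILE_ENABLED": False,
    "NLP_WORD_PROXIMITY_LENGTH": 10,
    "NLP_LOG_LEVEL": "INFO",
}


def context_window(words, target_index,
                   total_words=DEFAULTS["NLP_WORD_PROXIMITY_LENGTH"]):
    """Return up to `total_words` context words around words[target_index].

    Assumption: the window is split evenly before and after the target,
    and the target word itself is excluded.
    """
    half = total_words // 2
    before = words[max(0, target_index - half):target_index]
    after = words[target_index + 1:target_index + 1 + half]
    return before + after


text = "Please send the signed contract to Maria Lopez at the Toronto office by Friday"
words = text.split()
# Context around "Maria" (index 6) with the default window of 10 words.
print(context_window(words, 6))
```

Under this reading, a larger value gives the model more surrounding text to work with, at the cost of processing more words per candidate.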