Models

Models detect specific data elements in your data resources. The detection is done with various algorithms and heuristics.

Types of Models#

Privacera supports several types of models. You can filter the list of models with the search option. The tab also displays the current record count.

Generic Models#

Use these general model parameters to tailor how data is matched.

Parameter Data Type Default Description
INCLUDE_PATTERN_<#> String None Name of the patterns to be matched.

Can contain more than one pattern by incrementing the <#> variable. For example, INCLUDE_PATTERN_1, INCLUDE_PATTERN_2, INCLUDE_PATTERN_3.
EXCLUDE_PATTERN_<#> String None Name of the patterns to exclude from matching.

Can contain more than one pattern by incrementing the <#> variable. For example, EXCLUDE_PATTERN_1, EXCLUDE_PATTERN_2, EXCLUDE_PATTERN_3.
ONLY_DIGITS Boolean FALSE Indicates whether matching should use only the digits. Setting this parameter to TRUE removes all non-numeric characters from the string before matching. For example, 1234-5 is treated as 12345.
CHECK_DIGIT_CODE_VALIDATE String None Specifies the algorithm used to validate the checksum digit (the last digit). Valid values:
  • LUHN
  • ABA
  • CUSIP
  • DIHEDRAL
  • IBAN
  • UK_NHS
  • MOD11
  • ISBN10
DO_LOOKUP_FLAG Boolean FALSE Indicates whether to use patterns specified by the LOOKUP_PATTERN parameter. If this parameter is set to TRUE, the patterns specified in LOOKUP_PATTERN are used.
LOOKUP_DICT_KEY String None A dictionary name or key. See Dictionaries.
LOOKUP_PATTERN String None A pattern name or key from the defined patterns to use for matching. See Patterns.

Note: See Embed Patterns in Dictionaries below.
ISO3166_CC_VALIDATE_FLAG Boolean FALSE Indicates whether to use Privacera-defined matching to validate an ISO two-character country code. If this parameter is set to TRUE, ISO3166_CC_PATTERN is used.
ISO3166_CC_PATTERN None A valid pattern name or key used for matching country codes. See Patterns.

Note: See Embed Patterns in Dictionaries below.
ISO3166_CC_LOOKUP_KEY None Name of a defined dictionary. See Dictionaries.
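
As a rough illustration of what ONLY_DIGITS and a check-digit validation such as ISBN10 involve, the following Python sketch strips non-numeric characters and verifies an ISBN-10 check digit. The function names are hypothetical and this is not Privacera's implementation. The check shown is the standard ISBN-10 weighted sum.

```python
import re


def only_digits(value: str) -> str:
    """Mimic ONLY_DIGITS=TRUE: drop every non-numeric character."""
    return re.sub(r"[^0-9]", "", value)


def isbn10_check(value: str) -> bool:
    """Standard ISBN-10 rule: the weighted sum (weights 10 down to 1)
    must be divisible by 11. 'X' stands for 10 and may only be last."""
    chars = [c for c in value.upper() if c in "0123456789X"]
    if len(chars) != 10 or "X" in chars[:9]:
        return False
    total = 0
    for position, char in enumerate(chars):
        digit = 10 if char == "X" else int(char)
        total += (10 - position) * digit
    return total % 11 == 0


print(only_digits("1234-5"))          # the example from the table above
print(isbn10_check("0-306-40615-2"))  # well-formed ISBN-10
print(isbn10_check("0-306-40615-3"))  # last digit altered
```

The other valid values (LUHN, ABA, CUSIP, and so on) follow the same idea with different weights and moduli.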

Embed Patterns in Dictionaries#

Note

A future release will remove Discovery patterns from the left nav, because they are not used frequently. Instead, customers should now embed patterns in dictionaries. If you have any patterns in use, you should add them to a dictionary now. See Patterns and Dictionaries.

Credit Card#

The Credit Card model detects credit card numbers. It validates numbers based on the issuing network, length, and Luhn checksum.

Parameter Type Default Meaning
CC_PATTERN String Privacera-supplied pattern for credit card numbers (a range of digits, space- or hyphen-separated). Custom credit card pattern, if you want to override the supplied pattern.
DEFAULT_TYPES Boolean True Validate against known issuing network prefixes.
LUHN_CHECK Boolean True Validate the Luhn checksum on the credit card number.

Supported Credit Card Types#

Each credit card type is listed with its conditions and examples.

American Express (AMEX) Card
  • Conditions: Credit card starting with 34 or 37 and having 15 digits.
  • Examples: 34xxxxxxxxxxxxx, 37xxxxxxxxxxxxx

Master Card
  • Conditions:
    • Credit card starting with 51 to 55 and having 14 digits.
    • Credit card starting with 2221 and having 12 digits.
    • Credit card starting with 27 and having 13 digits.
  • Examples: 51xxxxxxxxxxxx, 2221xxxxxxxx, 27xxxxxxxxxxx

Visa Card
  • Conditions: Credit card starting with 4 and having 13 or 16 digits.
  • Examples: 4xxxxxxxxxxxx, 4xxxxxxxxxxxxxxx

Diners Club Card
  • Conditions: Credit card starting with 300 to 305, 3095, 36, 38, or 39 and having 14 digits.
  • Examples: 300xxxxxxxxxxx, 3095xxxxxxxxxx

VPay (Visa) Card
  • Conditions: Credit card starting with 4 and having 13 or 19 digits.
  • Examples: 4xxxxxxxxxxxx, 4xxxxxxxxxxxxxxxxxx
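
The combination of prefix, length, and Luhn checks can be sketched in Python as follows. This is a minimal illustration, not Privacera's code: it covers only the AMEX and Visa conditions listed above, and the function names are hypothetical.

```python
import re
from typing import Optional


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from results above 9, and require a sum divisible by 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def card_network(number: str) -> Optional[str]:
    """Return a network name if prefix, length, and Luhn all pass."""
    digits = re.sub(r"[ -]", "", number)  # allow space or hyphen separators
    if not re.fullmatch(r"[0-9]+", digits):
        return None
    if digits[:2] in ("34", "37") and len(digits) == 15:
        network = "AMEX"
    elif digits[0] == "4" and len(digits) in (13, 16):
        network = "VISA"
    else:
        return None
    return network if luhn_valid(digits) else None


# Widely published test numbers, not real accounts.
print(card_network("4111 1111 1111 1111"))
print(card_network("3782-822463-10005"))
print(card_network("4111111111111112"))  # fails the Luhn check
```

In this sketch, setting LUHN_CHECK or DEFAULT_TYPES to False would correspond to skipping the matching branch.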

Date of Birth#

The Date of Birth model detects dates of birth written in various date formats.

Parameter Type Default Meaning
MIN_AGE_YEARS Integer 5 Lower age threshold, in years.
MAX_AGE_YEARS Integer 100 Upper age threshold, in years.
USE_ALGO Boolean True Tagging is based on an algorithm that detects random distribution.
DATE_REGEX_var1 String Pattern that matches a custom date format var1.
DATE_FORMAT_var1 String Date format that matches the pattern for var1.

Pre-configured date formats are:

• International YMD format with 4-digit year

• US MDY with 4-digit or 2-digit year

• Month-abbreviated MDY

Additional formats can be configured. For example, configure a regex and a Java date format:

Parameter Value
DATE_REGEX_1 \d{4} \d{2} \d{2}
DATE_FORMAT_1 yyyy MM dd
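
To make the interplay of the regex, the date format, and the age thresholds concrete, here is a small Python sketch. It is an assumption-laden illustration: Python's strptime directive `%Y %m %d` stands in for the Java format `yyyy MM dd`, the thresholds are treated as inclusive, and the function names are hypothetical.

```python
import re
from datetime import date, datetime

DATE_REGEX_1 = r"\d{4} \d{2} \d{2}"
DATE_FORMAT_1 = "%Y %m %d"  # Python stand-in for the Java format "yyyy MM dd"
MIN_AGE_YEARS = 5
MAX_AGE_YEARS = 100


def age_in_years(born: date, today: date) -> int:
    """Whole years elapsed between the birth date and today."""
    years = today.year - born.year
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def looks_like_dob(text: str, today: date) -> bool:
    """Match the regex, parse with the format, then apply the age window."""
    if not re.fullmatch(DATE_REGEX_1, text):
        return False
    try:
        born = datetime.strptime(text, DATE_FORMAT_1).date()
    except ValueError:  # regex matched, but it is not a real calendar date
        return False
    return MIN_AGE_YEARS <= age_in_years(born, today) <= MAX_AGE_YEARS


today = date(2021, 8, 24)
print(looks_like_dob("1990 05 17", today))
print(looks_like_dob("2020 01 01", today))  # younger than MIN_AGE_YEARS
print(looks_like_dob("1990 13 40", today))  # matches the regex, invalid date
```

The regex only narrows candidates. Parsing with the date format is what rejects strings such as month 13.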

EIN#

The EIN model detects Employer Identification Numbers using patterns and digit validation.

Parameter Type Default Meaning
EIN_PATTERN String Default EIN digit pattern, if you want to override the default pattern.
VALIDATIONS Boolean True Enable digit validation of the EIN.
STRICT_PATTERN Boolean True Allow match only if EIN has exact format.

Geo Latitude and Longitude#

The Geo model detects latitude and longitude coordinates. It can validate these values based on a geographical area.

Parameter Type Default Meaning
MIN_LAT Double US min latitude Lower limit (southern) on latitude.
MAX_LAT Double US max latitude Upper limit (northern) on latitude.
MIN_LONG Double US min longitude Lower limit (west) on longitude.
MAX_LONG Double US max longitude Upper limit (east) on longitude.
MIN_FRACTIONAL_DIGITS Integer 3 Minimum number of digits after the decimal point.
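
A bounding-box check combined with a minimum number of fractional digits might look like the Python sketch below. The numeric bounds are illustrative values roughly covering the contiguous US, not the actual Privacera defaults, and the function names are hypothetical.

```python
import re

# Illustrative bounds only; the real default values are not documented here.
MIN_LAT, MAX_LAT = 24.0, 50.0
MIN_LONG, MAX_LONG = -125.0, -66.0
MIN_FRACTIONAL_DIGITS = 3

COORDINATE = re.compile(r"-?[0-9]{1,3}\.([0-9]+)")


def fractional_digits_ok(text: str) -> bool:
    """True if the value is a decimal number with enough fractional digits."""
    match = COORDINATE.fullmatch(text.strip())
    return match is not None and len(match.group(1)) >= MIN_FRACTIONAL_DIGITS


def looks_like_lat_long(lat_text: str, long_text: str) -> bool:
    """Check precision first, then the geographic bounding box."""
    if not (fractional_digits_ok(lat_text) and fractional_digits_ok(long_text)):
        return False
    lat, lon = float(lat_text), float(long_text)
    return MIN_LAT <= lat <= MAX_LAT and MIN_LONG <= lon <= MAX_LONG


print(looks_like_lat_long("37.7749", "-122.4194"))  # inside the box
print(looks_like_lat_long("37.77", "-122.41"))      # too few fractional digits
print(looks_like_lat_long("51.5074", "-0.1278"))    # outside the box
```

Requiring several fractional digits helps keep ordinary decimals such as prices from being tagged as coordinates.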

IMEI#

The IMEI model detects International Mobile Equipment Identity numbers that are used to identify mobile phones. It validates the Luhn checksum and the length of the IMEI.

ITIN#

The ITIN model detects Individual Taxpayer Identification Numbers (identifiers of individual taxpayers). It validates the format and digits of the ITIN.

Parameter Type Default Meaning
ITIN_PATTERN String Default ITIN digit pattern, if you want to override the default pattern.
STRICT_PATTERN Boolean True Allow match only if ITIN has exact format.

MIME#

The MIME model detects a file based on its Multipurpose Internet Mail Extensions type. The MIME type is detected using a combination of file extension and magic bytes in the header of the file. The detected MIME type is then looked up in a dictionary of MIME types.

Parameter Type Default Meaning
LOOKUP_DICT String Identifier of dictionary of MIME types.

There are two pre-configured MIME model instances.

• For detecting executable files: LOOKUP_DICT=EXEC_MIME_KEYWORD.

• For detecting image files: LOOKUP_DICT=IMAGE_MIME_KEYWORD.
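
The two-step detection described above (magic bytes plus file extension, then a dictionary lookup) can be sketched in Python. The magic-byte table, the dictionary contents, and the choice to prefer magic bytes over the extension are all assumptions made for illustration. Only the dictionary names come from this page.

```python
import mimetypes
from typing import Optional, Set

# A few well-known file signatures; a real detector would know many more.
MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"\x7fELF": "application/x-executable",
}

# Hypothetical dictionary contents keyed by the names used on this page.
IMAGE_MIME_KEYWORD = {"image/png", "image/jpeg", "image/gif"}
EXEC_MIME_KEYWORD = {"application/x-executable"}


def detect_mime(filename: str, header: bytes) -> Optional[str]:
    """Prefer magic bytes; fall back to the file extension."""
    for magic, mime in MAGIC_BYTES.items():
        if header.startswith(magic):
            return mime
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def matches_dictionary(filename: str, header: bytes, dictionary: Set[str]) -> bool:
    """Look the detected MIME type up in the configured dictionary."""
    return detect_mime(filename, header) in dictionary


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(detect_mime("data.bin", png_header))  # detected from magic bytes
print(detect_mime("photo.png", b""))        # detected from the extension
print(matches_dictionary("tool", b"\x7fELF\x02\x01", EXEC_MIME_KEYWORD))
```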

Phone Number#

The Phone Number model detects phone numbers. It validates the format of the phone numbers based on the country for which it is configured.

Parameter Type Default Meaning
COUNTRY_CODE String US Two-character country code.

SSN#

The SSN model detects US Social Security Numbers. It validates the format and checks against a blacklist of SSNs.

Parameter Type Default Meaning
SSN_PATTERN String Default Override the default SSN pattern.
VALIDATIONS Boolean True Validate against known blacklist of SSNs.
STRICT_PATTERN Boolean False Allow match only if SSN has exact format.
USE_9_DIGIT_PATTERN Boolean False Match against any nine-digit number without format.
USE_4_DIGIT_PATTERN Boolean False Match against any four-digit number without format. Disables validation with the blacklist of SSNs.
STRICT_EXT_PATTERN Boolean True Allow match only if SSN has exact format that is hyphen-, dot-, or space-separated.

Examples of Invalid SSNs#

The SSN model would determine that the following SSNs are invalid.

• SSN starting with 9 or 666 or 000 or 98765432.
• SSN with 00 as the 4th and 5th digits.
• SSN with 0000 as the 6th through 9th digits.
• Any SSN like these:
  • 123456789
  • 111111111
  • 222222222
  • 333333333
  • 444444444
  • 555555555
  • 666666666
  • 777777777
  • 888888888
  • 999999999
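
The rules listed above translate fairly directly into code. The Python sketch below is a hypothetical rendering of those rules, not the model's actual implementation. Treating any input that is not nine digits as invalid is an added assumption.

```python
import re

# 123456789 plus the nine repeated-digit sequences listed above.
KNOWN_INVALID = {"123456789"} | {str(d) * 9 for d in range(1, 10)}


def ssn_is_invalid(ssn: str) -> bool:
    """Apply the invalid-SSN rules to a hyphen-, dot-, or space-separated value."""
    digits = re.sub(r"[-. ]", "", ssn)
    if not re.fullmatch(r"[0-9]{9}", digits):
        return True  # assumption: anything other than nine digits is invalid
    if digits.startswith(("9", "666", "000", "98765432")):
        return True
    if digits[3:5] == "00":    # 4th and 5th digits
        return True
    if digits[5:9] == "0000":  # 6th through 9th digits
        return True
    return digits in KNOWN_INVALID


for candidate in ("666-12-3456", "123-00-4567", "123-45-0000",
                  "111-11-1111", "234-56-7891"):
    print(candidate, ssn_is_invalid(candidate))
```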

VIN#

The VIN model detects Vehicle Identification Numbers. It validates the length and the VIN checksum.
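
This page does not name the checksum algorithm. Assuming it is the standard North American check digit in position 9 of a 17-character VIN, a validation sketch in Python could look like this. The function names are hypothetical.

```python
# Letter-to-number transliteration; I, O, and Q are not allowed in a VIN.
TRANSLITERATION = {str(d): d for d in range(10)}
TRANSLITERATION.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})

# Position weights; the check digit itself (position 9) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Weighted sum modulo 11; a remainder of 10 is written as 'X'."""
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def vin_is_valid(vin: str) -> bool:
    """Validate length, allowed characters, and the position-9 check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in TRANSLITERATION for c in vin):
        return False
    return vin[8] == vin_check_digit(vin)


print(vin_is_valid("1M8GDM9AXKP042788"))  # commonly cited sample VIN
print(vin_is_valid("1M8GDM9A1KP042788"))  # check digit altered
```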

Zip#

The Zip model detects US Zip codes. It detects both 5-digit and 5+4-digit variations and validates against a dictionary of US Zip codes.

Parameter Type Default Meaning
ZIP_DICT_KEY String US_ZIP_LOOKUP Key of the US Zip dictionary.
ZIP_PATTERN String Default Regular expression used to validate content as a Zip code.
STRICT_PATTERN Boolean False Allow match only if Zip code has exact format. If set to True, only a nine-digit value that contains '-' and starts with five digits is considered a Zip code.
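
The effect of STRICT_PATTERN together with the dictionary lookup can be sketched in Python. The regular expressions and the three sample entries in the lookup set are assumptions for illustration. The real US_ZIP_LOOKUP dictionary and default ZIP_PATTERN are supplied by Privacera.

```python
import re

# Tiny hypothetical sample standing in for the US_ZIP_LOOKUP dictionary.
US_ZIP_LOOKUP = {"10001", "60601", "94105"}

ZIP_LOOSE = re.compile(r"([0-9]{5})(?:-([0-9]{4}))?")  # 5 or 5+4 digits
ZIP_STRICT = re.compile(r"([0-9]{5})-([0-9]{4})")      # 5+4 digits only


def looks_like_zip(text: str, strict_pattern: bool = False) -> bool:
    """Match the pattern, then confirm the 5-digit part is a known Zip code."""
    pattern = ZIP_STRICT if strict_pattern else ZIP_LOOSE
    match = pattern.fullmatch(text.strip())
    return match is not None and match.group(1) in US_ZIP_LOOKUP


print(looks_like_zip("94105"))
print(looks_like_zip("94105-1234"))
print(looks_like_zip("94105", strict_pattern=True))  # rejected: no +4 part
print(looks_like_zip("00000"))                       # not in the dictionary
```

The dictionary lookup is what separates real Zip codes from arbitrary five-digit numbers.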

Create Model#

To create a model:

1. On the Privacera home page, expand the Discovery menu on the left and click Models.

2. Click + Add Model.

  The Create Model dialog is displayed.

3. Enter the following details:

  • Enter the model Name.
  • Enter the model Description.
  • Enter the model Key.
  • Select the Type, such as DOB_MODEL or CC_MODEL. See Types of Models.
  • Select Apply For: File content or Meta type.
    • File content is resource content.
    • Meta type is database or column name and for HDFS.
  • Select the Model Status. By default, the model is enabled.
  • Click + to add model properties.
  • Enter the Key and Value. For example: Key: MIN_FRACTIONAL_DIGITS, Value: 2. You can add multiple model properties.

4. Click Save.

The model is created.

Edit or Delete Model#

You can edit or delete a model with the icons under the Actions column.

To edit a model:

1. In the Properties section, set the required property name and value.

2. Click Save.

The model is edited.

Import Model#

To import a model file in JSON format:

1. On the Models page, click Import.

  The Import dialog is displayed.

2. Browse to and select the JSON file, then click Import.

The model file is imported.

Export Model#

To export a model file in JSON format:

1. On the Models page, click Export.

2. Select the desired export option from the drop-down. There are two ways to export:

  • All Records: Export the entire set of models.

  • Select Records: Select the specific models to export. You can select multiple models.

3. Click Export.

The model file is exported.

List of Privacera-supplied Models#

The following is a list of the Privacera-supplied models. In general, the name of a model describes its purpose. For precise details, look at the model itself in the Platform UI.

• DOB_ML_MODEL
• CC_ML_MODEL
• ZIP_ML_MODEL
• IMEI_ML_MODEL
• SSN_ML_MODEL
• EXEC_ML_MODEL
• MIME_ML_MODEL
• PHONE_NUMBER_ML_MODEL
• GEO_LAT_LONG_ML_MODEL
• CC_ML_MODEL_PROTECTED
• EIN_ML_MODEL
• ITIN_ML_MODEL
• VIN_ML_MODEL
• SSN_9_DIGIT_ML_MODEL
• SSN_4_DIGIT_ML_MODEL
• IMAGE_FILE_ML_MODEL
• IMAGE_ML_MODEL

Last update: August 24, 2021