EXTRACT_SEMANTIC_CATEGORIES Function: International Tag Values
Attention
This behavior change is in the 2023_05 bundle.
For the current status of the bundle, refer to Bundle History.
The EXTRACT_SEMANTIC_CATEGORIES function behaves as follows:
- Previously:
The output of the function takes the following form:
- Currently:
The output format of the function changes, and the output supports SEMANTIC_CATEGORY tag values that pertain to Australia, Canada, the United Kingdom, and the United States. To support these countries, the tag values correspond to certain parent category groups. A parent category contains information about the classification result, including whether the column consists of values largely from one country or another.
The formatting changes are:
- Remove the extra_info and probability fields.
- Move the alternates field to a different position in the output.
- Add these new fields:
  - valid_value_ratio, which specifies the ratio of valid values in the sample size. Invalid values include NULL, an empty string, and a string with more than 256 characters.
  - recommendation, which includes information about each tag and value.
  - confidence, where the possible values are HIGH, MEDIUM, or LOW.
  - coverage, which indicates the percentage of sampled cell values that match the rules for a particular category.
  - details, which contains fields and values that can specify a geographical tag value for the SEMANTIC_CATEGORY tag.
For example:
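As a rough illustration of how a client might read the new fields, the following Python sketch parses a made-up result for a single column. Only the field names (valid_value_ratio, recommendation, confidence, coverage, details, alternates) come from the list above; the nesting, the column name PASSPORT_NO, the semantic_category and privacy_category keys, and all values are assumptions, so compare against the real function output before relying on this layout.

```python
import json

# Made-up sample for one column. Only the field names come from the list above;
# the nesting, the column name, and every value are assumptions for illustration.
SAMPLE_OUTPUT = """
{
  "PASSPORT_NO": {
    "valid_value_ratio": 1.0,
    "recommendation": {
      "semantic_category": "PASSPORT",
      "privacy_category": "IDENTIFIER",
      "confidence": "HIGH",
      "coverage": 0.9,
      "details": [
        {"semantic_category": "US_PASSPORT", "coverage": 0.9}
      ]
    },
    "alternates": []
  }
}
"""


def summarize(output_json):
    """Return one summary dict per column in the (assumed) output layout."""
    summaries = []
    for column, info in json.loads(output_json).items():
        recommendation = info.get("recommendation", {})
        summaries.append({
            "column": column,
            "valid_value_ratio": info.get("valid_value_ratio"),
            "semantic_category": recommendation.get("semantic_category"),
            "confidence": recommendation.get("confidence"),
            "coverage": recommendation.get("coverage"),
            # Country-specific group members, if any were reported.
            "geo_members": [
                d.get("semantic_category")
                for d in recommendation.get("details", [])
            ],
        })
    return summaries


for row in summarize(SAMPLE_OUTPUT):
    print(row)
```

Using .get() with defaults keeps the reader tolerant of columns for which no recommendation or details entry is present.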
The following table summarizes the relationship between the classification tags, new category groups and group members, and supported countries. The country codes are based on the ISO-3166-1 alpha-2 standard. Other semantic categories, such as EMAIL and GENDER, are not affected.
| PRIVACY_CATEGORY Tag Values | SEMANTIC_CATEGORY Tag Values (Parent Group) | Group Members | Country Code |
|---|---|---|---|
| IDENTIFIER | BANK_ACCOUNT | CA_BANK_ACCOUNT, US_BANK_ACCOUNT, IBAN | CA, US |
| IDENTIFIER | ORGANIZATION_IDENTIFIER | AU_BUSINESS_NUMBER, AU_COMPANY_NUMBER | AU |
| IDENTIFIER | DRIVERS_LICENSE | AU_DRIVERS_LICENSE, CA_DRIVERS_LICENSE, US_DRIVERS_LICENSE | AU, CA, US |
| IDENTIFIER | MEDICARE_NUMBER | AU_MEDICARE_NUMBER | AU |
| IDENTIFIER | PASSPORT | AU_PASSPORT, CA_PASSPORT, US_PASSPORT | AU, CA, US |
| IDENTIFIER | PHONE_NUMBER | AU_PHONE_NUMBER, CA_PHONE_NUMBER, UK_PHONE_NUMBER, US_PHONE_NUMBER | AU, CA, GB, US |
| IDENTIFIER | STREET_ADDRESS | CA_STREET_ADDRESS, US_STREET_ADDRESS | CA, US |
| IDENTIFIER | TAX_IDENTIFIER | AU_TAX_NUMBER | AU |
| IDENTIFIER | NATIONAL_IDENTIFIER | CA_SOCIAL_INSURANCE_NUMBER, UK_NATIONAL_INSURANCE_NUMBER, US_SSN | CA, GB, US |
| QUASI_IDENTIFIER | CITY | US_CITY, CA_CITY | US, CA |
| QUASI_IDENTIFIER | POSTAL_CODE | AU_POSTAL_CODE, CA_POSTAL_CODE, UK_POSTAL_CODE, US_POSTAL_CODE | AU, CA, GB, US |
| QUASI_IDENTIFIER | ADMINISTRATIVE_AREA_1 | CA_PROVINCE_OR_TERRITORY, US_STATE_OR_TERRITORY | CA, US |
| QUASI_IDENTIFIER | ADMINISTRATIVE_AREA_2 | US_COUNTY | US |

The data engineer can use the pending tag values by manually specifying the tag value in the ALTER TABLE or ALTER VIEW statement. Alternatively, the data engineer can call the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure to set the tag.
For example, use an ALTER TABLE statement to set the PASSPORT tag value on the PASSPORT table column manually.

There are no changes to the overall classification process or the steps to classify a table, all tables in a schema, or all tables in a database.
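The manual tagging step can be sketched in Python as a small helper that builds the ALTER TABLE statement and rejects values that are not parent groups in the table above. The helper name build_set_tag_statement and the table and column names hr_data and passport are hypothetical; the statement uses the SNOWFLAKE.CORE.SEMANTIC_CATEGORY system tag, and the helper only returns a string, so it does not execute anything against an account.

```python
import re

# Parent groups from the table above, keyed to their PRIVACY_CATEGORY value.
PARENT_GROUPS = {
    "BANK_ACCOUNT": "IDENTIFIER",
    "ORGANIZATION_IDENTIFIER": "IDENTIFIER",
    "DRIVERS_LICENSE": "IDENTIFIER",
    "MEDICARE_NUMBER": "IDENTIFIER",
    "PASSPORT": "IDENTIFIER",
    "PHONE_NUMBER": "IDENTIFIER",
    "STREET_ADDRESS": "IDENTIFIER",
    "TAX_IDENTIFIER": "IDENTIFIER",
    "NATIONAL_IDENTIFIER": "IDENTIFIER",
    "CITY": "QUASI_IDENTIFIER",
    "POSTAL_CODE": "QUASI_IDENTIFIER",
    "ADMINISTRATIVE_AREA_1": "QUASI_IDENTIFIER",
    "ADMINISTRATIVE_AREA_2": "QUASI_IDENTIFIER",
}

# Conservative check for unquoted identifiers, optionally dot-qualified.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)*$")


def build_set_tag_statement(table, column, semantic_category):
    """Build an ALTER TABLE statement that sets the SEMANTIC_CATEGORY tag."""
    if semantic_category not in PARENT_GROUPS:
        raise ValueError(f"unknown parent group: {semantic_category}")
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name}")
    return (
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET TAG SNOWFLAKE.CORE.SEMANTIC_CATEGORY = '{semantic_category}';"
    )


# Hypothetical table and column names.
print(build_set_tag_statement("hr_data", "passport", "PASSPORT"))
```

Validating the tag value against a fixed list catches typos such as PASSPORTS before the statement reaches the database.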
Tip
If you pass the EXTRACT_SEMANTIC_CATEGORIES function as an argument to the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure, be sure to double-check any custom handling that you might have configured to ensure that your workflows do not break due to the pending formatting changes.
Ref: 1110