EXTRACT_SEMANTIC_CATEGORIES Function: International Tag Values¶
Attention
This behavior change is in the 2023_05 bundle.
For the current status of the bundle, refer to Bundle History.
The EXTRACT_SEMANTIC_CATEGORIES function behaves as follows:
- Previously:
The output of the function takes the following form:
{ "<col1_name>": { "extra_info" : { "alternates" : [<semantic_categories>], "probability" : "<number>" }, "privacy_category" : "<value>", "semantic_category" : "<value>" }, ... ... "<colN_name>": { "extra_info" : { "alternates" : [<semantic_categories>], "probability" : "<number>" }, "privacy_category" : "<value>", "semantic_category" : "<value>" } }
The possible values for the SNOWFLAKE.CORE.SEMANTIC_CATEGORY tag are defined in this section, which does not specify SEMANTIC_CATEGORY tag values for countries other than the United States of America (i.e.
US_).- Currently:
The output of the function will change in its formatting, and the output will include support for SEMANTIC_CATEGORY tag values that pertain to Australia, Canada, the United Kingdom, and the United States. To support these countries, the tag values correspond to certain parent category groups. A parent category contains information about the classification result, including whether the column consists of values largely from one country or another.
The formatting changes are:
Remove the
extra_infoandprobabilityfields.Move the
alternatesfield to a different position in the output.Add these new fields:
valid_value_ratio, which specifies the ratio of valid values in the sample size. Invalid values include NULL, an empty string, and a string with more than 256 characters.recommendation, which includes information about each tag and value.confidence, where the possible values are eitherHIGH,MEDIUM, orLOW.coverage, which indicates the percent of sampled cell values that match the rules for a particular category.details, which contains fields and values that can specify a geographical tag value for the SEMANTIC_CATEGORY tag.
For example:
{ "valid_value_ratio": 1.0, "recommendation": { "semantic_category": "PASSPORT", "privacy_category": "IDENTIFIER", "confidence": "HIGH", "coverage": 0.7, "details": [ { "semantic_category": "US_PASSPORT", "coverage": 0.7 }, { "semantic_category": "CA_PASSPORT", "coverage": 0.1 } ] }, "alternates": [ { "semantic_category": "NATIONAL_IDENTIFIER", "privacy_category": "IDENTIFIER", "confidence": "LOW", "coverage": 0.3, "details": [ { "semantic_category": "US_SSN", "privacy_category": "IDENTIFIER", "coverage": 0.3 } ] } ] }
The following table summarizes the relationship between the classification tags, new category groups and group members, and supported countries. The country codes are based on the ISO-3166-1 alpha-2 (link removed) standard. Other semantic categories, such as EMAIL and GENDER, are not affected.
PRIVACY_CATEGORY Tag Values
SEMANTIC_CATEGORY Tag Values (Parent Group)
Group Members
Country Code
IDENTIFIERBANK_ACCOUNTCA_BANK_ACCOUNT.US_BANK_ACCOUNT.IBANCA . US
ORGANIZATION_IDENTIFIERAU_BUSINESS_NUMBER.AU_COMPANY_NUMBERAU
DRIVERS_LICENSEAU_DRIVERS_LICENSE.CA_DRIVERS_LICENSE.US_DRIVERS_LICENSEAU . CA . US
MEDICARE_NUMBERAU_MEDICARE_NUMBERAU
PASSPORTAU_PASSPORT.CA_PASSPORT.US_PASSPORTAU . CA . US
PHONE_NUMBERAU_PHONE_NUMBER.CA_PHONE_NUMBER.UK_PHONE_NUMBER.US_PHONE_NUMBERAU . CA . GB . US
STREET_ADDRESSCA_STREET_ADDRESS.US_STREET_ADDRESSCA . US
TAX_IDENTIFIERAU_TAX_NUMBERAU
NATIONAL_IDENTIFIERCA_SOCIAL_INSURANCE_NUMBER.UK_NATIONAL_INSURANCE_NUMBER.US_SSNCA . GB . US
QUASI_IDENTIFIERCITYUS_CITY.CA_CITY.US . CA .
POSTAL_CODEAU_POSTAL_CODE.CA_POSTAL_CODE.UK_POSTAL_CODE.US_POSTAL_CODEAU . CA . GB . US
ADMINISTRATIVE_AREA_1CA_PROVINCE_OR_TERRITORY.US_STATE_OR_TERRITORYCA . US
ADMINISTRATIVE_AREA_2US_COUNTYUS
The data engineer can use the pending tag values by manually specifying the tag value in the ALTER TABLE or ALTER VIEW statement. Alternatively, the data engineer can call the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure to set the tag.
For example, use an ALTER TABLE statement to set the
PASSPORTtag value on the PASSPORT table column manually.ALTER TABLE mydb.myschema.mytable MODIFY COLUMN passport SET TAG SNOWFLAKE.CORE.SEMANTIC_CATEGORY = 'PASSPORT';
There are no changes to the overall classification process or the steps to classify a table, all tables in a schema, or all tables in a database.
Tip
If you pass the EXTRACT_SEMANTIC_CATEGORIES function as an argument to the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure, be sure to double-check any custom handling that you might have configured to ensure that your workflows do not break due to the pending formatting changes.
Ref: 1110