Using SQL to automatically generate object descriptions

The Cortex Powered Object Descriptions feature allows you to use the Snowflake Cortex COMPLETE function to automatically generate descriptions for tables, views, and columns. This feature leverages Snowflake-hosted large language models (LLMs) to evaluate object metadata and, if desired, sample data to generate a description.

This topic describes how to use a stored procedure to generate descriptions programmatically. For information about using Snowsight to generate the descriptions, see Generate descriptions with Snowflake Cortex.

Generating a description

The AI_GENERATE_TABLE_DESC stored procedure automatically generates a description for a table and view. It can also generate descriptions for the columns of that table or view.

The AI_GENERATE_TABLE_DESC stored procedure accepts two arguments:

  • The name of the table or view that you want to generate a description for.

  • An optional configuration object that allows you to do the following:

    • Generate descriptions for the columns of the specified table or view.

    • Use sample data from the table or view to potentially improve the accuracy of the column descriptions.

Example: Generate a table description
CALL AI_GENERATE_TABLE_DESC( 'my_table');
Copy
Example: Generate table and column descriptions without using sample data
CALL AI_GENERATE_TABLE_DESC(
  'mydb.sch1.hr_data',
  {
    'describe_columns': true,
    'use_table_data': false
  });
Copy
Example: Generate view and column descriptions using sample data to improve accuracy
CALL AI_GENERATE_TABLE_DESC(
  'mydb.sch1.v1',
  {
    'describe_columns': true,
    'use_table_data': true
  });
Copy

For the complete syntax of the stored procedure, see AI_GENERATE_TABLE_DESC.

Working with the response

The AI_GENERATE_TABLE_DESC stored procedure returns a JSON object that contains the generated descriptions along with general information about the table and columns. Within this object, the description field contains the generated description.

Suppose you created the following table:

CREATE OR REPLACE TABLE mydb.sch1.hr_data (fname VARCHAR, age INTEGER);

INSERT INTO hr_data (fname, age)
    VALUES
        ('Thomas',    44),
        ('Katherine', 29),
        ('Lisa',      29);
Copy

Given this table, the following is an example of the JSON object returned by AI_GENERATE_TABLE_DESC:

{
  "COLUMNS": [
    {
      "database_name": "mydb",
      "description": "The first name of the employee.",
      "name": "FNAME",
      "schema_name": "sch1",
      "table_name": "hr_data"
    }
    {
      "database_name": "mydb",
      "description": "A column holding data of type DecimalType representing age values.",
      "name": "AGE",
      "schema_name": "sch1",
      "table_name": "hr_data"
    },
  ],
  "TABLE": [
    {
      "database_name": "mydb",
      "description": " The table contains records of employee data, specifically demographic information. Each record includes an employee's age and name.",
      "name": "hr_data",
      "schema_name": "sch1"
    }
  ]
}

For more information about each JSON field, see Returns.

Set generated descriptions as comments

To set a generated description as a comment on a table, view, or column, you must manually execute a SQL statement that includes the SET COMMENT parameter. For example, to save a generated description for a table t1, execute ALTER TABLE t1 SET COMMENT = 'ai generated description';.

You can write custom code to automatically generate and save descriptions. For examples of stored procedures that do this, see Examples.

Access control requirements

Users must have the following privileges and roles to call the AI_GENERATE_TABLE_DESC stored procedure:

  • SELECT privilege on the table or view.

  • SNOWFLAKE.CORTEX_USER database role.

Availability of the feature

Your region must support the LLM used by Snowflake Cortex (like Mistral-7b and Llama 3.1-8b) to generate the descriptions. Check the availability of the COMPLETE function. If the COMPLETE function is not supported in your region, you must enable cross-region inference to use the feature.

Using sample data

When generating a description for a column, you can rely only on metadata, or you can choose to use sample data to improve the Snowflake Cortex Powered Description. Sample data refers to data within a particular column that is evaluated when you use Snowflake Cortex to generate descriptions. If you choose to use sample data, Snowflake uses a portion of the sample data to generate the description, which leads to more accurate descriptions. Sample data is not stored by Snowflake as Usage Data.

Cost considerations

Generating descriptions incurs the following costs:

  • Credits consumed by the warehouse in use.

  • Credits charged for the use of Snowflake Cortex with smaller LLMs like Mistral-7b and Llama 3.1-8b. These charges appear on a bill as AI-Services, which includes all uses of Snowflake Cortex.

Limitations

You cannot generate column descriptions for objects with more than 1,000 columns.

Examples

Example: Generate descriptions and set them as comments

Step 1: Create a stored procedure

The following stored procedure does the following:

  • Automatically generates descriptions for all tables (and their columns) in a schema.

  • Sets these descriptions as comments on the tables and columns.

CREATE OR REPLACE PROCEDURE DESCRIBE_TABLES_SET_COMMENT (schema_name STRING,
  set_table_comment BOOLEAN,
  set_column_comment BOOLEAN)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'
  PACKAGES=('snowflake-snowpark-python','joblib')
  HANDLER = 'main'
AS
$$
import json
from joblib import Parallel, delayed
import multiprocessing

def generate_descr(session, schema_name, table, set_table_comment, set_column_comment):
  table_name =  table['TABLE_NAME']
  async_job = session.sql(f"CALL AI_GENERATE_TABLE_DESC( '{schema_name}.{table_name}',{{'describe_columns': true, 'use_table_data': true}})").collect_nowait()
  result = async_job.result()
  output = json.loads(result[0][0])
  columns_ret = output["COLUMNS"]
  table_ret = output["TABLE"][0]

  table_description = table_ret["description"]
  table_name = table_ret["name"]
  database_name = table_ret["database_name"]
  schema_name = table_ret["schema_name"]

  if (set_table_comment):
      table_description = table_description.replace("'", "\\'")
      session.sql(f"""ALTER TABLE {database_name}.{schema_name}.{table_name} SET COMMENT = '{table_description}'""").collect()

  for column in columns_ret:
      column_description = column["description"];
      column_name = column["name"];
      if (set_column_comment):
          column_description = column_description.replace("'", "\\'")
          session.sql(f"""ALTER TABLE  {database_name}.{schema_name}.{table_name} MODIFY COLUMN {column_name}  COMMENT '{column_description}'""").collect()

  return 'Success';

def main(session, schema_name, set_table_comment, set_column_comment):

    schema_name = schema_name.upper()
    tablenames = session.sql(f"""SELECT table_name
                      FROM information_schema.tables
                      WHERE table_schema = '{schema_name}'
                      AND table_type = 'BASE TABLE'""").collect()
    try:
        Parallel(n_jobs=multiprocessing.cpu_count(), backend="threading")(
                delayed(generate_descr)(
                    session,
                    schema_name,
                    table,
                    set_table_comment,
                    set_column_comment,
                ) for table in tablenames
            )
        return 'Success'
    except Exception as e:
        # Catch and return the error message
        return f"An error occurred: {str(e)}"
$$;
Copy

Step 2: Call the stored procedure

Assuming your schema is named sch1, call the stored procedure as follows to generate descriptions for both tables and columns:

CALL describe_tables_set_comment('sch1', true, true);
Copy

You can run a DESC TABLE command to verify that the generated descriptions were set as comments on a table.

Example: Generate descriptions and save them to a catalog table

Step 1: Create a stored procedure

The following stored procedure does the following:

  • Automatically generates descriptions for all tables (and their columns) in a schema.

  • Populates a catalog table, where each row represents a table or column with its generated description.

CREATE OR REPLACE PROCEDURE DESCRIBE_TABLES_SET_CATALOG (
  schema_name STRING,
  catalog_table STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'
  PACKAGES=('snowflake-snowpark-python','joblib')
  HANDLER = 'main'
AS
$$
import json
from joblib import Parallel, delayed
import multiprocessing

def generate_descr(session, table, catalog_table):
    table_name =  table['TABLE_NAME']
    async_job = session.sql(f"CALL AI_GENERATE_TABLE_DESC( '{table_name}',{{'describe_columns': true, 'use_table_data': true}})").collect_nowait()
    result = async_job.result()
    output = json.loads(result[0][0])
    columns_ret = output["COLUMNS"]
    table_ret = output["TABLE"][0]

    table_description = table_ret["description"]
    table_name = table_ret["name"]
    database_name = table_ret["database_name"]
    schema_name = table_ret["schema_name"]

    session.sql(f"""INSERT INTO {catalog_table} (domain, description, name, database_name, schema_name, table_name)
                          VALUES ('TABLE', '{table_description}', '{table_name}', '{database_name}', '{schema_name}', null)""").collect()

    for column in columns_ret:
        column_description = column["description"];
        column_name = column["name"];
        session.sql(f"""INSERT INTO {catalog_table} (domain, description, name, database_name, schema_name, table_name)
                          VALUES ('COLUMN', '{column_description}', '{column_name}', '{database_name}', '{schema_name}', '{table_name}')""").collect()

    return 'Success';

def main(session, schema_name, catalog_table):

    schema_name = schema_name.upper()
    catalog_table_upper = catalog_table.upper()
    tablenames = session.sql(f"""SELECT table_name
                      FROM information_schema.tables
                      WHERE table_schema = '{schema_name}'
                      AND table_type = 'BASE TABLE'
                      AND table_name !='{catalog_table_upper}'""").collect()
    try:
        Parallel(n_jobs=multiprocessing.cpu_count(), backend="threading")(
                delayed(generate_descr)(
                    session,
                    table,
                    catalog_table,
                ) for table in tablenames
            )
        return 'Success'
    except Exception as e:
        # Catch and return the error message
        return f"An error occurred: {str(e)}"
$$;
Copy

Step 2: Create the catalog table to populate

Use the following code to create the catalog table where table and column descriptions are stored.

CREATE OR REPLACE TABLE catalog_table (
  domain VARCHAR,
  description VARCHAR,
  name VARCHAR,
  database_name VARCHAR,
  schema_name VARCHAR,
  table_name VARCHAR
  );
Copy

Step 3: Call the stored procedures

Assuming your schema is named sch1, call the stored procedure as follows:

CALL describe_tables_set_catalog('sch1', 'catalog_table');
Copy
Language: English