Using SQL to automatically generate object descriptions¶

The Cortex Powered Object Descriptions feature allows you to use the Snowflake Cortex COMPLETE function to automatically generate descriptions for tables, views, and columns. This feature leverages Snowflake-hosted large language models (LLMs) to evaluate object metadata and, if desired, sample data to generate a description.

This topic describes how to use a stored procedure to generate descriptions programmatically. For information about using Snowsight to generate the descriptions, see 使用 Snowflake Cortex 生成描述.

Generating a description¶

The AI_GENERATE_TABLE_DESC stored procedure automatically generates a description for a table and view. It can also generate descriptions for the columns of that table or view.

The AI_GENERATE_TABLE_DESC stored procedure accepts two arguments:

The name of the table or view that you want to generate a description for.
An optional configuration object that allows you to do the following:
- Generate descriptions for the columns of the specified table or view.
- Use sample data from the table or view to potentially improve the accuracy of the column descriptions.

Example: Generate a table description

CALL AI_GENERATE_TABLE_DESC( 'my_table');

Copy

Example: Generate table and column descriptions without using sample data

CALL AI_GENERATE_TABLE_DESC(
  'mydb.sch1.hr_data',
  {
    'describe_columns': true,
    'use_table_data': false
  });

Copy

Example: Generate view and column descriptions using sample data to improve accuracy

CALL AI_GENERATE_TABLE_DESC(
  'mydb.sch1.v1',
  {
    'describe_columns': true,
    'use_table_data': true
  });

Copy

For the complete syntax of the stored procedure, see AI_GENERATE_TABLE_DESC.

Working with the response¶

The AI_GENERATE_TABLE_DESC stored procedure returns a JSON object that contains the generated descriptions along with general information about the table and columns. Within this object, the description field contains the generated description.

Suppose you created the following table:

CREATE OR REPLACE TABLE mydb.sch1.hr_data (fname VARCHAR, age INTEGER);

INSERT INTO hr_data (fname, age)
    VALUES
        ('Thomas',    44),
        ('Katherine', 29),
        ('Lisa',      29);

Copy

Given this table, the following is an example of the JSON object returned by AI_GENERATE_TABLE_DESC:

{
  "COLUMNS": [
    {
      "database_name": "mydb",
      "description": "The first name of the employee.",
      "name": "FNAME",
      "schema_name": "sch1",
      "table_name": "hr_data"
    }
    {
      "database_name": "mydb",
      "description": "A column holding data of type DecimalType representing age values.",
      "name": "AGE",
      "schema_name": "sch1",
      "table_name": "hr_data"
    },
  ],
  "TABLE": [
    {
      "database_name": "mydb",
      "description": " The table contains records of employee data, specifically demographic information. Each record includes an employee's age and name.",
      "name": "hr_data",
      "schema_name": "sch1"
    }
  ]
}

For more information about each JSON field, see 返回.

Set generated descriptions as comments¶

To set a generated description as a comment on a table, view, or column, you must manually execute a SQL statement that includes the SET COMMENT parameter. For example, to save a generated description for a table t1, execute ALTER TABLE t1 SET COMMENT = 'ai generated description';.

You can write custom code to automatically generate and save descriptions. For examples of stored procedures that do this, see 示例.

访问控制要求¶

Users must have the following privileges and roles to call the AI_GENERATE_TABLE_DESC stored procedure:

SELECT privilege on the table or view.
SNOWFLAKE.CORTEX_USER database role.

Availability of the feature¶

Your region must support the LLM used by Snowflake Cortex (like Mistral-7b and Llama 3.1-8b) to generate the descriptions. Check the availability of the COMPLETE function. If the COMPLETE function is not supported in your region, you must enable cross-region inference to use the feature.

Using sample data¶

为列生成描述时，您只能依赖元数据，或者您可以选择使用示例数据来改进 Snowflake Cortex 支持的描述。示例数据是指在使用 Snowflake Cortex 生成描述时评估的特定列中的数据。如果您选择使用示例数据，Snowflake 会使用部分示例数据来生成描述，从而获得更准确的描述。Snowflake 不将示例数据存储为使用情况数据。

成本注意事项¶

生成描述会产生以下费用：

正在使用的仓库消耗的 credit。
因使用 Snowflake Cortex 与较小的 LLMs（例如 Mistral-7b 和 Llama 3.1-8b）而收取的 credit。这些费用作为 AI 服务出现在账单上，其中包括 Snowflake Cortex 的所有使用情况。

限制¶

You cannot generate column descriptions for objects with more than 5,000 columns.

法律声明¶

This feature relies on the COMPLETE function to generate a recommended object description. When the user initiates the description generation, Usage Data may be collected through the COMPLETE function.

The generated description is not retained by Snowflake until it is saved by the user.

有关使用 AI 的更多信息，请参阅 Snowflake AI 和 ML。

示例¶

Example: Generate descriptions and set them as comments¶

Step 1: Create a stored procedure

The following stored procedure does the following:

Automatically generates descriptions for all tables (and their columns) in a schema.
Sets these descriptions as comments on the tables and columns.

CREATE OR REPLACE PROCEDURE DESCRIBE_TABLES_SET_COMMENT (database_name STRING, schema_name STRING,
  set_table_comment BOOLEAN,
  set_column_comment BOOLEAN)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'
  PACKAGES=('snowflake-snowpark-python','joblib')
  HANDLER = 'main'
AS
$$
import json
from joblib import Parallel, delayed
import multiprocessing

def generate_descr(session, database_name, schema_name, table, set_table_comment, set_column_comment):
  table_name =  table['TABLE_NAME']
  async_job = session.sql(f"CALL AI_GENERATE_TABLE_DESC( '{database_name}.{schema_name}.{table_name}',{{'describe_columns': true, 'use_table_data': true}})").collect_nowait()
  result = async_job.result()
  output = json.loads(result[0][0])
  columns_ret = output["COLUMNS"]
  table_ret = output["TABLE"][0]

  table_description = table_ret["description"]
  table_name = table_ret["name"]
  database_name = table_ret["database_name"]
  schema_name = table_ret["schema_name"]

  if (set_table_comment):
      table_description = table_description.replace("'", "\\'")
      session.sql(f"""ALTER TABLE {database_name}.{schema_name}.{table_name} SET COMMENT = '{table_description}'""").collect()

  for column in columns_ret:
      column_description = column["description"];
      column_name = column["name"];
      if not column_name.isupper():
        column_name = '"' + column_name + '"'

      if (set_column_comment):
          column_description = column_description.replace("'", "\\'")
          session.sql(f"""ALTER TABLE  {database_name}.{schema_name}.{table_name} MODIFY COLUMN {column_name}  COMMENT '{column_description}'""").collect()

  return 'Success';

def main(session, database_name, schema_name, set_table_comment, set_column_comment):

    schema_name = schema_name.upper()
    database_name = database_name.upper()
    tablenames = session.sql(f"""SELECT table_name
                      FROM {database_name}.information_schema.tables
                      WHERE table_schema = '{schema_name}'
                      AND table_type = 'BASE TABLE'""").collect()
    try:
        Parallel(n_jobs=multiprocessing.cpu_count(), backend="threading")(
                delayed(generate_descr)(
                    session,
                    database_name,
                    schema_name,
                    table,
                    set_table_comment,
                    set_column_comment,
                ) for table in tablenames
            )
        return 'Success'
    except Exception as e:
        # Catch and return the error message
        return f"An error occurred: {str(e)}"
$$;

Copy

Step 2: Call the stored procedure

Assuming your schema is named my_db.sch1, call the stored procedure as follows to generate descriptions for both tables and columns:

CALL describe_tables_set_comment('my_db', 'sch1', true, true);

Copy

You can run a DESC TABLE command to verify that the generated descriptions were set as comments on a table.

Example: Generate descriptions and save them to a catalog table¶

Step 1: Create a stored procedure

The following stored procedure does the following:

Automatically generates descriptions for all tables (and their columns) in a schema.
Populates a catalog table, where each row represents a table or column with its generated description.

CREATE OR REPLACE PROCEDURE DESCRIBE_TABLES_SET_CATALOG (database_name string, schema_name string, catalog_table string)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'
  PACKAGES=('snowflake-snowpark-python','joblib')
  HANDLER = 'main'
AS
$$
import json
from joblib import Parallel, delayed
import multiprocessing

def generate_descr(session, database_name, schema_name, table, catalog_table):
    table_name =  table['TABLE_NAME']
    async_job = session.sql(f"CALL AI_GENERATE_TABLE_DESC( '{database_name}.{schema_name}.{table_name}',{{'describe_columns': true, 'use_table_data': true}})").collect_nowait()
    result = async_job.result()
    output = json.loads(result[0][0])
    columns_ret = output["COLUMNS"]
    table_ret = output["TABLE"][0]

    table_description = table_ret["description"]
    table_description = table_description.replace("'", "\\'")
    table_name = table_ret["name"]
    database_name = table_ret["database_name"]
    schema_name = table_ret["schema_name"]

    session.sql(f"""INSERT INTO {catalog_table} (domain, description, name, database_name, schema_name, table_name)
                          VALUES ('TABLE', '{table_description}', '{table_name}', '{database_name}', '{schema_name}', null)""").collect()

    for column in columns_ret:
        column_description = column["description"];
        column_description = column_description.replace("'", "\\'")
        column_name = column["name"];
        if not column_name.isupper():
            column_name = '"' + column_name + '"'
        session.sql(f"""INSERT INTO {catalog_table} (domain, description, name, database_name, schema_name, table_name)
                          VALUES ('COLUMN', '{column_description}', '{column_name}', '{database_name}', '{schema_name}', '{table_name}')""").collect()

    return 'Success';

def main(session, database_name, schema_name, catalog_table):

    schema_name = schema_name.upper()
    database_name = database_name.upper()
    catalog_table_upper = catalog_table.upper()
    tablenames = session.sql(f"""SELECT table_name
                      FROM {database_name}.information_schema.tables
                      WHERE table_schema = '{schema_name}'
                      AND table_type = 'BASE TABLE'
                      AND table_name !='{catalog_table_upper}'""").collect()
    try:
        Parallel(n_jobs=multiprocessing.cpu_count(), backend="threading")(
                delayed(generate_descr)(
                    session,
                    database_name,
                    schema_name,
                    table,
                    catalog_table,
                ) for table in tablenames
            )
        return 'Success'
    except Exception as e:
        # Catch and return the error message
        return f"An error occurred: {str(e)}"
$$;

Copy

Step 2: Create the catalog table to populate

Use the following code to create the catalog table where table and column descriptions are stored.

CREATE OR REPLACE TABLE catalog_table (
  domain VARCHAR,
  description VARCHAR,
  name VARCHAR,
  database_name VARCHAR,
  schema_name VARCHAR,
  table_name VARCHAR
  );

Copy

Step 3: Call the stored procedures

Assuming your schema is named my_db.sch1, call the stored procedure as follows:

CALL describe_tables_set_catalog('my_db', 'sch1', 'catalog_table');

Copy