Fine-tuning (Snowflake Cortex)
The Snowflake Cortex Fine-tuning function offers a way to customize large language models for your specific task. This topic describes how the feature works and how to get started with creating your own fine-tuned model.
Overview
Cortex Fine-tuning allows users to leverage parameter-efficient fine-tuning (PEFT) to create customized adaptors for use with pre-trained models on more specialized tasks. If you don’t want the high cost of training a large model from scratch but need better latency and results than you’re getting from prompt engineering or even retrieval augmented generation (RAG) methods, fine-tuning an existing large model is an option. Fine-tuning allows you to use examples to adjust the behavior of the model and improve the model’s knowledge of domain-specific tasks.
Cortex Fine-tuning is a fully managed service that lets you fine-tune popular LLMs using your data, all within Snowflake.
Cortex Fine-tuning features are provided as a Snowflake Cortex function, FINETUNE, with the following arguments:
- CREATE: Creates a fine-tuning job with the given training data.
- SHOW: Lists all the fine-tuning jobs in the current account.
- DESCRIBE: Describes the progress and status of a particular fine-tuning job.
- CANCEL: Cancels a given fine-tuning job.
Cost considerations
The Snowflake Cortex Fine-tuning function incurs compute cost based on the number of tokens used in training. In addition, running the AI_COMPLETE function on a fine-tuned model incurs compute costs based on the number of tokens processed. Refer to the Snowflake Service Consumption Table for each cost in credits per million tokens.
A token is the smallest unit of text processed by the Snowflake Cortex Fine-tuning function, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.
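- For the COMPLETE function, which generates new text in the response, both input and output tokens are counted.
- Fine-tuning trained tokens are calculated as follows:

  Fine-tuning trained tokens = number of input tokens * number of epochs trained

  Use FINETUNE ('DESCRIBE') (SNOWFLAKE.CORTEX) to see the number of trained tokens for your fine-tuning job.
- In addition to tuning and inference charges, standard storage and warehouse costs apply for storing the customized adaptors and running SQL commands.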
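As a rough pre-flight estimate, the sketch below approximates trained tokens from character counts. It assumes roughly four characters per token, that trained tokens scale as input tokens multiplied by the number of epochs, that both prompt and completion text count as input, and a hypothetical table named my_training_data. The authoritative figure is the one reported by FINETUNE ('DESCRIBE').

```sql
-- Rough estimate only: ~4 characters per token (varies by model),
-- multiplied by the default of 3 epochs.
-- my_training_data is a hypothetical table with prompt and completion columns.
SELECT
    SUM(LENGTH(prompt) + LENGTH(completion)) / 4     AS approx_input_tokens,
    SUM(LENGTH(prompt) + LENGTH(completion)) / 4 * 3 AS approx_trained_tokens
FROM my_training_data;
```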
Track credit usage for fine-tuning training
To view the credit and token consumption for fine-tuning training jobs, use the CORTEX_FINE_TUNING_USAGE_HISTORY view:
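A minimal query against that view might look like the following. It assumes the view lives in the SNOWFLAKE.ACCOUNT_USAGE schema and that the current role can read that schema.

```sql
-- Assumption: the view is exposed under SNOWFLAKE.ACCOUNT_USAGE.
SELECT *
  FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FINE_TUNING_USAGE_HISTORY;
```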
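Other considerations
- Fine-tuning jobs are typically long running and are not attached to a worksheet session.
- The number of rows in the training/validation dataset is limited by the base model and the number of training epochs. The following table shows the limits for 1 epoch and for 3 epochs:

| Model | 1 epoch | 3 epochs (default) |
|---|---|---|
| llama3-8b | 186k | 62k |
| llama3-70b | 21k | 7k |
| llama3.1-8b | 150k | 50k |
| llama3.1-70b | 13.5k | 4.5k |
| mistral-7b | 45k | 15k |
| mixtral-8x7b | 27k | 9k |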
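One way to check a dataset against these limits before starting a job is a simple row count. The sketch below assumes a hypothetical table named my_training_data and uses the mistral-7b limit for the default 3 epochs (15k rows) from the table above.

```sql
-- my_training_data is a hypothetical table name.
-- 15000 is the mistral-7b row limit at 3 epochs from the table above.
SELECT
    COUNT(*)          AS training_rows,
    COUNT(*) <= 15000 AS within_row_limit
FROM my_training_data;
```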
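Access control requirements
To run a fine-tuning job, the role that creates the job needs the following privileges:
| Privilege | Object | Notes |
|---|---|---|
| USAGE | DATABASE | The database that the training (and validation) data is queried from. |
| CREATE MODEL or OWNERSHIP | SCHEMA | The schema that the model is saved to. |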
The following SQL is an example of granting the CREATE MODEL privilege to a role, my_role, on my_schema.
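A minimal sketch of such a grant, assuming my_schema and my_role already exist and the current role is allowed to grant privileges on the schema:

```sql
-- Lets my_role create model objects (including fine-tuned models) in my_schema.
GRANT CREATE MODEL ON SCHEMA my_schema TO ROLE my_role;
```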
Additionally, to use the FINETUNE function, the ACCOUNTADMIN role must grant the SNOWFLAKE.CORTEX_USER database role to the user who will call the function. See the LLM Functions required privileges topic for details.
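Because database roles are granted to roles rather than directly to users, one common pattern is to grant SNOWFLAKE.CORTEX_USER to an account role and then grant that role to the user. The names cortex_user_role and some_user below are hypothetical.

```sql
-- Run as ACCOUNTADMIN. cortex_user_role and some_user are hypothetical names.
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS cortex_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;
GRANT ROLE cortex_user_role TO USER some_user;
```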
To give other roles access to use the fine-tuned model, you must grant usage on the model. For details, see Model privileges.
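For instance, assuming a fine-tuned model named my_tuned_model and a hypothetical role named my_other_role, the grant could look like this:

```sql
-- my_tuned_model and my_other_role are illustrative names.
GRANT USAGE ON MODEL my_tuned_model TO ROLE my_other_role;
```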
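Models available to fine-tune
The following base models are available to fine-tune. Models available for fine-tuning might be added or removed in the future:
| Name | Description |
|---|---|
| llama3-8b | A large language model from Meta that is ideal for tasks that require low to moderate reasoning like text classification, summarization, and sentiment analysis. |
| llama3-70b | An LLM from Meta that delivers state-of-the-art performance and is ideal for chat applications, content creation, and enterprise applications. |
| llama3.1-8b | A large language model from Meta that is ideal for tasks that require low to moderate reasoning. It is a lightweight, ultra-fast model with a context window of 24,000 tokens. |
| llama3.1-70b | An open source model that delivers state-of-the-art performance and is ideal for chat applications, content creation, and enterprise applications. It is a highly performant, cost-effective model that enables diverse use cases. |
| mistral-7b | A 7-billion-parameter large language model from Mistral AI that is ideal for your simplest summarization, structuration, and question answering tasks that need to be done quickly. It offers low latency and high throughput processing for multiple pages of text with its 32,000-token context window. |
| mixtral-8x7b | A large language model from Mistral AI that is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency and low memory requirements, which translates into higher throughput for enterprise use cases. |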
How to fine-tune a model
The overall workflow for tuning a model is as follows:
- Prepare the training data.
- Start the fine-tuning job with the required parameters.
- Monitor the training job.
Once training is complete, you can use the model name provided by Cortex Fine-tuning to run inference on your model.
Prepare the fine-tuning data
The fine-tuning data must come from a Snowflake table or view and the query result must contain columns named prompt and completion.
If your table or view does not contain columns with the required names, use a column alias in your query to name them. This query is given
as a parameter to the FINETUNE function. You will get an error if the results do not contain prompt and completion column names.
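Note
The FINETUNE function ignores all columns other than the prompt and completion columns. Snowflake recommends using a query that selects only the columns you need.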
The following code calls the FINETUNE function and uses the SELECT ... AS syntax to set two of the columns in the query result
to prompt and completion.
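A minimal sketch of such a call is shown here. The table names (train, validation) and the source columns (question, answer) are hypothetical; only the aliases prompt and completion are required by FINETUNE.

```sql
-- question and answer are hypothetical source columns; the aliases give the
-- query result the prompt and completion column names that FINETUNE expects.
SELECT SNOWFLAKE.CORTEX.FINETUNE(
  'CREATE',
  'my_tuned_model',
  'mistral-7b',
  'SELECT question AS prompt, answer AS completion FROM train',
  'SELECT question AS prompt, answer AS completion FROM validation'
);
```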
Note
To get responses that follow a schema you define, use structured outputs to generate fine-tuning data. For more information about structured outputs, see AI_COMPLETE structured outputs.
The prompt is an input to the LLM and the completion is the response from the LLM. Your training data should include prompt and completion pairs that show how you want the model to respond to particular prompts.
The following are additional recommendations and requirements for your training data to get optimal fine-tuning performance.
- Start with a few hundred examples. Starting with too many examples may increase tuning time drastically with minimal improvement in performance.
- For each example, you must use only a portion of the allotted context window for the base model you are tuning. Context window is defined in terms of tokens. A token is the smallest unit of text processed by Snowflake Cortex functions, approximately equal to four characters of text. Prompt and completion pairs that exceed this limit will be truncated, which may negatively impact the quality of the trained model.
- The portion of the context window allotted for prompt and completion for each base model is defined in the following table:

| Model | Context window | Input context (prompt) | Output context (completion) |
|---|---|---|---|
| llama3-8b | 8000 | 6000 | 2000 |
| llama3-70b | 8000 | 6000 | 2000 |
| llama3.1-8b | 24000 | 20000 | 4000 |
| llama3.1-70b | 8000 | 6000 | 2000 |
| mistral-7b | 32000 | 28000 | 4000 |
| mixtral-8x7b | 32000 | 28000 | 4000 |
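To spot examples that are likely to be truncated, a character-based heuristic can be applied before training. The sketch below assumes roughly four characters per token, a hypothetical table named my_training_data, and the mistral-7b limits from the table above; actual token counts vary by model, so treat the result as an approximation.

```sql
-- Heuristic only: ~4 characters per token; real token counts vary by model.
-- 28000 and 4000 are the mistral-7b prompt and completion limits from the table above.
-- my_training_data is a hypothetical table with prompt and completion columns.
SELECT COUNT(*) AS rows_likely_truncated
FROM my_training_data
WHERE LENGTH(prompt) / 4 > 28000
   OR LENGTH(completion) / 4 > 4000;
```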
Start the fine-tuning job
You can start a fine-tuning job by calling the SNOWFLAKE.CORTEX.FINETUNE function and passing in 'CREATE' as the first argument, or by using Snowsight.
Use SQL
This example uses the mistral-7b model as the base model to create a job with a model output name of my_tuned_model and training
and validation data querying from the my_training_data and my_validation_data tables respectively.
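A sketch of that call, assuming both tables already expose columns named prompt and completion and live in the current database and schema:

```sql
-- Arguments: action, output model name, base model,
-- training data query, validation data query.
SELECT SNOWFLAKE.CORTEX.FINETUNE(
  'CREATE',
  'my_tuned_model',
  'mistral-7b',
  'SELECT prompt, completion FROM my_training_data',
  'SELECT prompt, completion FROM my_validation_data'
);
```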
You can use absolute paths for each of the database objects such as the model or data if you want to use different database and schema for each. The following example shows creating a fine-tuning job with data from mydb2.myschema2 database and schema and saving the fine-tuned model to the mydb.myschema database and schema.
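With fully qualified names, the same call might look like the following. The table names my_training_data and my_validation_data are carried over from the previous example as an assumption.

```sql
-- The model is saved to mydb.myschema; the data is read from mydb2.myschema2.
SELECT SNOWFLAKE.CORTEX.FINETUNE(
  'CREATE',
  'mydb.myschema.my_tuned_model',
  'mistral-7b',
  'SELECT prompt, completion FROM mydb2.myschema2.my_training_data',
  'SELECT prompt, completion FROM mydb2.myschema2.my_validation_data'
);
```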
The SNOWFLAKE.CORTEX.FINETUNE function with ‘CREATE’ as the first argument returns a fine-tuned model ID as the output. Use this ID to get status or job progress using the SNOWFLAKE.CORTEX.FINETUNE function with ‘DESCRIBE’ as the first argument.
Use Snowsight
Follow these steps to create a fine-tuning job in Snowsight:
- Sign in to Snowsight.
- Select a role that is granted the SNOWFLAKE.CORTEX_USER database role.
- In the navigation menu, select AI & ML » AI Studio.
- Select Fine-tune from the Create Custom LLM box.
- Select a base model using the drop-down menu.
- Select the role under which the fine-tuning job will execute and the warehouse where it will run. The role must be granted the SNOWFLAKE.CORTEX_USER database role.
- Select a database in which to store the fine-tuned model.
- Enter a name for your fine-tuned model, then select Let’s go.
- Select the table or view that contains your training data, then select Next. The training data can come from any database or schema that the role has access to.
- Select the column that contains the prompts in your training data, then select Next.
- Select the column that contains the completions in your training data, then select Next.
- If you have a validation dataset, select the table or view that contains your validation data, then select Next. If you don’t have separate validation data, select Skip this option.
- Verify your choices, then select Start training.
The final step confirms that your fine-tuning job has started and displays the Job ID. Use this ID to get status or job progress using the SNOWFLAKE.CORTEX.FINETUNE function with ‘DESCRIBE’ as the first argument.
Manage fine-tuning jobs
Fine-tuning jobs are long running, which means they are not tied to a worksheet session. You can check the status of your tuning job using the SNOWFLAKE.CORTEX.FINETUNE function with 'SHOW' or 'DESCRIBE' as the first argument.
If you no longer need a fine-tuning job, you can terminate the job using the SNOWFLAKE.CORTEX.FINETUNE function with 'CANCEL' as the first argument and the job ID as the second argument.
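The three management calls can be sketched as follows. The value '<job_id>' is a placeholder for the ID returned by the 'CREATE' call, not a real identifier.

```sql
-- List the fine-tuning jobs in the current account.
SELECT SNOWFLAKE.CORTEX.FINETUNE('SHOW');

-- Check the progress and status of one job.
-- '<job_id>' is a placeholder for the ID returned by 'CREATE'.
SELECT SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', '<job_id>');

-- Cancel a job that is no longer needed.
SELECT SNOWFLAKE.CORTEX.FINETUNE('CANCEL', '<job_id>');
```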
Analyze fine-tuned models
After a fine-tuning job completes, you can analyze the results of the training process by examining the fine-tuned model’s artifacts. The OWNERSHIP privilege on the model is required to access the fine-tuned model’s artifacts; for details, see Model privileges.
The artifacts include a training_results.csv file. This CSV file
contains one header row followed by a row for each training step recorded by the
fine-tuning job. The file contains the following columns:
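| Column name | Description |
|---|---|
| step | The number of training steps completed in the overall training process. Starts from 1. |
| epoch | The epoch in the training process. Starts from 1. |
| training_loss | The loss for the training batch. A lower value indicates a closer fit between the model and the data. |
| validation_loss | The loss on the validation dataset. This is only available at the last step of each epoch. |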
The training_results.csv file can be found in the Model Registry UI in Snowsight
and accessed directly via SQL or Python API.
For more information, see
Working with model artifacts.
Use your fine-tuned model for inference
Use the COMPLETE LLM function with the name of your fine-tuned model to make inferences.
This example shows a call to the COMPLETE function with the name of your fine-tuned model.
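A minimal sketch of such a call, assuming a fine-tuned model named my_tuned_model in the current schema; the prompt text is illustrative only.

```sql
-- my_tuned_model is the name given to the model when the job was created.
-- The prompt string is an illustrative placeholder.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'my_tuned_model',
  'How to fine-tune mistral models'
);
```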
The following is a snippet of the output from the example call:
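Limitations and known issues
- Fine-tuning jobs can be listed only at the account level.
- The fine-tuning jobs returned from FINETUNE ('SHOW') (SNOWFLAKE.CORTEX) are not permanent and may be garbage collected periodically.
- If a base model is removed from the Cortex LLM functions, your fine-tuned model will no longer work.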
Share models
Fine-tuned models can be shared to other accounts with the USAGE privilege via Data Sharing.
Replicate models
Cross-region inference does not support fine-tuned models. Inference must take place in the same region where the model object is located. You can use database replication to replicate the fine-tuned model object to a region you want to make inference from if it’s different than the region the model was trained in.
For example,
if you create a fine-tuned model based on mistral-7b in your account in the AWS US West 2 region, you can use data sharing to share it
with another account in this region, or you can use database replication to replicate the model to another account in your organization
in a different region that supports the mistral-7b model, such as AWS Europe West. For details on replicating objects, see
Replicating databases and account objects across multiple accounts.
Legal notices
The data classification of inputs and outputs is as set forth in the following table.
| Input data classification | Output data classification |
|---|---|
| Usage Data | Customer Data |
For additional information, refer to Snowflake AI and ML.