Hugging Face pipeline¶
The Snowflake Model Registry supports any Hugging Face model defined as a transformer (link removed) that can be loaded with the transformers.Pipeline (link removed) method.
Log a Hugging Face model to the Model Registry using one of the following methods:
Import and deploy models from Hugging Face using Snowsight. For instructions, see Import and deploy models from an external service.
Create a snowflake.ml.model.models.huggingface.TransformersPipeline instance and call log_model():

# reg: snowflake.ml.registry.Registry
from snowflake.ml.model.models import huggingface

model = huggingface.TransformersPipeline(
    task="text-classification",
    model="ProsusAI/finbert",
    # compute_pool_for_log=...  # Optional
)

mv = reg.log_model(model, model_name='finbert', version_name='v5')
Important
If the compute_pool_for_log argument is not specified, the model is logged using the default CPU compute pool. If compute_pool_for_log is specified, the model is logged using the specified compute pool. If compute_pool_for_log is set to None, the model files are first downloaded locally and then uploaded to the Model Registry; this requires huggingface-hub (https://pypi.org/project/huggingface-hub/) to be installed.
Load a model from Hugging Face in memory and log it to the Model Registry:
# reg: snowflake.ml.registry.Registry
lm_hf_model = transformers.pipeline(
    task="text-generation",
    model="bigscience/bloom-560m",
    token="...",  # Put your Hugging Face token here.
    return_full_text=False,
    max_new_tokens=100,
)

lmv = reg.log_model(lm_hf_model, model_name='bloom', version_name='v560m')
If you are using a Snowflake notebook, you must attach an external access integration to the notebook in order to download the model's weights. The integration is required to allow access to the following hosts:
huggingface.co
hub-ci.huggingface.co
cdn-lfs-us-1.hf.co
cdn-lfs-eu-1.hf.co
cdn-lfs.hf.co
transfer.xethub.hf.co
cas-server.xethub.hf.co
cas-bridge.xethub.hf.co
Note
This list includes only the hosts required to access Hugging Face and may change at any time. Your model might require artifacts from other sources; add those sources to the network rule that allows access.
The following example creates a network rule, huggingface_network_rule, and an external access integration for use with a notebook:
CREATE NETWORK RULE huggingface_network_rule
TYPE = HOST_PORT
VALUE_LIST = (
'huggingface.co',
'hub-ci.huggingface.co',
'cdn-lfs-us-1.hf.co',
'cdn-lfs-eu-1.hf.co',
'cdn-lfs.hf.co',
'transfer.xethub.hf.co',
'cas-server.xethub.hf.co',
'cas-bridge.xethub.hf.co'
)
MODE = EGRESS
COMMENT = 'Network Rule for Hugging Face external access';
CREATE EXTERNAL ACCESS INTEGRATION huggingface_access_integration
ALLOWED_NETWORK_RULES = (huggingface_network_rule)
ENABLED = true;
For more information, see Creating and using an external access integration.
After you create the external access integration, attach it to your notebook to gain access to the Hugging Face model repository so you can download the model's weights and configuration. For more information, see Set up external access for Snowflake Notebooks.
Model Registry API¶
When calling log_model(), the options dictionary supports the following keys:

| Option key | Description | Type |
|---|---|---|
| target_methods | A list of methods available on the model object. Hugging Face models use the object's __call__ method as the single target method. | list of str |
| cuda_version | The version of the CUDA runtime to be used when deploying to a platform with a GPU. If set to None, the model cannot be deployed to a platform with a GPU. | str |
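As a sketch, these options might be passed as follows. The values shown are illustrative, not defaults, and reg is assumed to be an existing snowflake.ml.registry.Registry as in the earlier examples:

```python
# Illustrative options dict; keys as described in the table above.
options = {
    "target_methods": ["__call__"],  # Hugging Face pipelines expose __call__
    "cuda_version": "11.8",          # illustrative CUDA runtime version
}

# Passing the dict when logging (commented out; requires a live registry):
# mv = reg.log_model(model, model_name="finbert", version_name="v5", options=options)
```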
The model registry infers the signatures argument if the pipeline contains a task from the following list:

fill-mask
ner
question-answering
summarization
table-question-answering
text2text-generation
text-classification
text-generation
token-classification
translation
translation_xx_to_yy, where xx and yy are two-letter country codes as defined in ISO 3166-1 alpha-2 (link removed)
zero-shot-classification
Note
Task names are case-sensitive.
The sample_input_data argument to log_model is ignored for Hugging Face models. Specify the signatures argument
when logging a Hugging Face model that is not in the preceding list so that the registry knows the signatures of the target
methods.
To see the inferred signature, call the show_functions() method. The signature tells you the required types and column names of the model function's inputs, as well as the format of its outputs. The following example shows the signature of the bigscience/bloom-560m model with the text-generation task:
[{'name': '__CALL__',
'target_method': '__call__',
'signature': ModelSignature(
inputs=[
FeatureSpec(dtype=DataType.STRING, name='inputs')
],
outputs=[
FeatureSpec(dtype=DataType.STRING, name='outputs')
]
)}]
The following example shows how to call the model using the preceding signature:
# model: snowflake.ml.model.ModelVersion
import pandas as pd
remote_prediction = model.run(pd.DataFrame(["Hello, how are you?"], columns=["inputs"]))
Usage notes¶
Many Hugging Face models are large and don't fit in a standard warehouse. Use a Snowpark-optimized warehouse or choose a smaller version of the model. For example, an alternative to the Llama-2-70b-chat-hf model is Llama-2-7b-chat-hf.
Snowflake warehouses do not have GPUs. Use only CPU-optimized Hugging Face models.
Some Hugging Face transformers return an array of dictionaries per input row. The model registry converts this array of dictionaries to a string containing a JSON representation of the array. For example, multi-output Question Answering output looks similar to this:
'[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"}, {"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}]'
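Because the result is a JSON string rather than structured data, it can be decoded with the standard json module; a minimal sketch using the string above:

```python
import json

# The multi-output question-answering string shown above (a raw string keeps the \" escape intact).
raw = r'[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"}, {"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}]'

results = json.loads(raw)                      # back to a list of dicts
best = max(results, key=lambda r: r["score"])  # highest-scoring answer
print(best["answer"])  # prints: learn more about the world of athletics
```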
Example¶
# Prepare model
import transformers
import pandas as pd
finbert_model = transformers.pipeline(
task="text-classification",
model="ProsusAI/finbert",
top_k=2,
)
# Log the model
mv = registry.log_model(
finbert_model,
model_name="finbert",
version_name="v1",
)
# Use the model
mv.run(pd.DataFrame(
[
["I have a problem with my Snowflake that needs to be resolved asap!!", ""],
["I would like to have udon for today's dinner.", ""],
]
)
)
Results:
0 [{"label": "negative", "score": 0.8106237053871155}, {"label": "neutral", "score": 0.16587384045124054}]
1 [{"label": "neutral", "score": 0.9263970851898193}, {"label": "positive", "score": 0.05286872014403343}]
Inferred signatures for Hugging Face pipelines¶
This section describes the inferred signatures for supported Hugging Face pipelines, including a description and example of the required inputs and expected outputs. All inputs and outputs are Snowpark DataFrames.
Fill-mask pipeline¶
A pipeline whose task is "fill-mask" (link removed) has the following inputs and outputs.
Inputs¶
inputs: The string in which the mask is to be filled.
Example:
--------------------------------------------------
|"inputs" |
--------------------------------------------------
|LynYuu is the [MASK] of the Grand Duchy of Yu. |
--------------------------------------------------
Outputs¶
outputs: A string containing a JSON representation of a list of objects, each of which may contain keys such as score, token, token_str, or sequence. For more information, see FillMaskPipeline (link removed).
Example:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs" |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.9066258072853088, "token": 3007, "token_str": "capital", "sequence": "lynyuu is the capital of the grand duchy of yu."}, {"score": 0.08162177354097366, "token": 2835, "token_str": "seat", "sequence": "lynyuu is the seat of the grand duchy of yu."}, {"score": 0.0012052370002493262, "token": 4075, "token_str": "headquarters", "sequence": "lynyuu is the headquarters of the grand duchy of yu."}, {"score": 0.0006560495239682496, "token": 2171, "token_str": "name", "sequence": "lynyuu is the name of the grand duchy of yu."}, {"score": 0.0005427763098850846, "token": 3200, "token_str"... |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="fill-mask",
model="google-bert/bert-base-uncased",
)
mv = registry.log_model(
model=model,
model_name="GOOGLE_BERT_BASE_UNCASED",
)
input_df = pd.DataFrame([{"inputs": "LynYuu is the [MASK] of the Grand Duchy of Yu."}])
mv.run(
input_df,
# function_name="__call__", # Optional
)
Token classification¶
A pipeline whose task is "ner" or token-classification (link removed) has the following inputs and outputs.
Inputs¶
inputs: A string containing the tokens to be classified.
Example:
------------------------------------------------
|"inputs" |
------------------------------------------------
|My name is Izumi and I live in Tokyo, Japan. |
------------------------------------------------
Outputs¶
outputs: A string containing a JSON representation of a list of result objects, each of which may contain keys such as entity, score, index, word, start, or end. For more information, see TokenClassificationPipeline (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TokenClassificationPipeline).
Example:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs" |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"entity": "PRON", "score": 0.9994392991065979, "index": 1, "word": "my", "start": 0, "end": 2}, {"entity": "NOUN", "score": 0.9968984127044678, "index": 2, "word": "name", "start": 3, "end": 7}, {"entity": "AUX", "score": 0.9937735199928284, "index": 3, "word": "is", "start": 8, "end": 10}, {"entity": "PROPN", "score": 0.9928083419799805, "index": 4, "word": "i", "start": 11, "end": 12}, {"entity": "PROPN", "score": 0.997334361076355, "index": 5, "word": "##zumi", "start": 12, "end": 16}, {"entity": "CCONJ", "score": 0.999173104763031, "index": 6, "word": "and", "start": 17, "end": 20}, {... |
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="token-classification",
model="dslim/bert-base-NER",
)
mv = registry.log_model(
model=model,
model_name="BERT_BASE_NER",
)
mv.run(
pd.DataFrame([{"inputs": "My name is Izumi and I live in Tokyo, Japan."}]),
# function_name="__call__", # Optional
)
Question answering (single output)¶
A pipeline whose task is "question-answering" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.QuestionAnsweringPipeline), with top_k unset or set to 1, has the following inputs and outputs.
Inputs¶
question: A string containing the question to be answered.
context: A string that may contain the answer to the question.
Example:
-----------------------------------------------------------------------------------
|"question" |"context" |
-----------------------------------------------------------------------------------
|What did Doris want to do? |Doris is a cheerful mermaid from the ocean dept... |
-----------------------------------------------------------------------------------
Outputs¶
score: A floating-point confidence score from 0.0 to 1.0.
start: The integer index of the first token of the answer within the context.
end: The integer index of the last token of the answer within the original context.
answer: A string containing the answer that was found.
Example:
--------------------------------------------------------------------------------
|"score" |"start" |"end" |"answer" |
--------------------------------------------------------------------------------
|0.61094731092453 |139 |178 |learn more about the world of athletics |
--------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="question-answering",
model="deepset/roberta-base-squad2",
)
QA_input = {
"question": "Why is model conversion important?",
"context": "The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.",
}
mv = registry.log_model(
model=model,
model_name="ROBERTA_BASE_SQUAD2",
)
mv.run(
pd.DataFrame.from_records([QA_input]),
# function_name="__call__", # Optional
)
Question answering (multiple outputs)¶
A pipeline whose task is "question-answering" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.QuestionAnsweringPipeline), with top_k set to a value greater than 1, has the following inputs and outputs.
Inputs¶
question: A string containing the question to be answered.
context: A string that may contain the answer to the question.
Example:
-----------------------------------------------------------------------------------
|"question" |"context" |
-----------------------------------------------------------------------------------
|What did Doris want to do? |Doris is a cheerful mermaid from the ocean dept... |
-----------------------------------------------------------------------------------
Outputs¶
outputs: A string containing a JSON representation of a list of result objects, each of which may contain keys such as score, start, end, or answer.
Example:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs" |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"score": 0.61094731092453, "start": 139, "end": 178, "answer": "learn more about the world of athletics"}, {"score": 0.17750297486782074, "start": 139, "end": 180, "answer": "learn more about the world of athletics.\""}, {"score": 0.06438097357749939, "start": 138, "end": 178, "answer": "\"learn more about the world of athletics"}] |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="question-answering",
model="deepset/roberta-base-squad2",
top_k=3,
)
QA_input = {
"question": "Why is model conversion important?",
"context": "The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.",
}
mv = registry.log_model(
model=model,
model_name="ROBERTA_BASE_SQUAD2",
)
mv.run(
pd.DataFrame.from_records([QA_input]),
# function_name="__call__", # Optional
)
Summarization¶
A pipeline whose task is "summarization" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.SummarizationPipeline), with return_tensors False or unset, has the following inputs and outputs.
Inputs¶
documents: A string containing the text to be summarized.
Example:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"documents" |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Neuro-sama is a chatbot styled after a female VTuber that hosts live streams on the Twitch channel "vedal987". Her speech and personality are generated by an artificial intelligence (AI) system wh... |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs¶
summary_text: A string containing the generated summary or, if num_return_sequences is greater than 1, a string containing a JSON representation of a list of results, each of which is a dictionary containing fields including summary_text.
Example:
---------------------------------------------------------------------------------
|"summary_text" |
---------------------------------------------------------------------------------
| Neuro-sama is a chatbot styled after a female VTuber that hosts live streams |
---------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="summarization",
model="facebook/bart-large-cnn",
)
text = "The transformers library is a great library for natural language processing which provides a unified interface for many different models and tasks."
mv = registry.log_model(
model=model,
model_name="BART_LARGE_CNN",
)
mv.run(
pd.DataFrame.from_records([{"documents": text}]),
# function_name="__call__", # Optional
)
Table question answering¶
A pipeline whose task is "table-question-answering" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TableQuestionAnsweringPipeline) has the following inputs and outputs.
Inputs¶
query: A string containing the question to be answered.
table: A string containing a JSON-serialized dictionary in the form {column -> [values]}, representing the table that may contain the answer.
Example:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"query" |"table" |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Which channel has the most subscribers? |{"Channel": ["A.I.Channel", "Kaguya Luna", "Mirai Akari", "Siro"], "Subscribers": ["3,020,000", "872,000", "694,000", "660,000"], "Videos": ["1,200", "113", "639", "1,300"], "Created At": ["Jun 30 2016", "Dec 4 2017", "Feb 28 2014", "Jun 23 2017"]} |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs¶
answer: A string containing a possible answer.
coordinates: A list of integers representing the coordinates of the cells where the answer was found.
cells: A list of strings containing the contents of the cells where the answer was found.
aggregator: A string containing the name of the aggregator that was used.
Example:
----------------------------------------------------------------
|"answer" |"coordinates" |"cells" |"aggregator" |
----------------------------------------------------------------
|A.I.Channel |[ |[ |NONE |
| | [ | "A.I.Channel" | |
| | 0, |] | |
| | 0 | | |
| | ] | | |
| |] | | |
----------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
import json
model = transformers.pipeline(
task="table-question-answering",
model="microsoft/tapex-base-finetuned-wikisql",
)
data = {
"year": [1896, 1900, 1904, 2004, 2008, 2012],
"city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
query = "What is the city of the year 2004?"
mv = registry.log_model(
model=model,
model_name="TAPEX_BASE_FINETUNED_WIKISQL",
)
mv.run(
pd.DataFrame.from_records([{"query": query, "table": json.dumps(data)}]),
# function_name="__call__", # Optional
)
Text classification (single output)¶
A pipeline whose task is "text-classification" (link removed), with top_k unset or set to None, has the following inputs and outputs.
Inputs¶
text: The string to be classified.
text_pair: A string to be classified along with text, used by models that compute text similarity. Leave it empty if the model does not use it.
Example:
----------------------------------
|"text" |"text_pair" |
----------------------------------
|I like you. |I love you, too. |
----------------------------------
Outputs¶
label: A string representing the classification label of the text.
score: A floating-point confidence score from 0.0 to 1.0.
Example:
--------------------------------
|"label" |"score" |
--------------------------------
|LABEL_0 |0.9760091304779053 |
--------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="text-classification",
model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
text = "I'm happy today!"
mv = registry.log_model(
model=model,
model_name="TWITTER_ROBERTA_BASE_SENTIMENT_LATEST",
)
mv.run(
pd.DataFrame.from_records([{"text": text}]),
# function_name="__call__", # Optional
)
Text classification (multiple outputs)¶
A pipeline whose task is "text-classification" (link removed), with top_k set to a number, has the following inputs and outputs.
Note
A text classification task is considered multiple-output if top_k is set to any number, even if that number is 1.
To get a single output, use a top_k value of None.
Inputs¶
text: The string to be classified.
text_pair: A string to be classified along with text, used by models that compute text similarity. Leave it empty if the model does not use it.
Example:
--------------------------------------------------------------------
|"text" |"text_pair" |
--------------------------------------------------------------------
|I am wondering if I should have udon or rice fo... | |
--------------------------------------------------------------------
Outputs¶
outputs: A string containing a JSON representation of a list of results, each containing fields including label and score.
Example:
--------------------------------------------------------
|"outputs" |
--------------------------------------------------------
|[{"label": "NEGATIVE", "score": 0.9987024068832397}] |
--------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="text-classification",
model="cardiffnlp/twitter-roberta-base-sentiment-latest",
top_k=3,
)
text = "I'm happy today!"
mv = registry.log_model(
model=model,
model_name="TWITTER_ROBERTA_BASE_SENTIMENT_LATEST",
)
mv.run(
pd.DataFrame.from_records([{"text": text}]),
# function_name="__call__", # Optional
)
Text-to-text generation¶
A pipeline whose task is "text2text-generation" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.Text2TextGenerationPipeline), with return_tensors False or unset, has the following inputs and outputs.
Inputs¶
inputs: A string containing the prompt.
Example:
--------------------------------------------------------------------------------
|"inputs" |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, " |
--------------------------------------------------------------------------------
Outputs¶
generated_text: A string containing the generated text if num_return_sequences is 1; if num_return_sequences is greater than 1, a string containing a JSON representation of a list of dictionary results, each containing fields including generated_text.
Example:
----------------------------------------------------------------
|"generated_text" |
----------------------------------------------------------------
|, said that he was a descendant of the Lost City of Atlantis |
----------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="text2text-generation",
model="google-t5/t5-small",
)
text = "Tell me a joke."
mv = registry.log_model(
model=model,
model_name="T5_SMALL",
)
mv.run(
pd.DataFrame.from_records([{"inputs": text}]),
# function_name="__call__", # Optional
)
Note
Text-to-text generation pipelines with return_tensors set to True are not supported.
Translation¶
A pipeline whose task is "translation" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TranslationPipeline), with return_tensors False or unset, has the following inputs and outputs.
Note
Translation pipelines with return_tensors set to True are not supported.
Inputs¶
inputs: A string containing the text to be translated.
Example:
------------------------------------------------------------------------------------------------------
|"inputs" |
------------------------------------------------------------------------------------------------------
|Snowflake's Data Cloud is powered by an advanced data platform provided as a self-managed service. |
------------------------------------------------------------------------------------------------------
Outputs¶
translation_text: A string representing the generated translation if num_return_sequences is 1; otherwise, a string containing a JSON representation of a list of dictionary results, each containing fields including translation_text.
Example:
---------------------------------------------------------------------------------------------------------------------------------
|"translation_text" |
---------------------------------------------------------------------------------------------------------------------------------
|Le Cloud de données de Snowflake est alimenté par une plate-forme de données avancée fournie sous forme de service autogérés. |
---------------------------------------------------------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="translation",
model="deepvk/kazRush-kk-ru",
)
text = "Иттерді кім шығарды?"
mv = registry.log_model(
model=model,
model_name="KAZRUSH_KK_RU",
)
mv.run(
pd.DataFrame.from_records([{"inputs": text}]),
# function_name="__call__", # Optional
)
Zero-shot classification¶
A pipeline whose task is "zero-shot-classification" (link removed) has the following inputs and outputs.
Inputs¶
sequences: A string containing the text to be classified.
candidate_labels: A list of strings containing the labels to apply to the text.
Example:
-----------------------------------------------------------------------------------------
|"sequences" |"candidate_labels" |
-----------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!! |[ |
| | "urgent", |
| | "not urgent" |
| |] |
|I have a problem with Snowflake that needs to be resolved asap!! |[ |
| | "English", |
| | "Japanese" |
| |] |
-----------------------------------------------------------------------------------------
Outputs¶
sequence: The input string.
labels: A list of strings representing the labels that were applied.
scores: A list of floating-point confidence scores, one for each label.
Example:
--------------------------------------------------------------------------------------------------------------
|"sequence" |"labels" |"scores" |
--------------------------------------------------------------------------------------------------------------
|I have a problem with Snowflake that needs to be resolved asap!! |[ |[ |
| | "urgent", | 0.9952737092971802, |
| | "not urgent" | 0.004726255778223276 |
| |] |] |
|I have a problem with Snowflake that needs to be resolved asap!! |[ |[ |
| | "Japanese", | 0.5790848135948181, |
| | "English" | 0.42091524600982666 |
| |] |] |
--------------------------------------------------------------------------------------------------------------
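Unlike the other tasks, zero-shot classification has no code example above. As a sketch following the same pattern, the input DataFrame can be shaped per the columns just described; the model name facebook/bart-large-mnli and the registry object are assumptions, so the logging steps are left commented out:

```python
import pandas as pd

# Input rows use the "sequences" and "candidate_labels" columns described above.
input_df = pd.DataFrame.from_records(
    [
        {
            "sequences": "I have a problem with Snowflake that needs to be resolved asap!!",
            "candidate_labels": ["urgent", "not urgent"],
        }
    ]
)

# The remaining steps mirror the other Code Examples (not run here):
# model = transformers.pipeline(task="zero-shot-classification", model="facebook/bart-large-mnli")
# mv = registry.log_model(model=model, model_name="BART_LARGE_MNLI")
# mv.run(input_df)
```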
Text generation¶
A pipeline whose task is "text-generation" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline), with return_tensors False or unset, has the following inputs and outputs.
Note
Text generation pipelines with return_tensors set to True are not supported.
Inputs¶
inputs: A string containing the prompt.
Example:
--------------------------------------------------------------------------------
|"inputs" |
--------------------------------------------------------------------------------
|A descendant of the Lost City of Atlantis, who swam to Earth while saying, " |
--------------------------------------------------------------------------------
Outputs¶
outputs: A string containing a JSON representation of a list of result objects, each containing fields including generated_text.
Example:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"outputs" |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|[{"generated_text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, \"For my life, I don't know if I'm gonna land upon Earth.\"\n\nIn \"The Misfits\", in a flashback, wh... |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
model = transformers.pipeline(
task="text-generation",
model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)
mv = registry.log_model(
model=model,
model_name="TINYLLAMA",
)
text = "A descendant of the Lost City of Atlantis, who swam to Earth while saying,"
mv.run(
pd.DataFrame.from_records([{"inputs": text}]),
# function_name="__call__", # Optional
)
Text generation (OpenAI-compatible)¶
A pipeline whose task is "text-generation" (https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline), with return_tensors False or unset, has the following inputs and outputs.
By providing the snowflake.ml.model.openai_signatures.OPENAI_CHAT_SIGNATURE signature when logging the model, the model becomes compatible with the OpenAI API. This allows users to pass openai.client.ChatCompletion-style requests to the model.
Note
Text generation pipelines with return_tensors set to True are not supported.
Inputs¶
messages: A list of dictionaries that contain the messages to be sent to the model.
max_completion_tokens: Optional. The maximum number of tokens to generate.
temperature: Optional. The temperature to use for generation.
stop: Optional. The stop sequences to use for generation.
n: Optional. The number of generations to produce.
stream: Optional. Whether to stream the generation.
top_p: Optional. The top-p value to use for generation.
frequency_penalty: Optional. The frequency penalty to use for generation.
presence_penalty: Optional. The presence penalty to use for generation.
Example:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| messages | max_completion_tokens | temperature | stop | n | stream | top_p | frequency_penalty | presence_penalty |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| [{'role': 'system', 'content': 'Complete the sentence.'}, {'role': 'user', 'content': [{'type': 'text', 'text': 'A descendant of the Lost City of Atlantis, who swam to Earth while saying, '}]}] | 250 | 0.9 | | 3 | False | 1 | 0.1 | 0.2 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Outputs¶
The output is an OpenAI-style chat completion response with columns including id, object, created, model, choices, and usage.
Example:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | object | created | model | choices | usage |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| chatcmpl-... | chat.completion | 1.76912e+09 | /shared/model/model/models/TINYLLAMA/model | [{'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'The descendant is not actually ...', 'role': 'assistant'}}] | {'completion_tokens': 399, 'prompt_tokens': 52, 'total_tokens': 451} |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Code Example¶
import transformers
import pandas as pd
from snowflake.ml.model import openai_signatures
model = transformers.pipeline(
task="text-generation",
model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)
mv = registry.log_model(
model=model,
model_name="TINYLLAMA",
signatures=openai_signatures.OPENAI_CHAT_SIGNATURE,
)
# create a pd.DataFrame with openai.client.chat.completion arguments
x_df = pd.DataFrame.from_records(
[
{
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "Complete the sentence.",
}
],
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "A descendant of the Lost City of Atlantis, who swam to Earth while saying, ",
}
],
},
],
"max_completion_tokens": 250,
"temperature": 0.9,
"stop": None,
"n": 3,
"stream": False,
"top_p": 1.0,
"frequency_penalty": 0.1,
"presence_penalty": 0.2,
}
],
)
# OpenAI Chat Completion compatible output
output_df = mv.run(X=x_df)