Cortex REST API¶

The Cortex REST API gives you access to leading frontier models from Anthropic, OpenAI, Meta, Mistral, and more through your preferred endpoint or SDK. All inference runs within the Snowflake perimeter, so your data remains secure and within your governance boundary. See below on how to get started.

Choose your API¶

Cortex REST API supports two industry-standard API specifications. Pick the one that best fits your stack:


	Chat Completions API	Messages API
Compatibility	OpenAI Chat Completions API (https://platform.openai.com/docs/api-reference/chat/create)	Anthropic Messages API (https://docs.anthropic.com/en/api/messages)
Endpoint	`/api/v2/cortex/v1/chat/completions`	`/api/v2/cortex/v1/messages`
Supported models	All models (OpenAI, Claude, Llama, Mistral, DeepSeek, Snowflake)	Claude models only
SDK support	OpenAI Python and JavaScript SDKs	Anthropic Python SDK
Best for	Most use cases; multi-model flexibility	Existing Anthropic integrations; Anthropic API parity

Both APIs share the same authentication, model catalog, and rate limits. The only difference is the request/response format and which models each endpoint supports. For pricing, see the Snowflake Service Consumption Table.

Quickstart¶

Prerequisites¶

Before you begin, you need:

Your Snowflake account URL (e.g., https://<account-identifier>.snowflakecomputing.cn).
A Snowflake Programmatic Access Token (PAT) for authentication. See 生成编程访问令牌.
A model name to use in requests. See 模型可用性 for available models.

Chat Completions quickstart¶

The Chat Completions API follows the OpenAI specification. You can use the OpenAI SDK directly.

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

In the preceding examples, replace the following:

<account-identifier>: Your Snowflake account identifier.
<SNOWFLAKE_PAT>: Your Snowflake Programmatic Access Token (PAT).
model: The model name. See 模型可用性 for supported models.

Messages API quickstart¶

The Messages API follows the Anthropic specification and supports Claude models only.

The Anthropic SDK sends credentials via x-api-key by default, but Snowflake expects a Bearer token. Use an httpx client to set the correct authorization header.

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
)

print(response.content[0].text)

Like Python, override the default auth header with a Bearer token via defaultHeaders.

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

In the preceding examples, replace the following:

<account-identifier>: Your Snowflake account identifier.
<SNOWFLAKE_PAT>: Your Snowflake Programmatic Access Token (PAT).
model: The Claude model name. See 模型可用性 for supported models.

设置身份验证¶

要向 Cortex REST API 进行身份验证，可以使用使用 Snowflake 对 Snowflake REST APIs 进行身份验证中所述的方法。

Set the Authorization header to include your token (for example, a JSON web token (JWT), OAuth token, or programmatic access token).

小技巧

考虑为 Cortex REST API 请求创建专用用户。

设置授权¶

To send a REST API request, your default role must be granted either the SNOWFLAKE.CORTEX_USER database role or the SNOWFLAKE.CORTEX_REST_API_USER database role. SNOWFLAKE.CORTEX_USER provides access to all Covered AI features including the Cortex REST API, whereas SNOWFLAKE.CORTEX_REST_API_USER provides access only to the Cortex REST API.

In most cases, users already have access because SNOWFLAKE.CORTEX_USER is granted to the PUBLIC role automatically, and all roles inherit PUBLIC.

If your Snowflake administrator has revoked this grant, they must re-grant it:

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE my_role;
GRANT ROLE my_role TO USER my_user;

重要

REST API requests use the user's default role, so that role must have the necessary privileges. You can change a user's default role with ALTER USER ... SET DEFAULT_ROLE.

ALTER USER my_user SET DEFAULT_ROLE=my_role

Limiting access using the Cortex REST API user role¶

To provide selective access to the Cortex REST API for specific users, use the SNOWFLAKE.CORTEX_REST_API_USER database role. This role grants access to the Cortex REST API without granting access to other Cortex features such as Cortex AI functions, Cortex Agent, Cortex Analyst, Cortex Fine-tuning, or Cortex Search.

CORTEX_REST_API_USER is not granted to the PUBLIC role by default. An account administrator must explicitly grant this role to roles that require access to the Cortex REST API. The SNOWFLAKE.CORTEX_REST_API_USER database role can't be granted directly to a user. For more information, see 使用 SNOWFLAKE 数据库角色.

重要

If your user roles already have the CORTEX_USER role, you must revoke access to the CORTEX_USER role before the CORTEX_REST_API_USER role can take effect as a fine-grained control.

REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE PUBLIC;

To provide access to the Cortex REST API, use the ACCOUNTADMIN role to do the following:

Grant the SNOWFLAKE.CORTEX_REST_API_USER database role to a custom role.
Assign this custom role to users.

The following example creates the custom role cortex_rest_api_role, grants it the CORTEX_REST_API_USER database role, and assigns the role to example_user:

USE ROLE ACCOUNTADMIN;
CREATE ROLE cortex_rest_api_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_REST_API_USER TO ROLE cortex_rest_api_role;

GRANT ROLE cortex_rest_api_role TO USER example_user;

You can also grant access to the Cortex REST API through existing roles. For example, if you have an api_consumer role used by a group of users, you can grant access with a single GRANT statement:

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_REST_API_USER TO ROLE api_consumer;

模型可用性¶

The following tables show the models available in the Cortex REST API for each region:


模型	跨云（任何区域）	AWS Global （跨区域）	AWS US （跨区域）	AWS EU （跨区域）	AWS APJ （跨区域）	Azure Global （跨区域）	Azure US （跨区域）	Azure EU （跨区域）
`claude-opus-4-7`	*	*	*	*
`claude-sonnet-4-6`	✔	✔	✔	✔	✔
`claude-opus-4-6`	✔	✔	✔	✔
`claude-sonnet-4-5`	✔	✔	✔	✔	✔
`claude-opus-4-5`	✔	✔	✔	✔
`claude-haiku-4-5`	✔	✔	✔	✔	✔
`claude-4-sonnet`	✔	✔	✔	✔	✔
`claude-4-opus`	✔	✔
`claude-3-7-sonnet`	✔	✔
`openai-gpt-5.4`	*					*	*
`openai-gpt-5.2`	✔					✔	✔
`openai-gpt-5.1`	✔					✔	✔	✔
`openai-gpt-5`	*					*	*	*
`openai-gpt-5-mini`	*					*	*
`openai-gpt-5-nano`	*					*	*
`openai-gpt-4.1`	✔					✔	✔
`openai-gpt-oss-120b`	*
`llama4-maverick`	✔	✔	✔
`llama3.1-8b`	✔	✔	✔	✔	✔	✔	✔	✔
`llama3.1-70b`	✔	✔	✔	✔	✔	✔	✔	✔
`llama3.1-405b`	✔	✔	✔			✔	✔
`deepseek-r1`	✔	✔	✔
`mistral-7b`	✔	✔
`mistral-large`	✔	✔
`mistral-large2`	✔	✔	✔	✔	✔	✔	✔	✔
`snowflake-llama-3.3-70b`	✔	✔	✔


模型	AWS US 西部 2 （俄勒冈）	AWS US 东部 1 （弗吉尼亚北部）	Azure 东部 US 2 （弗吉尼亚）
`llama4-maverick`	✔
`llama3.1-8b`	✔	✔	✔
`llama3.1-70b`	✔	✔	✔
`llama3.1-405b`	✔	✔	✔
`deepseek-r1`	✔
`mistral-7b`	✔	✔	✔
`mistral-large`	✔	✔	✔
`mistral-large2`	✔	✔	✔
`snowflake-llama-3.3-70b`	✔


模型	AWS 欧洲中部 1 （法兰克福）	AWS 欧洲西部 1 （爱尔兰）	Azure 西欧（荷兰）
`llama3.1-8b`	✔		✔
`llama3.1-70b`	✔	✔	✔
`mistral-7b`	✔		✔
`mistral-large`	✔		✔
`mistral-large2`	✔	✔	✔


模型	AWS AP 东南部 2 （悉尼）	AWS AP 东北部 1 （东京）
`llama3.1-8b`		✔
`llama3.1-70b`	✔	✔
`mistral-7b`		✔
`mistral-large`		✔
`mistral-large2`	✔	✔

* Indicates a preview function or model. Preview features are not suitable for production workloads.

您还可以在任何支持的区域，使用任何微调模型。

Features¶

Streaming¶

Both APIs support streaming responses using server-sent events (https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events).

Chat Completions streaming¶

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
  stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="", flush=True)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
  stream: true,
});

for await (const event of stream) {
  process.stdout.write(event.choices[0]?.delta?.content || "");
}

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Messages API streaming¶

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

with client.messages.stream(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const stream = client.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

Tool calling¶

Tool calling lets the model invoke external functions during a conversation. The flow works in steps:

You send a request with a list of available tools.
The model decides to call one or more tools and returns the tool name and arguments.
You execute the tool on your end.
You send the tool result back, and the model generates a final response.

Tool calling is supported for OpenAI and Claude models.

Chat Completions tool calling¶

import json
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          }
        },
        "required": ["location"]
      }
    }
  }
]

messages = [
  {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=messages,
  tools=tools,
)

# Step 2: The model responds with tool_calls
message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append(message)
    messages.append({
      "role": "tool",
      "tool_call_id": tool_call.id,
      "content": result,
    })

    final_response = client.chat.completions.create(
      model="claude-sonnet-4-5",
      messages=messages,
      tools=tools,
    )

    print(final_response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA"
          }
        },
        required: ["location"]
      }
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages,
  tools,
});

// Step 2: The model responds with tool_calls
const message = response.choices[0].message;

if (message.tool_calls) {
  const toolCall = message.tool_calls[0];

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push(message);
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: result,
  });

  const finalResponse = await client.chat.completions.create({
    model: "claude-sonnet-4-5",
    messages,
    tools,
  });

  console.log(finalResponse.choices[0].message.content);
}

Step 1 — Send the request with tools:

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

The model responds with a tool_calls array:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Step 2 — Execute the tool and send the result back:

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Messages API tool calling¶

import json
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

tools = [
  {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "input_schema": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        }
      },
      "required": ["location"]
    }
  }
]

messages = [
  {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=messages,
  tools=tools,
)

# Step 2: The model responds with a tool_use block
if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": tool_use.id,
          "content": result,
        }
      ],
    })

    final_response = client.messages.create(
      model="claude-sonnet-4-5",
      max_tokens=1024,
      messages=messages,
      tools=tools,
    )

    print(final_response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const tools = [
  {
    name: "get_weather",
    description: "Get the current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state, e.g. San Francisco, CA"
        }
      },
      required: ["location"]
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages,
  tools,
});

// Step 2: The model responds with a tool_use block
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find(b => b.type === "tool_use");

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push({ role: "assistant", content: response.content });
  messages.push({
    role: "user",
    content: [
      {
        type: "tool_result",
        tool_use_id: toolUse.id,
        content: result,
      }
    ],
  });

  const finalResponse = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages,
    tools,
  });

  console.log(finalResponse.content[0].text);
}

Step 1 — Send the request with tools:

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'

The model responds with a tool_use content block:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll check the weather for you."
    },
    {
      "type": "tool_use",
      "id": "toolu_abc123",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA"}
    }
  ],
  "stop_reason": "tool_use"
}

Step 2 — Execute the tool and send the result back:

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "I'\''ll check the weather for you."},
          {
            "type": "tool_use",
            "id": "toolu_abc123",
            "name": "get_weather",
            "input": {"location": "San Francisco, CA"}
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "tool_use_id": "toolu_abc123",
            "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
          }
        ]
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'

Structured output¶

You can request structured JSON output that conforms to a specific schema. This is supported for OpenAI and Claude models through the Chat Completions API. For the Messages API, use the tool_use pattern to enforce structured output.

Chat Completions structured output¶

Use the response_format field with a JSON schema to constrain the model's output.

import json
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "people_data",
      "schema": {
        "type": "object",
        "properties": {
          "people": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "age": {"type": "number"}
              },
              "required": ["name", "age"]
            }
          }
        },
        "required": ["people"]
      }
    }
  }
)

data = json.loads(response.choices[0].message.content)
print(data)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "user", content: "Create a dataset of 3 people with their names and ages." }
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "people_data",
      schema: {
        type: "object",
        properties: {
          people: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                age: { type: "number" }
              },
              required: ["name", "age"]
            }
          }
        },
        required: ["people"]
      }
    }
  }
});

const data = JSON.parse(response.choices[0].message.content);
console.log(data);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "people_data",
        "schema": {
          "type": "object",
          "properties": {
            "people": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "age": {"type": "number"}
                },
                "required": ["name", "age"]
              }
            }
          },
          "required": ["people"]
        }
      }
    }
  }'

备注

Claude models support only json_schema as the response format type. OpenAI models support additional response format types as documented in the OpenAI API reference (https://platform.openai.com/docs/api-reference/chat/create).

Messages API structured output¶

The Messages API does not have a response_format field. Instead, define a tool with your desired output schema and instruct the model to use it. The model's tool_use response will contain structured JSON matching your schema.

import json
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
  ],
  tools=[
    {
      "name": "people_data",
      "description": "Output a list of people with names and ages.",
      "input_schema": {
        "type": "object",
        "properties": {
          "people": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "age": {"type": "number"}
              },
              "required": ["name", "age"]
            }
          }
        },
        "required": ["people"]
      }
    }
  ],
  tool_choice={"type": "tool", "name": "people_data"},
)

# Extract the structured data from the tool_use block
tool_use = next(b for b in response.content if b.type == "tool_use")
print(tool_use.input)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Create a dataset of 3 people with their names and ages." }
  ],
  tools: [
    {
      name: "people_data",
      description: "Output a list of people with names and ages.",
      input_schema: {
        type: "object",
        properties: {
          people: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                age: { type: "number" }
              },
              required: ["name", "age"]
            }
          }
        },
        required: ["people"]
      }
    }
  ],
  tool_choice: { type: "tool", name: "people_data" },
});

// Extract the structured data from the tool_use block
const toolUse = response.content.find(b => b.type === "tool_use");
console.log(toolUse.input);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
    ],
    "tools": [
      {
        "name": "people_data",
        "description": "Output a list of people with names and ages.",
        "input_schema": {
          "type": "object",
          "properties": {
            "people": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "age": {"type": "number"}
                },
                "required": ["name", "age"]
              }
            }
          },
          "required": ["people"]
        }
      }
    ],
    "tool_choice": {"type": "tool", "name": "people_data"}
  }'

Image input¶

You can include images in your requests for models that support vision. Images must be provided as base64-encoded strings. Images are limited to 20 per conversation with a 20 MiB max request size.

Image input is supported for:

Claude models (claude-3-7-sonnet and newer)
OpenAI models (openai-gpt-4.1, openai-gpt-5, openai-gpt-5-mini, openai-gpt-5-nano)

Chat Completions image input¶

import base64
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

# Read and encode an image file
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/png;base64,{image_data}"
          }
        },
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

// Read and encode an image file
const imageData = fs.readFileSync("image.png").toString("base64");

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${imageData}`
          }
        },
        {
          type: "text",
          text: "What is in this image?"
        }
      ]
    }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,<BASE64_IMAGE_DATA>"
            }
          },
          {
            "type": "text",
            "text": "What is in this image?"
          }
        ]
      }
    ]
  }'

Messages API image input¶

The Messages API uses a different image format — a source block with type, media_type, and data fields instead of a data URL.

import base64
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

# Read and encode an image file
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": image_data
          }
        },
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

// Read and encode an image file
const imageData = fs.readFileSync("image.png").toString("base64");

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/png",
            data: imageData
          }
        },
        {
          type: "text",
          text: "What is in this image?"
        }
      ]
    }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/png",
              "data": "<BASE64_IMAGE_DATA>"
            }
          },
          {
            "type": "text",
            "text": "What is in this image?"
          }
        ]
      }
    ]
  }'

Prompt caching¶

Prompt caching lets you reuse previously processed context (such as large system prompts, documents, or conversation history) across requests, reducing latency and cost.

OpenAI models: Caching is implicit. Prompts with 1,024+ tokens are automatically cached — no request changes needed.
Claude models: Caching is explicit. Add cache_control breakpoints to content blocks you want cached. Only the ephemeral cache type is supported, with either a 5-minute or 1-hour TTL. A maximum of 4 cache breakpoints per request.

Chat Completions prompt caching¶

For Claude models via Chat Completions, add cache_control to content blocks. OpenAI models are cached automatically and do not require this field.

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "<long system prompt to cache>",
          "cache_control": {"type": "ephemeral"}
        }
      ]
    },
    {"role": "user", "content": "Summarize the key points."}
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "<long system prompt to cache>",
          cache_control: { type: "ephemeral" }
        }
      ]
    },
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "<long system prompt to cache>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'

Messages API prompt caching¶

Use cache_control on system or user content blocks. Only the ephemeral cache type is supported, with either a 5-minute or 1-hour TTL. Set the TTL using the ttl field (e.g., "cache_control": {"type": "ephemeral", "ttl": "1h"}). If omitted, the default TTL is 5 minutes. A maximum of 4 cache breakpoints can be set per request.

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  system=[
    {
      "type": "text",
      "text": "<long system prompt to cache>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  messages=[
    {"role": "user", "content": "Summarize the key points."}
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "<long system prompt to cache>",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "<long system prompt to cache>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'

备注

Anthropic prompt caching supports a 5-minute (default) or 1-hour TTL. To use the 1-hour TTL, set "cache_control": {"type": "ephemeral", "ttl": "1h"}. Cached content not accessed within the TTL window is evicted. OpenAI prompt caching is implicit and managed automatically — no cache_control fields needed.

Thinking and reasoning¶

Chat Completions thinking¶

For Claude models, use the reasoning object. For OpenAI reasoning models, use the reasoning_effort field (values: minimal, low, medium, high).

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
)

# Claude models — use the reasoning object
response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
  ],
  extra_body={
    "reasoning": {"effort": "high"}
  }
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1"
});

// Claude models — use the reasoning object
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" }
  ],
  reasoning: { effort: "high" },
});

console.log(response.choices[0].message.content);

# Claude models — use the reasoning object
curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning": {
      "effort": "high"
    }
  }'

# OpenAI reasoning models — use reasoning_effort
curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "openai-gpt-5",
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning_effort": "high"
  }'

Messages API thinking¶

Some Claude models support adaptive thinking, where the model adjusts how much reasoning it applies based on task complexity. The following models support adaptive thinking:

claude-opus-4-6

For the Messages API, use the thinking parameter with type: "adaptive" to enable adaptive thinking. The output_config.effort parameter provides some high-level control over the thinking depth, and accepts the following values:


Effort level	Behavior
`max`	Always thinks with no constraints on thinking depth. Claude Opus 4.6 only.
`high` (default)	Always thinks. Provides deep reasoning on complex tasks.
`medium`	Moderate thinking. May skip thinking for very simple queries.
`low`	Minimizes thinking. Skips thinking for simple tasks where speed matters most.

The following examples demonstrate how to make a Messages API call with adaptive thinking enabled:

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-opus-4-6",
  max_tokens=16384,
  thinking={
    "type": "adaptive"
  },
  messages=[
    {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
  ],
)

# The response includes thinking blocks followed by text
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:100]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16384,
  thinking: {
    type: "adaptive"
  },
  messages: [
    { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" }
  ],
});

// The response includes thinking blocks followed by text
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log(`Thinking: ${block.thinking.slice(0, 100)}...`);
  } else if (block.type === "text") {
    console.log(`Answer: ${block.text}`);
  }
}

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 16384,
    "thinking": {
      "type": "adaptive"
    },
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ]
  }'

The response includes thinking blocks with summarized thinking and thinking signatures. Pass these blocks back in multi-turn conversations to maintain reasoning context:

{
  "role": "assistant",
  "content": [
    {"type": "thinking", "thinking": "<thinking>", "signature": "<signature>"},
    {"type": "text", "text": "Yes, there are infinitely many primes p where p ≡ 3 (mod 4)..."}
  ]
}

For a full description of the Messages API support for Adaptive Thinking, see Claude API Docs -- Adaptive thinking (https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking).

Beta features (Messages API)¶

The Messages API supports Anthropic beta features via the anthropic-beta header. Pass one or more beta header values as a comma-separated string.

Supported beta headers¶
Beta header value	Feature
`token-efficient-tools-2025-02-19`	Token-efficient tools
`interleaved-thinking-2025-05-14`	Interleaved thinking
`output-128k-2025-02-19`	Enables output tokens up to 128K
`dev-full-thinking-2025-05-14`	Developer mode for raw thinking on Claude 4+ models
`context-1m-2025-08-07`	1 million token context window
`context-management-2025-06-27`	Context management
`effort-2025-11-24`	Effort parameter for thinking
`tool-search-tool-2025-10-19`	Tool search tool
`tool-examples-2025-10-29`	Tool use examples

The following example enables the 1 million token context window with claude-sonnet-4-6:

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  http_client=http_client,
  default_headers={
    "Authorization": f"Bearer {PAT}",
    "anthropic-beta": "context-1m-2025-08-07",
  },
)

response = client.messages.create(
  model="claude-sonnet-4-6",
  max_tokens=8192,
  messages=[
    {"role": "user", "content": "<very long document text>... Summarize the key themes."}
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
    "anthropic-beta": "context-1m-2025-08-07",
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  messages: [
    { role: "user", content: "<very long document text>... Summarize the key themes." }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: context-1m-2025-08-07" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "messages": [
      {"role": "user", "content": "<very long document text>... Summarize the key themes."}
    ]
  }'

You can combine multiple beta features by passing a comma-separated string:

-H "anthropic-beta: context-1m-2025-08-07,interleaved-thinking-2025-05-14"

Chat Completions API reference¶

POST /api/v2/cortex/v1/chat/completions¶

Generates a chat completion using the specified model. The request and response format follows the OpenAI Chat Completions API specification (https://platform.openai.com/docs/api-reference/chat/create).

POST https://<account_identifier>.snowflakecomputing.cn/api/v2/cortex/v1/chat/completions

必要标头¶

Authorization: Bearer token: Authorization for the request. token is a JSON web token (JWT), OAuth token, or programmatic access token. For details, see 使用 Snowflake 对 Snowflake REST APIs 进行身份验证.
Content-Type: application/json: 指定请求的主体是 JSON 格式。

可选标头¶

X-Snowflake-Authorization-Token-Type: type

定义授权令牌的类型。

如果您省略了 X-Snowflake-Authorization-Token-Type 标头，Snowflake 将通过检查令牌来确定令牌类型。

即使此标头可选，您也可以选择指定此标头。您可以将标头设置为以下值之一：

``KEYPAIR_JWT``（适用于密钥对身份验证）
``OAUTH``（适用于 OAuth）
``PROGRAMMATIC_ACCESS_TOKEN``（适用于编程访问令牌）

Accept: application/json, text/event-stream

指定响应将包含 JSON（错误情况）或服务器发送的事件。

Required JSON fields¶


字段	类型	描述
`model`	字符串	The model to use (see 模型可用性). You may also use the fully-qualified name of any fine-tuned model in the format `database.schema.model`.
`messages`	数组	An array of message objects representing the conversation. Each message must have a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (string or array of content parts).

Commonly used optional JSON fields¶


字段	类型	默认值	描述
`max_completion_tokens`	整数	4096	Maximum tokens in the response. Theoretical maximum is 131,072; each model has its own output limit.
`temperature`	数字	Varies by model	Controls randomness. Values from 0 to 2.
`top_p`	数字	1.0	Controls diversity via nucleus sampling.
`stream`	boolean	false	Whether to stream back partial progress as server-sent events.
`tools`	数组	null	A list of tools the model may call. Each tool must have `type: "function"` and a `function` object with `name`, `description`, and `parameters`.
`tool_choice`	string or object	`"auto"`	Controls how the model selects tools. Options: `"auto"`, `"required"`, `"none"`, or an object specifying a particular function.
`response_format`	对象	null	Constrains the output format. Use `{"type": "json_schema", "json_schema": {...}}` for structured output.
`reasoning_effort`	字符串	null	For OpenAI reasoning models. Values: `"minimal"`, `"low"`, `"medium"`, `"high"`.
`reasoning`	对象	null	For Claude models. Set `reasoning.effort` or `reasoning.max_tokens` to enable thinking.

See the detailed compatibility chart for the full list of supported fields per model family.

状态代码¶

200 OK: Request completed successfully.
400 invalid options object: 可选实参有无效值。
400 unknown model model_name: 指定的模型不存在。
400 schema validation failed: The response schema structure is incorrect.
400 max tokens of count exceeded: The request exceeded the maximum number of tokens supported by the model.
400 all requests were throttled by remote service: The request has been throttled. Try again later.
402 budget exceeded: 超出了模型消耗预算。
403 Not Authorized: 账户未启用 REST API，或调用用户的默认角色没有 snowflake.cortex_user 数据库角色。
429 too many requests: The usage quota has been exceeded. Try again later.
503 inference timed out: 请求耗时太长。

Limitations¶

If unset, max_completion_tokens defaults to 4096. Each model has its own output token limit.
Tool calling is supported for OpenAI and Claude models only.
Audio is not supported.
Image understanding is supported for OpenAI and Claude models only. Images are limited to 20 per conversation with a 20 MiB max request size.
Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.
Only Claude models support returning reasoning details in the response.
max_tokens is deprecated. Use max_completion_tokens instead.
Error messages are generated by Snowflake, not by the model provider.

Detailed compatibility chart¶

The following tables summarize which request and response fields are supported when using the Chat Completions API with different Snowflake-hosted model families.

Request fields¶
字段	OpenAI Models	Claude Models	Other Models
`model`	✔ Supported	✔ Supported	✔ Supported
`messages`	See sub-fields	See sub-fields	See sub-fields
`messages[].audio`	❌ Error	❌ Ignored	❌ Ignored
`messages[].role`	✔ Supported	✔ Only user/assistant/system	✔ Only user/assistant/system
`messages[].content` (string)	✔ Supported	✔ Supported	✔ Supported
`messages[].content[]` (array)	See sub-fields	See sub-fields	See sub-fields
`messages[].content[].text`	✔ Supported	✔ Supported	✔ Supported
`messages[].content[].type`	✔ Supported	✔ Supported	✔ Supported
`messages[].content[].image_url`	✔ Supported	✔ Supported	❌ Error
`messages[].content[].cache_control`	❌ Ignored	✔ Supported (ephemeral only)	❌ Ignored
`messages[].content[].file`	❌ Error	❌ Error	❌ Ignored
`messages[].content[].input_audio`	❌ Error	❌ Ignored	❌ Ignored
`messages[].content[].refusal`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].function_call`	✔ Supported (deprecated)	❌ Ignored	❌ Ignored
`messages[].name`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].refusal`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].tool_call_id`	✔ Supported	✔ Supported	❌ Ignored
`messages[].tool_calls`	✔ Supported	✔ Only `function` tools	❌ Ignored
`messages[].reasoning_details`	❌ Ignored	✔ OpenRouter format `reasoning.text`	❌ Ignored
`audio`	❌ Error	❌ Ignored	❌ Ignored
`frequency_penalty`	✔ Supported	❌ Ignored	❌ Ignored
`logit_bias`	✔ Supported	❌ Ignored	❌ Ignored
`logprobs`	✔ Supported	❌ Ignored	❌ Ignored
`max_tokens`	❌ Error (deprecated)	❌ Error (deprecated)	❌ Error (deprecated)
`max_completion_tokens`	✔ Supported (4096 default, 131072 max)	✔ Supported (4096 default, 131072 max)	✔ Supported (4096 default, 131072 max)
`metadata`	❌ Ignored	❌ Ignored	❌ Ignored
`modalities`	❌ Ignored	❌ Ignored	❌ Ignored
`n`	✔ Supported	❌ Ignored	❌ Ignored
`parallel_tool_calls`	✔ Supported	❌ Ignored	❌ Ignored
`prediction`	✔ Supported	❌ Ignored	❌ Ignored
`presence_penalty`	✔ Supported	❌ Ignored	❌ Ignored
`prompt_cache_key`	✔ Supported	❌ Ignored	❌ Ignored
`reasoning_effort`	✔ Supported	❌ Ignored (use `reasoning` object)	❌ Ignored
`reasoning`	See sub-fields	See sub-fields	See sub-fields
`reasoning.effort`	✔ Supported (overrides `reasoning_effort`)	✔ Converted to `reasoning.max_tokens`	❌ Ignored
`reasoning.max_tokens`	❌ Ignored	✔ Supported	❌ Ignored
`response_format`	✔ Supported	✔ Only `json_schema`	❌ Ignored
`safety_identifier`	❌ Ignored	❌ Ignored	❌ Ignored
`service_tier`	❌ Error	❌ Error	❌ Error
`stop`	✔ Supported	❌ Ignored	❌ Ignored
`store`	❌ Error	❌ Error	❌ Error
`stream`	✔ Supported	✔ Supported	✔ Supported
`stream_options`	See sub-fields	See sub-fields	See sub-fields
`stream_options.include_obfuscation`	❌ Ignored	❌ Ignored	❌ Ignored
`stream_options.include_usage`	✔ Supported	✔ Supported	✔ Supported
`temperature`	✔ Supported	✔ Supported	✔ Supported
`tool_choice`	✔ Supported	✔ Only `function` tools	❌ Ignored
`tools`	✔ Supported	✔ Only `function` tools	❌ Error
`top_logprobs`	✔ Supported	❌ Ignored	❌ Ignored
`top_p`	✔ Supported	✔ Supported	✔ Supported
`verbosity`	✔ Supported	❌ Ignored	❌ Ignored
`web_search_options`	❌ Error	❌ Ignored	❌ Ignored

Response fields¶
字段	OpenAI Models	Claude Models	Other Models
`id`	✔ Supported	✔ Supported	✔ Supported
`object`	✔ Supported	✔ Supported	✔ Supported
`created`	✔ Supported	✔ Supported	✔ Supported
`model`	✔ Supported	✔ Supported	✔ Supported
`choices`	See sub-fields	See sub-fields	See sub-fields
`choices[].index`	✔ Supported	✔ Single choice only	✔ Single choice only
`choices[].finish_reason`	✔ Supported	❌ Not supported	✔ Only `stop`
`choices[].logprobs`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message` (non-streaming)	See sub-fields	See sub-fields	See sub-fields
`choices[].message.content`	✔ Supported	✔ Supported	✔ Supported
`choices[].message.role`	✔ Supported	✔ Supported	✔ Supported
`choices[].message.refusal`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message.annotations`	❌ Not supported	❌ Not supported	❌ Not supported
`choices[].message.audio`	❌ Not supported	❌ Not supported	❌ Not supported
`choices[].message.function_call`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message.tool_calls`	✔ Supported	✔ Only `function` tools	❌ Not supported
`choices[].message.reasoning`	❌ Not supported	✔ OpenRouter format	❌ Not supported
`choices[].delta` (streaming)	See sub-fields	See sub-fields	See sub-fields
`choices[].delta.content`	✔ Supported	✔ Supported	✔ Supported
`choices[].delta.role`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.refusal`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.function_call`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.tool_calls`	✔ Supported	✔ Only `function` tools	❌ Not supported
`choices[].delta.reasoning`	❌ Not supported	✔ OpenRouter format	❌ Not supported
`usage`	See sub-fields	See sub-fields	See sub-fields
`usage.prompt_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.completion_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.total_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.prompt_tokens_details`	See sub-fields	See sub-fields	See sub-fields
`usage.prompt_tokens_details.audio_tokens`	❌ Not supported	❌ Not supported	❌ Not supported
`usage.prompt_tokens_details.cached_tokens`	✔ Only cache reads	✔ Cache read + write	❌ Not supported
`usage.completion_tokens_details`	See sub-fields	See sub-fields	See sub-fields
`usage.completion_tokens_details.accepted_prediction_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.audio_tokens`	❌ Not supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.reasoning_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.rejected_prediction_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`service_tier`	✔ Supported	❌ Not supported	❌ Not supported
`system_fingerprint`	✔ Supported	❌ Not supported	❌ Not supported

Request headers¶
Header	Support
`Authorization`	✔ Required
`Content-Type`	✔ Supported (`application/json`)
`Accept`	✔ Supported (`application/json`, `text/event-stream`)

Response headers¶
Header	Support
`openai-organization`	❌ Not supported
`openai-version`	❌ Not supported
`openai-processing-ms`	❌ Not supported
`x-ratelimit-limit-requests`	❌ Not supported
`x-ratelimit-limit-tokens`	❌ Not supported
`x-ratelimit-remaining-requests`	❌ Not supported
`x-ratelimit-remaining-tokens`	❌ Not supported
`x-ratelimit-reset-requests`	❌ Not supported
`x-ratelimit-reset-tokens`	❌ Not supported
`retry-after`	❌ Not supported

Learn more¶

For additional usage examples, see the OpenAI Chat Completions API reference (https://platform.openai.com/docs/guides/completions/) or the OpenAI Cookbook (https://cookbook.openai.com/).

In addition to providing compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as extra fields on the request:

For prompt caching, use the cache_control field. See the OpenRouter prompt caching documentation (https://openrouter.ai/docs/features/prompt-caching).
For reasoning tokens, use the reasoning field. See the OpenRouter reasoning documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens).

Messages API reference¶

POST /api/v2/cortex/v1/messages¶

Generates a response using a Claude model. The request and response format follows the Anthropic Messages API specification (https://docs.anthropic.com/en/api/messages).

POST https://<account_identifier>.snowflakecomputing.cn/api/v2/cortex/v1/messages

备注

The Messages API supports Claude models only. For other models, use the Chat Completions API.

必要标头¶

Authorization: Bearer token: Authorization for the request. token is a JSON web token (JWT), OAuth token, or programmatic access token. For details, see 使用 Snowflake 对 Snowflake REST APIs 进行身份验证.
Content-Type: application/json: 指定请求的主体是 JSON 格式。
anthropic-version: 2023-06-01: Required Anthropic API version header.

可选标头¶

X-Snowflake-Authorization-Token-Type: type

定义授权令牌的类型。

如果您省略了 X-Snowflake-Authorization-Token-Type 标头，Snowflake 将通过检查令牌来确定令牌类型。

即使此标头可选，您也可以选择指定此标头。您可以将标头设置为以下值之一：

``KEYPAIR_JWT``（适用于密钥对身份验证）
``OAUTH``（适用于 OAuth）
``PROGRAMMATIC_ACCESS_TOKEN``（适用于编程访问令牌）

anthropic-beta: feature

Enables beta features. Only Bedrock-compatible beta headers are supported.

Required JSON fields¶


字段	类型	描述
`model`	字符串	The Claude model to use (see 模型可用性).
`max_tokens`	整数	The maximum number of tokens to generate.
`messages`	数组	An array of message objects. Each message has a `role` (`user` or `assistant`) and `content` (string or array of content blocks).

Supported features¶

The Messages API supports the standard Anthropic Messages API feature set for Claude models, including:

Text generation and multi-turn conversations
Streaming ("stream": true)
System messages (via top-level system field)
Tool calling (Anthropic format with name, description, input_schema)
Image input (base64 source blocks)
Prompt caching (cache_control on content blocks)
Extended thinking (thinking parameter with budget_tokens)

For full request and response schema details, see the Anthropic Messages API documentation (https://docs.anthropic.com/en/api/messages).

Limitations¶

Claude models only. OpenAI, Llama, Mistral, and other models are not available through this endpoint.
No flex processing or priority tier. The service_tier field is not supported.
Bedrock beta headers only. Only Bedrock-compatible anthropic-beta header values are supported.
Error messages are generated by Snowflake, not by Anthropic.

状态代码¶

200 OK: Request completed successfully.
400 invalid_request_error: The request body is malformed or contains invalid values.
400 unknown model model_name: The specified model does not exist or is not a Claude model.
402 budget exceeded: 超出了模型消耗预算。
403 Not Authorized: Account not enabled for REST API, or the default role does not have the snowflake.cortex_user database role.
429 too many requests: The usage quota has been exceeded. Try again later.
503 inference timed out: 请求耗时太长。

Rate limits¶

To ensure high performance for all Snowflake customers, Cortex REST API requests are subject to rate limits. Requests exceeding the limits may receive an HTTP 429 response. Snowflake may occasionally adjust these limits.

The default limits in the following tables are applied per account and independently for each model. Ensure your application handles 429 responses gracefully by retrying with exponential backoff (https://platform.openai.com/docs/guides/rate-limits#retrying-with-exponential-backoff).

If you need to increase the limits, contact Snowflake Support.

Cortex REST API 速率限制¶
模型	每分钟处理的词元 (TPM)	每分钟的请求 (RPM)	最大输出（词元）
`claude-3-7-sonnet`	300,000	30	16,384
`claude-sonnet-4-5`	600,000	60	16,384
`claude-haiku-4-5`	600,000	60	16,384
`claude-4-sonnet`	300,000	30	16,384
`claude-4-opus`	75,000	75	16,384
`deepseek-r1`	100,000	100	16,384
`llama3.1-8b`	400,000	400	16,384
`llama3.1-70b`	200,000	200	16,384
`llama3.1-405b`	100,000	100	16,384
`mistral-7b`	400,000	400	16,384
`mistral-large2`	200,000	200	16,384
`openai-gpt-4.1`	300,000	30	16,384
`openai-gpt-5`	300,000	30	16,384
`openai-gpt-5-mini`	1,000,000	1,000	16,384
`openai-gpt-5-nano`	5,000,000	5,000	16,384

通过跨区域推理提高速率限制¶

If you set up cross-region inference in your Snowflake Account, the rate limits are higher for the following models:

跨区域推理的 Cortex REST API 速率限制¶
模型	每分钟处理的词元 (TPM)	每分钟的请求 (RPM)	最大输出（词元）
`claude-3-7-sonnet`	600,000	60	16,384
`claude-haiku-4-5`	600,000	60	16,384
`claude-sonnet-4-5`	600,000	60	16,384
`claude-4-sonnet`	1,200,000	200	16,384
`claude-4-opus`	150,000	50	16,384
`llama3.1-8b`	800,000	400	16,384
`llama3.1-70b`	400,000	200	16,384
`llama3.1-405b`	200,000	100	16,384

Troubleshooting rate limit events¶

Offending either the TPM or RPM limits will result in a 429 response code. If your REST API usage is below the request per minute rate limit but still received a 429 response code, double check the token usage rate.

Cortex REST API implements rate limits using the Sliding Window Counter (https://blog.cloudflare.com/counting-things-a-lot-of-different-things/#sliding-windows-to-the-rescue) pattern. The counters are stored in a highly-available Redis cluster only accessible by Snowflake Cortex within Snowflake's private network.

The sliding-window counter assumes that client traffic to the API in the previous time window is uniformly distributed. When traffic is spiky, this assumption could overestimate the rate of requests, but recovers quickly given the window is short. Please contact Snowflake Support if you are subject to the overestimation and want to increase the limits.

Known issues¶

Session token expiration¶

We recommended authenticating with one of the three methods defined in 使用 Snowflake 对 Snowflake REST APIs 进行身份验证. However, if you choose to authenticate with a Snowflake session token, you must handle token refresh to ensure uninterrupted API access.

Session tokens expire periodically. If a request is executed with an expired session token, the REST API returns a 200 OK response that includes error code 390112. When this occurs, the operation is not performed.

To handle this behavior, your application should:

Check each API response for error code 390112, even when the HTTP status code is 200 OK.
When error code 390112 is detected, refresh the session token and retry the request.

备注

This behavior only affects applications using Snowflake session tokens. If you authenticate using key pair authentication, OAuth, or programmatic access tokens (PATs), you do not need to implement this error handling.

成本注意事项¶

Snowflake Cortex REST API requests incur compute costs based on the number of tokens processed. Refer to the Snowflake Service Consumption Table for each model's cost in dollars per million tokens.

A token is the smallest unit of text processed by Snowflake Cortex LLM functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.

Both input and output tokens incur compute cost. If you use the API to provide a conversational or chat user experience, all previous prompts and responses are processed to generate each new response, with corresponding costs.