/responses[Beta]

LiteLLM 提供符合 OpenAI 的 /responses API 规范的 BETA 端点

功能	支持	备注
成本跟踪	✅	适用于所有支持的模型
日志记录	✅	适用于所有集成
终端用户跟踪	✅
流式传输	✅
回退	✅	在支持的模型之间工作
负载均衡	✅	在支持的模型之间工作
支持的操作	创建响应、获取响应、删除响应
支持的 LiteLLM 版本	1.63.8+
支持的 LLM 提供商	所有 LiteLLM 支持的提供商	`openai`, `anthropic`, `bedrock`, `vertex_ai`, `gemini`, `azure`, `azure_ai` 等

用法

LiteLLM Python SDK

OpenAI
Anthropic
Vertex AI
AWS Bedrock
Google AI Studio

非流式传输

OpenAI 非流式传输响应
import litellm

# Non-streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

流式传输

OpenAI 流式传输响应
import litellm

# Streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

GET 响应

按 ID 获取响应
import litellm

# First, create a response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

# Get the response ID
response_id = response.id

# Retrieve the response by ID
retrieved_response = litellm.get_responses(
    response_id=response_id
)

print(retrieved_response)

# For async usage
# retrieved_response = await litellm.aget_responses(response_id=response_id)

DELETE 响应

按 ID 删除响应
import litellm

# First, create a response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

# Get the response ID
response_id = response.id

# Delete the response by ID
delete_response = litellm.delete_responses(
    response_id=response_id
)

print(delete_response)

# For async usage
# delete_response = await litellm.adelete_responses(response_id=response_id)

非流式传输

Anthropic 非流式传输响应
import litellm
import os

# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"

# Non-streaming response
response = litellm.responses(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

流式传输

Anthropic 流式传输响应
import litellm
import os

# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"

# Streaming response
response = litellm.responses(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

非流式传输

Vertex AI 非流式传输响应
import litellm
import os

# Set credentials - Vertex AI uses application default credentials
# Run 'gcloud auth application-default login' to authenticate
os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# Non-streaming response
response = litellm.responses(
    model="vertex_ai/gemini-1.5-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

流式传输

Vertex AI 流式传输响应
import litellm
import os

# Set credentials - Vertex AI uses application default credentials
# Run 'gcloud auth application-default login' to authenticate
os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# Streaming response
response = litellm.responses(
    model="vertex_ai/gemini-1.5-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

非流式传输

AWS Bedrock 非流式传输响应
import litellm
import os

# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"  # or your AWS region

# Non-streaming response
response = litellm.responses(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

流式传输

AWS Bedrock 流式传输响应
import litellm
import os

# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"  # or your AWS region

# Streaming response
response = litellm.responses(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

非流式传输

Google AI Studio 非流式传输响应
import litellm
import os

# Set API key for Google AI Studio
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# Non-streaming response
response = litellm.responses(
    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

流式传输

Google AI Studio 流式传输响应
import litellm
import os

# Set API key for Google AI Studio
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# Streaming response
response = litellm.responses(
    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

使用 OpenAI SDK 的 LiteLLM 代理

首先，设置并启动您的 LiteLLM 代理服务器。

启动 LiteLLM 代理服务器
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

OpenAI
Anthropic
Vertex AI
AWS Bedrock
Google AI Studio

首先，将其添加到您的 litellm 代理 config.yaml 文件中

OpenAI 代理配置
model_list:
  - model_name: openai/o1-pro
    litellm_params:
      model: openai/o1-pro
      api_key: os.environ/OPENAI_API_KEY

非流式传输

OpenAI 代理非流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

流式传输

OpenAI 代理流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

GET 响应

使用 OpenAI SDK 按 ID 获取响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# First, create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

# Get the response ID
response_id = response.id

# Retrieve the response by ID
retrieved_response = client.responses.retrieve(response_id)

print(retrieved_response)

DELETE 响应

使用 OpenAI SDK 按 ID 删除响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# First, create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

# Get the response ID
response_id = response.id

# Delete the response by ID
delete_response = client.responses.delete(response_id)

print(delete_response)

首先，将其添加到您的 litellm 代理 config.yaml 文件中

Anthropic 代理配置
model_list:
  - model_name: anthropic/claude-3-5-sonnet-20240620
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

非流式传输

Anthropic 代理非流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

流式传输

Anthropic 代理流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

首先，将其添加到您的 litellm 代理 config.yaml 文件中

Vertex AI 代理配置
model_list:
  - model_name: vertex_ai/gemini-1.5-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: your-gcp-project-id
      vertex_location: us-central1

非流式传输

Vertex AI 代理非流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="vertex_ai/gemini-1.5-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

流式传输

Vertex AI 代理流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="vertex_ai/gemini-1.5-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

首先，将其添加到您的 litellm 代理 config.yaml 文件中

AWS Bedrock 代理配置
model_list:
  - model_name: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

非流式传输

AWS Bedrock 代理非流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

流式传输

AWS Bedrock 代理流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

首先，将其添加到您的 litellm 代理 config.yaml 文件中

Google AI Studio 代理配置
model_list:
  - model_name: gemini/gemini-1.5-flash
    litellm_params:
      model: gemini/gemini-1.5-flash
      api_key: os.environ/GEMINI_API_KEY

非流式传输

Google AI Studio 代理非流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

流式传输

Google AI Studio 代理流式传输响应
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="https://:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="gemini/gemini-1.5-flash",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)

支持的 Responses API 参数

提供商	支持的参数
`openai`	支持所有 Responses API 参数
`azure`	支持所有 Responses API 参数
`anthropic`	在此查看支持的参数
`bedrock`	在此查看支持的参数
`gemini`	在此查看支持的参数
`vertex_ai`	在此查看支持的参数
`azure_ai`	在此查看支持的参数
所有其他 llm api 提供商	在此查看支持的参数

具有会话连续性的负载均衡。

当对同一模型的多个部署（例如，多个 Azure OpenAI 端点）使用 Responses API 时，LiteLLM 提供会话连续性。这确保使用 previous_response_id 的后续请求被路由到生成原始响应的同一部署。

用法示例

Python SDK
代理服务器

具有会话连续性的 Python SDK
import litellm

# Set up router with multiple deployments of the same model
router = litellm.Router(
    model_list=[
        {
            "model_name": "azure-gpt4-turbo",
            "litellm_params": {
                "model": "azure/gpt-4-turbo",
                "api_key": "your-api-key-1",
                "api_version": "2024-06-01",
                "api_base": "https://endpoint1.openai.azure.com",
            },
        },
        {
            "model_name": "azure-gpt4-turbo",
            "litellm_params": {
                "model": "azure/gpt-4-turbo",
                "api_key": "your-api-key-2",
                "api_version": "2024-06-01",
                "api_base": "https://endpoint2.openai.azure.com",
            },
        },
    ],
    optional_pre_call_checks=["responses_api_deployment_check"],
)

# Initial request
response = await router.aresponses(
    model="azure-gpt4-turbo",
    input="Hello, who are you?",
    truncation="auto",
)

# Store the response ID
response_id = response.id

# Follow-up request - will be automatically routed to the same deployment
follow_up = await router.aresponses(
    model="azure-gpt4-turbo",
    input="Tell me more about yourself",
    truncation="auto",
    previous_response_id=response_id  # This ensures routing to the same deployment
)

1. 在 proxy config.yaml 中设置会话连续性

要在 LiteLLM 代理中为 Responses API 启用会话连续性，请在 proxy config.yaml 中设置 optional_pre_call_checks: ["responses_api_deployment_check"]。

具有会话连续性的 config.yaml
model_list:
  - model_name: azure-gpt4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_key: your-api-key-1
      api_version: 2024-06-01
      api_base: https://endpoint1.openai.azure.com
  - model_name: azure-gpt4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_key: your-api-key-2
      api_version: 2024-06-01
      api_base: https://endpoint2.openai.azure.com

router_settings:
  optional_pre_call_checks: ["responses_api_deployment_check"]

2. 使用 OpenAI Python SDK 向 LiteLLM 代理发出请求

使用代理服务器的 OpenAI Client
from openai import OpenAI

client = OpenAI(
    base_url="https://:4000",
    api_key="your-api-key"
)

# Initial request
response = client.responses.create(
    model="azure-gpt4-turbo",
    input="Hello, who are you?"
)

response_id = response.id

# Follow-up request - will be automatically routed to the same deployment
follow_up = client.responses.create(
    model="azure-gpt4-turbo",
    input="Tell me more about yourself",
    previous_response_id=response_id  # This ensures routing to the same deployment
)

会话管理 - 非 OpenAI 模型

LiteLLM 代理支持非 OpenAI 模型的会话管理。这允许您在 LiteLLM 代理中存储和获取对话历史记录（状态）。

用法

启用在数据库中存储请求/响应内容

在 proxy config.yaml 中设置 store_prompts_in_spend_logs: true。启用此项后，LiteLLM 将在数据库中存储请求和响应内容。

general_settings:
  store_prompts_in_spend_logs: true

发出请求 1，不带 previous_response_id（新会话）

通过不指定先前的响应 ID 来开始新的对话。

Curl
OpenAI Python SDK

curl https://:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "who is Michael Jordan"
  }'

from openai import OpenAI

# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
    base_url="https://:4000",
    api_key="sk-1234"
)

# Make initial request to start a new conversation
response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="who is Michael Jordan"
)

print(response.id)  # Store this ID for future requests in same session
print(response.output[0].content[0].text)

响应

{
  "id":"resp_123abc",
  "model":"claude-3-5-sonnet-20241022",
  "output":[{
    "type":"message",
    "content":[{
      "type":"output_text",
      "text":"Michael Jordan is widely considered one of the greatest basketball players of all time. He played for the Chicago Bulls (1984-1993, 1995-1998) and Washington Wizards (2001-2003), winning 6 NBA Championships with the Bulls."
    }]
  }]
}

发出请求 2，带 previous_response_id（同一会话）

通过引用先前的响应 ID 来继续对话，以维持对话上下文。

Curl
OpenAI Python SDK

curl https://:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him",
    "previous_response_id": "resp_123abc"
  }'

from openai import OpenAI

# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
    base_url="https://:4000",
    api_key="sk-1234"
)

# Make follow-up request in the same conversation session
follow_up_response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="can you tell me more about him",
    previous_response_id="resp_123abc"  # ID from the previous response
)

print(follow_up_response.output[0].content[0].text)

响应

{
  "id":"resp_456def",
  "model":"claude-3-5-sonnet-20241022",
  "output":[{
    "type":"message",
    "content":[{
      "type":"output_text",
      "text":"Michael Jordan was born February 17, 1963. He attended University of North Carolina before being drafted 3rd overall by the Bulls in 1984. Beyond basketball, he built the Air Jordan brand with Nike and later became owner of the Charlotte Hornets."
    }]
  }]
}

发出请求 3，不带 previous_response_id（新会话）

开始一个全新的对话，不引用先前的上下文，以演示上下文在会话之间是如何不保留的。

Curl
OpenAI Python SDK

curl https://:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him"
  }'

from openai import OpenAI

# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
    base_url="https://:4000",
    api_key="sk-1234"
)

# Make a new request without previous context
new_session_response = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="can you tell me more about him"
    # No previous_response_id means this starts a new conversation
)

print(new_session_response.output[0].content[0].text)

响应

{
  "id":"resp_789ghi",
  "model":"claude-3-5-sonnet-20241022",
  "output":[{
    "type":"message",
    "content":[{
      "type":"output_text",
      "text":"I don't see who you're referring to in our conversation. Could you let me know which person you'd like to learn more about?"
    }]
  }]
}

/responses[Beta]

用法​

LiteLLM Python SDK​

非流式传输​

流式传输​

GET 响应​

DELETE 响应​

非流式传输​

流式传输​

非流式传输​

流式传输​

非流式传输​

流式传输​

非流式传输​

流式传输​

使用 OpenAI SDK 的 LiteLLM 代理​

非流式传输​

流式传输​

GET 响应​

DELETE 响应​

非流式传输​

流式传输​

非流式传输​

流式传输​

非流式传输​

流式传输​

非流式传输​

流式传输​

支持的 Responses API 参数​

具有会话连续性的负载均衡。​

用法示例​

1. 在 proxy config.yaml 中设置会话连续性​

2. 使用 OpenAI Python SDK 向 LiteLLM 代理发出请求​

会话管理 - 非 OpenAI 模型​

用法​

用法

LiteLLM Python SDK

非流式传输

流式传输

GET 响应

DELETE 响应

非流式传输

流式传输

非流式传输

流式传输

非流式传输

流式传输

非流式传输

流式传输

使用 OpenAI SDK 的 LiteLLM 代理

非流式传输

流式传输

GET 响应

DELETE 响应

非流式传输

流式传输

非流式传输

流式传输

非流式传输

流式传输

非流式传输

流式传输

支持的 Responses API 参数

具有会话连续性的负载均衡。

用法示例

1. 在 proxy config.yaml 中设置会话连续性

2. 使用 OpenAI Python SDK 向 LiteLLM 代理发出请求

会话管理 - 非 OpenAI 模型

用法