Fallbacks
If a call fails after num_retries, fall back to another model group.
Fallbacks are typically done from one model_name to another model_name.
Quick Start
1. Setup fallbacks
Key change:
fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "gpt-3.5-turbo",
"litellm_params": {
"model": "azure/<your-deployment-name>",
"api_base": "<your-azure-endpoint>",
"api_key": "<your-azure-api-key>",
"rpm": 6
}
},
{
"model_name": "gpt-4",
"litellm_params": {
"model": "azure/gpt-4-ca",
"api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
"api_key": "<your-azure-api-key>",
"rpm": 6
}
}
],
fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}] # 👈 KEY CHANGE
)
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-4
litellm_params:
model: azure/gpt-4-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
fallbacks: [{"gpt-3.5-turbo": ["gpt-4"]}]
2. Start Proxy
litellm --config /path/to/config.yaml
3. Test Fallbacks
Pass mock_testing_fallbacks=true in the request body to trigger fallbacks.
- SDK
- Proxy
from litellm import Router
model_list = [{..}, {..}] # defined in Step 1.
router = Router(model_list=model_list, fallbacks=[{"bad-model": ["my-good-model"]}])
response = router.completion(
model="bad-model",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True,
)
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true # 👈 KEY CHANGE
}
'
Explanation
Fallbacks are done in-order - given ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"], it will try 'gpt-3.5-turbo' first, then 'gpt-4', etc.
You can also set default_fallbacks, in case a specific model group is misconfigured / bad.
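As an illustrative sketch of an ordered fallback list on the Router (the gpt-4-32k group here is a hypothetical addition, not part of the Quick Start config):
from litellm import Router

router = Router(
    model_list=[...],  # deployments for each model group, as in the Quick Start
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4", "gpt-4-32k"]}],  # on failure, try gpt-4 first, then gpt-4-32k
)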
There are 3 types of fallbacks:
- content_policy_fallbacks: for litellm.ContentPolicyViolationError - LiteLLM maps content policy violation errors across providers (See Code)
- context_window_fallbacks: for litellm.ContextWindowExceededError - LiteLLM maps context window exceeded errors across providers (See Code)
- fallbacks: for all other errors - e.g. litellm.RateLimitError
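All three can be combined on one Router; a minimal sketch (the fallback targets are placeholders):
from litellm import Router

router = Router(
    model_list=[...],  # defined as in the Quick Start
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],                 # all other errors, e.g. litellm.RateLimitError
    context_window_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],  # litellm.ContextWindowExceededError
    content_policy_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],  # litellm.ContentPolicyViolationError
)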
Client-Side Fallbacks
Set fallbacks in the .completion() call for the SDK, or client-side when calling the proxy.
In this request, the following will occur:
1. The request to model="zephyr-beta" will fail.
2. The litellm proxy will loop through all the model groups specified in fallbacks=["gpt-3.5-turbo"].
3. The request to model="gpt-3.5-turbo" will succeed, and the client making the request will get a response from gpt-3.5-turbo.
👉 Key Change: "fallbacks": ["gpt-3.5-turbo"]
- SDK
- Proxy
from litellm import Router
router = Router(model_list=[..]) # defined in Step 1.
resp = router.completion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True, # 👈 trigger fallbacks
fallbacks=[
{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}],
}
],
)
print(resp)
- OpenAI Python v1.0.0+
- Curl Request
- Langchain
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="zephyr-beta",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"fallbacks": ["gpt-3.5-turbo"]
}
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "zephyr-beta"",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
"fallbacks": ["gpt-3.5-turbo"]
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os
os.environ["OPENAI_API_KEY"] = "anything"
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model="zephyr-beta",
extra_body={
"fallbacks": ["gpt-3.5-turbo"]
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Control Fallback Prompts
Pass in messages/temperature/etc. per model in the fallback (this also works for embeddings, image generation, etc.).
Key change:
fallbacks = [
{
"model": <model_name>,
"messages": <model-specific-messages>
... # any other model-specific parameters
}
]
- SDK
- Proxy
from litellm import Router
router = Router(model_list=[..]) # defined in Step 1.
resp = router.completion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True, # 👈 trigger fallbacks
fallbacks=[
{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}],
}
],
)
print(resp)
- OpenAI Python v1.0.0+
- Curl Request
- Langchain
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="zephyr-beta",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}]
}
)
print(response)
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Hi, how are you ?"
}
]
}
],
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}],
"mock_testing_fallbacks": true
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os
os.environ["OPENAI_API_KEY"] = "anything"
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model="zephyr-beta",
extra_body={
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}]
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Content Policy Violation Fallback
Key change:
content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "claude-2",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": Exception("content filtering policy"),
},
},
{
"model_name": "my-fallback-model",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": "This works!",
},
},
],
content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
# fallbacks=[..], # [OPTIONAL]
# context_window_fallbacks=[..], # [OPTIONAL]
)
response = router.completion(
model="claude-2",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
In your proxy config.yaml, just add this line 👇
router_settings:
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}]
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Context Window Exceeded Fallback
Key change:
context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "claude-2",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": Exception("prompt is too long"),
},
},
{
"model_name": "my-fallback-model",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": "This works!",
},
},
],
context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
# fallbacks=[..], # [OPTIONAL]
# content_policy_fallbacks=[..], # [OPTIONAL]
)
response = router.completion(
model="claude-2",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
In your proxy config.yaml, just add this line 👇
router_settings:
  context_window_fallbacks: [{"claude-2": ["my-fallback-model"]}]
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Advanced
Fallbacks + Retries + Timeouts + Cooldowns
To set fallbacks, just do:
litellm_settings:
fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]
This covers all errors (429, 500, etc.).
Set via config:
model_list:
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8001
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8002
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8003
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
api_key: <my-openai-key>
- model_name: gpt-3.5-turbo-16k
litellm_params:
model: gpt-3.5-turbo-16k
api_key: <my-openai-key>
litellm_settings:
num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries
allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute.
cooldown_time: 30 # how long to cooldown model if fails/min > allowed_fails
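For SDK users, a sketch of the equivalent Router settings (parameter names mirror the config keys above; model_list is elided):
from litellm import Router

router = Router(
    model_list=[...],  # zephyr-beta + gpt-3.5-turbo deployments from the config above
    num_retries=3,     # retry each call 3 times
    timeout=10,        # raise a Timeout error if a call takes longer than 10s
    fallbacks=[{"zephyr-beta": ["gpt-3.5-turbo"]}],  # fall back once retries are exhausted
    allowed_fails=3,   # cooldown a deployment if it fails > 3 calls in a minute
    cooldown_time=30,  # seconds to keep a failing deployment in cooldown
)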
Fallback to Specific Model ID
If all models in a group are in cooldown (e.g. rate-limited), LiteLLM will fall back to the model with a specific model ID.
This skips any cooldown check for the fallback model.
1. Specify the model ID in model_info
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
model_info:
id: my-specific-model-id # 👈 KEY CHANGE
- model_name: gpt-4
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
- model_name: anthropic-claude
litellm_params:
model: anthropic/claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
Note: This will only fall back to the model with that specific model ID. If you want to fall back to another model group, you can set fallbacks=[{"gpt-4": ["anthropic-claude"]}], as sketched below.
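For example, a config sketch that falls back to the anthropic-claude model group from the list above instead:
litellm_settings:
  fallbacks: [{"gpt-4": ["anthropic-claude"]}]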
2. Set fallbacks in config
litellm_settings:
fallbacks: [{"gpt-4": ["my-specific-model-id"]}]
3. Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true
}'
Validate that it works by checking the response header x-litellm-model-id:
x-litellm-model-id: my-specific-model-id
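A sketch of the same check from Python, using the OpenAI client's raw-response interface to read the header (assumes the proxy above is running on port 4000):
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"mock_testing_fallbacks": True},  # trigger fallbacks, as in the curl above
)
print(response.headers.get("x-litellm-model-id"))  # expect: my-specific-model-id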
Test Fallbacks!
Check that your fallbacks work as expected.
Regular Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true # 👈 KEY CHANGE
}
'
Content Policy Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_content_policy_fallbacks": true # 👈 KEY CHANGE
}
'
Context Window Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_context_window_fallbacks": true # 👈 KEY CHANGE
}
'
Context Window Exceeded Fallbacks (Pre-Call Checks + Fallbacks)
Before making a call, check whether the call fits within the model's context window, with enable_pre_call_checks: true.
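In the SDK, the same check is a flag on the Router; a minimal sketch:
from litellm import Router

router = Router(
    model_list=[...],  # deployments as in the config below
    enable_pre_call_checks=True,  # skip deployments whose context window is too small for the call
)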
1. Setup config
For Azure deployments, set the base model. Pick the base model from this list; all Azure models start with azure/.
- Same group
- Context Window Fallbacks (different groups)
Filter out older instances of a model (e.g. gpt-3.5-turbo) that have a smaller context window:
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
model_info:
base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
text = "What is the meaning of 42?" * 5000
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{"role": "system", "content": text},
{"role": "user", "content": "Who was Alexander?"},
],
)
print(response)
Fall back to a larger model if the current model is too small:
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
model_info:
base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL
- model_name: gpt-3.5-turbo-large
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
context_window_fallbacks: [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large", "claude-opus"]}]
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
text = "What is the meaning of 42?" * 5000
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{"role": "system", "content": text},
{"role": "user", "content": "Who was Alexander?"},
],
)
print(response)
Content Policy Fallbacks
Fall back across providers (e.g. from Azure OpenAI to Anthropic) if you hit content policy violation errors:
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}]
Default Fallbacks
You can also set default_fallbacks, in case a specific model group is misconfigured / bad.
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
default_fallbacks: ["claude-opus"]
This will fall back to claude-opus by default if any model fails.
A model-specific fallback (e.g. {"gpt-3.5-turbo-small": ["claude-opus"]}) overrides the default fallback, as sketched below.
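A Router sketch of this precedence (deployments elided; both parameters are Router arguments, the combination here is illustrative):
from litellm import Router

router = Router(
    model_list=[...],  # gpt-3.5-turbo-small + claude-opus deployments from above
    default_fallbacks=["claude-opus"],                     # catch-all for any failing model group
    fallbacks=[{"gpt-3.5-turbo-small": ["claude-opus"]}],  # model-specific, takes precedence over the default
)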
EU-Region Filtering (Pre-Call Checks)
Before making a call, check that the deployment is in the allowed region, with enable_pre_call_checks: true.
Set the 'region_name' of your deployments.
Note: LiteLLM can automatically infer the region_name for Vertex AI, Bedrock, and IBM WatsonxAI based on your litellm params. For Azure, set litellm.enable_preview = True.
1. Set config
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
region_name: "eu" # 👈 SET EU-REGION
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
- model_name: gemini-pro
litellm_params:
model: vertex_ai/gemini-pro-1.5
vertex_project: adroit-crow-1234
vertex_location: us-east1 # 👈 AUTOMATICALLY INFERS 'region_name'
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.with_raw_response.create(
model="gpt-3.5-turbo",
messages = [{"role": "user", "content": "Who was Alexander?"}]
)
print(response)
print(f"response.headers.get('x-litellm-model-api-base')")
Setting Fallbacks for Wildcard Models
You can set fallbacks for wildcard models (e.g. azure/*) in your config file.
1. Setup config
model_list:
- model_name: "gpt-4o"
litellm_params:
model: "openai/gpt-4o"
api_key: os.environ/OPENAI_API_KEY
- model_name: "azure/*"
litellm_params:
model: "azure/*"
api_key: os.environ/AZURE_API_KEY
api_base: os.environ/AZURE_API_BASE
litellm_settings:
fallbacks: [{"gpt-4o": ["azure/gpt-4o"]}]
2. Start proxy
litellm --config /path/to/config.yaml
3. Test it!
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "what color is red"
}
]
}
],
"max_tokens": 300,
"mock_testing_fallbacks": true
}'
Disable Fallbacks (Per Request / Per Key)
- Per Request
- Per Key
You can disable fallbacks per request by setting disable_fallbacks: true in the request body.
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"messages": [
{
"role": "user",
"content": "List 5 important events in the XIX century"
}
],
"model": "gpt-3.5-turbo",
"disable_fallbacks": true # 👈 DISABLE FALLBACKS
}'
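The same flag can be sent from the OpenAI Python client via extra_body; a sketch, assuming the proxy above:
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List 5 important events in the XIX century"}],
    extra_body={"disable_fallbacks": True},  # disable fallbacks for this request only
)
print(response)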
You can disable fallbacks per key by setting disable_fallbacks: true in the key's metadata.
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"disable_fallbacks": true
}
}'