Fallbacks

If a call fails after num_retries, fall back to another model group.

A fallback is typically done from one model_name to another model_name.

Quick Start

1. Setup fallbacks

Key change:

fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/<your-deployment-name>",
                "api_base": "<your-azure-endpoint>",
                "api_key": "<your-azure-api-key>",
                "rpm": 6
            }
        },
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4-ca",
                "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
                "api_key": "<your-azure-api-key>",
                "rpm": 6
            }
        }
    ],
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}] # 👈 KEY CHANGE
)

2. Start proxy

litellm --config /path/to/config.yaml

3. Test fallbacks

Pass mock_testing_fallbacks=true in the request body to trigger fallbacks.


from litellm import Router

model_list = [{..}, {..}] # defined in Step 1.

router = Router(model_list=model_list, fallbacks=[{"bad-model": ["my-good-model"]}])

response = router.completion(
    model="bad-model",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True,
)

Explanation

Fallbacks are done in order - ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"] will try 'gpt-3.5-turbo' first, then 'gpt-4', etc.

You can also set default_fallbacks, in case a specific model group is misconfigured or failing.
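
With the SDK, default_fallbacks is set directly on the Router. A minimal sketch (assumes "gpt-4" is the model group you want as the catch-all):

from litellm import Router

router = Router(
    model_list=model_list, # defined in Step 1.
    default_fallbacks=["gpt-4"], # used when the failing model group has no specific fallback
)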

There are 3 types of fallbacks (see the combined sketch after this list):

  • content_policy_fallbacks: for litellm.ContentPolicyViolationError - LiteLLM maps content-policy-violation errors across providers (See Code)
  • context_window_fallbacks: for litellm.ContextWindowExceededError - LiteLLM maps context-window error messages across providers (See Code)
  • fallbacks: for all other errors - e.g. litellm.RateLimitError
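
All three can be configured on the same Router. A combined sketch (the group names are placeholders):

from litellm import Router

router = Router(
    model_list=model_list, # defined in Step 1.
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],                # all other errors, e.g. RateLimitError
    context_window_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}], # ContextWindowExceededError
    content_policy_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}], # ContentPolicyViolationError
)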

Client-Side Fallbacks

Set fallbacks in the .completion() call for the SDK, or client-side for the proxy.

In this request, the following will occur:

  1. model="zephyr-beta" 的请求将失败
  2. litellm 代理将遍历 fallbacks=["gpt-3.5-turbo"] 中指定的所有模型组
  3. model="gpt-3.5-turbo" 的请求将成功,发出请求的客户端将从 gpt-3.5-turbo 获得响应

👉 Key Change: "fallbacks": ["gpt-3.5-turbo"]

from litellm import Router

router = Router(model_list=[..]) # model groups incl. "zephyr-beta" and "gpt-3.5-turbo"

resp = router.completion(
    model="zephyr-beta", # 👈 the model that will fail
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=["gpt-3.5-turbo"], # 👈 KEY CHANGE
)

print(resp)
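
On the proxy, the same per-request fallbacks can be sent by the client. A sketch with the openai SDK (assumes the proxy is running on port 4000 with "zephyr-beta" and "gpt-3.5-turbo" model groups configured; extra_body forwards the non-OpenAI param in the request body):

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

resp = client.chat.completions.create(
    model="zephyr-beta",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    extra_body={"fallbacks": ["gpt-3.5-turbo"]}, # 👈 KEY CHANGE
)

print(resp)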

Control Fallback Prompts

Pass messages / temperature / etc. per model in the fallback list (works for embedding / image generation / etc. too).

Key change:

fallbacks = [
    {
        "model": <model_name>,
        "messages": <model-specific-messages>,
        ... # any other model-specific parameters
    }
]

from litellm import Router

router = Router(model_list=[..]) # defined in Step 1.

resp = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=[
        {
            "model": "claude-3-haiku",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
        }
    ],
)

print(resp)

Content Policy Violation Fallback

Key change:

content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("content filtering policy"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # context_window_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

Context Window Exceeded Fallback

Key change:

context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}]

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("prompt is too long"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # content_policy_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

Advanced

Fallbacks + Retries + Timeouts + Cooldowns

To set fallbacks, just do:

litellm_settings:
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]

This covers all errors (429, 500, etc.).

Set via config:

model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
      model: gpt-3.5-turbo-16k
      api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails after num_retries
  allowed_fails: 3 # cooldown a model if it fails more than 3 calls in a minute
  cooldown_time: 30 # how long (in seconds) to cooldown a model if fails/min > allowed_fails
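
For SDK users, the same knobs exist as Router keyword arguments. A sketch mirroring the config above:

from litellm import Router

router = Router(
    model_list=model_list, # the zephyr-beta / gpt-3.5-turbo deployments from the config above
    num_retries=3,         # retry each call 3 times
    timeout=10,            # raise Timeout error if a call takes longer than 10s
    fallbacks=[{"zephyr-beta": ["gpt-3.5-turbo"]}],
    allowed_fails=3,       # cooldown a deployment after 3 failures in a minute
    cooldown_time=30,      # seconds the deployment stays in cooldown
)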

Fallback to Specific Model ID

If all models in a group are in cooldown (e.g. rate-limited), LiteLLM will fall back to the model with the specified model ID.

This skips any cooldown check for the fallback model.

  1. Specify the model ID in model_info

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
    model_info:
      id: my-specific-model-id # 👈 KEY CHANGE
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
  - model_name: anthropic-claude
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

Note: this will only fall back to the model with the specific model ID. If you want to fall back to another model group, you can set fallbacks=[{"gpt-4": ["anthropic-claude"]}]

  2. Set fallbacks in config

litellm_settings:
  fallbacks: [{"gpt-4": ["my-specific-model-id"]}]
  3. Test it!

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "ping"
        }
      ],
      "mock_testing_fallbacks": true
    }'

Validate that it works by checking the response header x-litellm-model-id:

x-litellm-model-id: my-specific-model-id
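
If you call the proxy through the openai SDK, the same header can be read off a raw response. A sketch (assumes the proxy above is running on port 4000):

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# with_raw_response exposes the HTTP response, including headers
resp = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"mock_testing_fallbacks": True},
)

print(resp.headers.get("x-litellm-model-id")) # expect: my-specific-model-id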

Test Fallbacks!

Check that your fallbacks work as expected.

Regular Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "model": "my-bad-model",
      "messages": [
        {
          "role": "user",
          "content": "ping"
        }
      ],
      "mock_testing_fallbacks": true # 👈 KEY CHANGE
    }'

Content Policy Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "model": "my-bad-model",
      "messages": [
        {
          "role": "user",
          "content": "ping"
        }
      ],
      "mock_testing_content_policy_fallbacks": true # 👈 KEY CHANGE
    }'

Context Window Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "model": "my-bad-model",
      "messages": [
        {
          "role": "user",
          "content": "ping"
        }
      ],
      "mock_testing_context_window_fallbacks": true # 👈 KEY CHANGE
    }'

Context Window Exceeded Fallback (Pre-Call Checks + Fallbacks)

Before making a call, use enable_pre_call_checks: true to check that the call fits within the model's context window.

See Code
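
For SDK users, the same check is enabled directly on the Router. A minimal sketch (model_list as defined earlier):

from litellm import Router

# pre-call checks filter out deployments whose context window is too small for the prompt
router = Router(model_list=model_list, enable_pre_call_checks=True)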

1. Setup config

For Azure deployments, set the base model. Pick the base model from this list; all Azure models start with azure/.

This filters out older instances of a model (e.g. gpt-3.5-turbo) with a smaller context window.

router_settings:
  enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: os.environ/OPENAI_API_KEY

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

text = "What is the meaning of 42?" * 5000

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": text},
        {"role": "user", "content": "Who was Alexander?"},
    ],
)

print(response)

Content Policy Fallbacks

If you hit content-policy-violation errors, you can fall back across providers (e.g. from Azure OpenAI to Anthropic).

model_list:
  - model_name: gpt-3.5-turbo-small
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

  - model_name: claude-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}]

Default Fallbacks

You can also set default_fallbacks, in case a specific model group is misconfigured or failing.

model_list:
  - model_name: gpt-3.5-turbo-small
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

  - model_name: claude-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  default_fallbacks: ["claude-opus"]

This will default to claude-opus if any model fails.

A fallback set for a specific model (e.g. {"gpt-3.5-turbo-small": ["claude-opus"]}) overrides the default fallback.

EU-Region Filtering (Pre-Call Checks)

Before making a call, use enable_pre_call_checks: true to check that the call fits within the model's context window.

Set the 'region_name' of your deployments.

Note: LiteLLM can automatically infer the region_name for Vertex AI, Bedrock, and IBM WatsonxAI deployments based on your litellm params. For Azure, set litellm.enable_preview_features = True.

1. Setup config

router_settings:
  enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
      region_name: "eu" # 👈 SET EU-REGION

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-pro-1.5
      vertex_project: adroit-crow-1234
      vertex_location: us-east1 # 👈 AUTOMATICALLY INFERS 'region_name'

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Who was Alexander?"}]
)

print(response)

# verify which api_base served the request
print(response.headers.get('x-litellm-model-api-base'))

Setting Fallbacks for Wildcard Models

You can set fallbacks for wildcard models (e.g. azure/*) in your config file.

  1. Setup config

model_list:
  - model_name: "gpt-4o"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: os.environ/OPENAI_API_KEY
  - model_name: "azure/*"
    litellm_params:
      model: "azure/*"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

litellm_settings:
  fallbacks: [{"gpt-4o": ["azure/gpt-4o"]}]
  2. Start proxy

litellm --config /path/to/config.yaml

  3. Test it!

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "model": "gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "what color is red"
            }
          ]
        }
      ],
      "max_tokens": 300,
      "mock_testing_fallbacks": true
    }'

Disable Fallbacks (Per Request / Per Key)

You can disable fallbacks per request by setting disable_fallbacks: true in the request body.

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
      "messages": [
        {
          "role": "user",
          "content": "List 5 important events in the XIX century"
        }
      ],
      "model": "gpt-3.5-turbo",
      "disable_fallbacks": true # 👈 DISABLE FALLBACKS
    }'