Fallbacks
If a call fails after num_retries, fall back to another model group.
Fallbacks are typically done from one model_name to another model_name.
Quick Start
1. Setup fallbacks
Key change:
fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "gpt-3.5-turbo",
"litellm_params": {
"model": "azure/<your-deployment-name>",
"api_base": "<your-azure-endpoint>",
"api_key": "<your-azure-api-key>",
"rpm": 6
}
},
{
"model_name": "gpt-4",
"litellm_params": {
"model": "azure/gpt-4-ca",
"api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
"api_key": "<your-azure-api-key>",
"rpm": 6
}
}
],
fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}] # 👈 KEY CHANGE
)
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-4
litellm_params:
model: azure/gpt-4-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
fallbacks: [{"gpt-3.5-turbo": ["gpt-4"]}]
2. Start Proxy
litellm --config /path/to/config.yaml
3. Test Fallbacks
Pass mock_testing_fallbacks=true in the request body to trigger fallbacks.
- SDK
- Proxy
from litellm import Router
model_list = [{..}, {..}] # defined in Step 1.
router = Router(model_list=model_list, fallbacks=[{"bad-model": ["my-good-model"]}])
response = router.completion(
model="bad-model",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True,
)
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true # 👈 KEY CHANGE
}
'
Explanation
Fallbacks are done in-order - given ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"], it will try 'gpt-3.5-turbo' first, then 'gpt-4', etc.
You can also set default_fallbacks, in case a specific model group is misconfigured / bad.
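As an illustrative sketch of an ordered fallback list on the Router (the gpt-4-32k group here is a hypothetical addition, not part of the Quick Start config):
from litellm import Router

router = Router(
    model_list=[...],  # deployments for each model group, as in the Quick Start
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4", "gpt-4-32k"]}],  # on failure, try gpt-4 first, then gpt-4-32k
)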
There are 3 types of fallbacks:
- content_policy_fallbacks: for litellm.ContentPolicyViolationError - LiteLLM maps content policy violation errors across providers (See Code)
- context_window_fallbacks: for litellm.ContextWindowExceededError - LiteLLM maps context window exceeded errors across providers (See Code)
- fallbacks: for all other errors - e.g. litellm.RateLimitError
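All three can be combined on one Router; a minimal sketch (the fallback targets are placeholders):
from litellm import Router

router = Router(
    model_list=[...],  # defined as in the Quick Start
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],                 # all other errors, e.g. litellm.RateLimitError
    context_window_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],  # litellm.ContextWindowExceededError
    content_policy_fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}],  # litellm.ContentPolicyViolationError
)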
Client-Side Fallbacks
Set fallbacks in the .completion() call for the SDK, or client-side when calling the proxy.
In this request, the following will occur:
1. The request to model="zephyr-beta" will fail.
2. The litellm proxy will loop through all the model groups specified in fallbacks=["gpt-3.5-turbo"].
3. The request to model="gpt-3.5-turbo" will succeed, and the client making the request will get a response from gpt-3.5-turbo.
👉 Key Change: "fallbacks": ["gpt-3.5-turbo"]
- SDK
- Proxy
from litellm import Router
router = Router(model_list=[..]) # defined in Step 1.
resp = router.completion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True, # 👈 trigger fallbacks
fallbacks=[
{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}],
}
],
)
print(resp)
- OpenAI Python v1.0.0+
- Curl Request
- Langchain
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="zephyr-beta",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"fallbacks": ["gpt-3.5-turbo"]
}
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "zephyr-beta"",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
"fallbacks": ["gpt-3.5-turbo"]
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os
os.environ["OPENAI_API_KEY"] = "anything"
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model="zephyr-beta",
extra_body={
"fallbacks": ["gpt-3.5-turbo"]
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Control Fallback Prompts
Pass in messages/temperature/etc. per model in the fallback (this also works for embeddings, image generation, etc.).
Key change:
fallbacks = [
{
"model": <model_name>,
"messages": <model-specific-messages>
... # any other model-specific parameters
}
]
- SDK
- Proxy
from litellm import Router
router = Router(model_list=[..]) # defined in Step 1.
resp = router.completion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
mock_testing_fallbacks=True, # 👈 trigger fallbacks
fallbacks=[
{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}],
}
],
)
print(resp)
- OpenAI Python v1.0.0+
- Curl Request
- Langchain
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="zephyr-beta",
messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
],
extra_body={
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}]
}
)
print(response)
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Hi, how are you ?"
}
]
}
],
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}],
"mock_testing_fallbacks": true
}'
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os
os.environ["OPENAI_API_KEY"] = "anything"
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",
model="zephyr-beta",
extra_body={
"fallbacks": [{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "What is LiteLLM?"}]
}]
}
)
messages = [
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Content Policy Violation Fallback
Key change:
content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "claude-2",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": Exception("content filtering policy"),
},
},
{
"model_name": "my-fallback-model",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": "This works!",
},
},
],
content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
# fallbacks=[..], # [OPTIONAL]
# context_window_fallbacks=[..], # [OPTIONAL]
)
response = router.completion(
model="claude-2",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
In your proxy config.yaml, just add this line 👇
router_settings:
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}]
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Context Window Exceeded Fallback
Key change:
context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}]
- SDK
- Proxy
from litellm import Router
router = Router(
model_list=[
{
"model_name": "claude-2",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": Exception("prompt is too long"),
},
},
{
"model_name": "my-fallback-model",
"litellm_params": {
"model": "claude-2",
"api_key": "",
"mock_response": "This works!",
},
},
],
context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
# fallbacks=[..], # [OPTIONAL]
# content_policy_fallbacks=[..], # [OPTIONAL]
)
response = router.completion(
model="claude-2",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
In your proxy config.yaml, just add this line 👇
router_settings:
  context_window_fallbacks: [{"claude-2": ["my-fallback-model"]}]
Start proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Advanced
Fallbacks + Retries + Timeouts + Cooldowns
To set fallbacks, just do:
litellm_settings:
fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]
This covers all errors (429, 500, etc.).
Set via config:
model_list:
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8001
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8002
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: http://0.0.0.0:8003
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
api_key: <my-openai-key>
- model_name: gpt-3.5-turbo-16k
litellm_params:
model: gpt-3.5-turbo-16k
api_key: <my-openai-key>
litellm_settings:
num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout
fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries
allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute.
cooldown_time: 30 # how long to cooldown model if fails/min > allowed_fails
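For SDK users, a sketch of the equivalent Router settings (parameter names mirror the config keys above; model_list is elided):
from litellm import Router

router = Router(
    model_list=[...],  # zephyr-beta + gpt-3.5-turbo deployments from the config above
    num_retries=3,     # retry each call 3 times
    timeout=10,        # raise a Timeout error if a call takes longer than 10s
    fallbacks=[{"zephyr-beta": ["gpt-3.5-turbo"]}],  # fall back once retries are exhausted
    allowed_fails=3,   # cooldown a deployment if it fails > 3 calls in a minute
    cooldown_time=30,  # seconds to keep a failing deployment in cooldown
)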
Fallback to Specific Model ID
If all models in a group are in cooldown (e.g. rate-limited), LiteLLM will fall back to the model with a specific model ID.
This skips any cooldown check for the fallback model.
1. Specify the model ID in model_info
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
model_info:
id: my-specific-model-id # 👈 KEY CHANGE
- model_name: gpt-4
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
- model_name: anthropic-claude
litellm_params:
model: anthropic/claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
Note: This will only fall back to the model with that specific model ID. If you want to fall back to another model group, you can set fallbacks=[{"gpt-4": ["anthropic-claude"]}], as sketched below.
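For example, a config sketch that falls back to the anthropic-claude model group from the list above instead:
litellm_settings:
  fallbacks: [{"gpt-4": ["anthropic-claude"]}]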
2. Set fallbacks in config
litellm_settings:
fallbacks: [{"gpt-4": ["my-specific-model-id"]}]
3. Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true
}'
Validate that it works by checking the response header x-litellm-model-id:
x-litellm-model-id: my-specific-model-id
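A sketch of the same check from Python, using the OpenAI client's raw-response interface to read the header (assumes the proxy above is running on port 4000):
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}],
    extra_body={"mock_testing_fallbacks": True},  # trigger fallbacks, as in the curl above
)
print(response.headers.get("x-litellm-model-id"))  # expect: my-specific-model-id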
Test Fallbacks!
Check that your fallbacks work as expected.
Regular Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true # 👈 KEY CHANGE
}
'
Content Policy Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_content_policy_fallbacks": true # 👈 KEY CHANGE
}
'
Context Window Fallbacks
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "my-bad-model",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_context_window_fallbacks": true # 👈 KEY CHANGE
}
'
Context Window Exceeded Fallbacks (Pre-Call Checks + Fallbacks)
Before making a call, check whether the call fits within the model's context window, with enable_pre_call_checks: true.
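In the SDK, the same check is a flag on the Router; a minimal sketch:
from litellm import Router

router = Router(
    model_list=[...],  # deployments as in the config below
    enable_pre_call_checks=True,  # skip deployments whose context window is too small for the call
)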
1. Setup config
For Azure deployments, set the base model. Pick the base model from this list; all Azure models start with azure/.
- Same group
- Context Window Fallbacks (different groups)
Filter out older instances of a model (e.g. gpt-3.5-turbo) that have a smaller context window:
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
model_info:
base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
text = "What is the meaning of 42?" * 5000
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{"role": "system", "content": text},
{"role": "user", "content": "Who was Alexander?"},
],
)
print(response)
Fall back to a larger model if the current model is too small:
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
model_info:
base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL
- model_name: gpt-3.5-turbo-large
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
context_window_fallbacks: [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large", "claude-opus"]}]
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
text = "What is the meaning of 42?" * 5000
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages = [
{"role": "system", "content": text},
{"role": "user", "content": "Who was Alexander?"},
],
)
print(response)
Content Policy Fallbacks
Fall back across providers (e.g. from Azure OpenAI to Anthropic) if you hit content policy violation errors:
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}]
Default Fallbacks
You can also set default_fallbacks, in case a specific model group is misconfigured / bad.
model_list:
- model_name: gpt-3.5-turbo-small
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
- model_name: claude-opus
litellm_params:
model: claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
default_fallbacks: ["claude-opus"]
This will fall back to claude-opus by default if any model fails.
A model-specific fallback (e.g. {"gpt-3.5-turbo-small": ["claude-opus"]}) overrides the default fallback, as sketched below.
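A Router sketch of this precedence (deployments elided; both parameters are Router arguments, the combination here is illustrative):
from litellm import Router

router = Router(
    model_list=[...],  # gpt-3.5-turbo-small + claude-opus deployments from above
    default_fallbacks=["claude-opus"],                     # catch-all for any failing model group
    fallbacks=[{"gpt-3.5-turbo-small": ["claude-opus"]}],  # model-specific, takes precedence over the default
)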
EU-Region Filtering (Pre-Call Checks)
Before making a call, check that the deployment is in the allowed region, with enable_pre_call_checks: true.
Set the 'region_name' of your deployments.
Note: LiteLLM can automatically infer the region_name for Vertex AI, Bedrock, and IBM WatsonxAI based on your litellm params. For Azure, set litellm.enable_preview = True.
1. Set config
router_settings:
enable_pre_call_checks: true # 1. Enable pre-call checks
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: "2023-07-01-preview"
region_name: "eu" # 👈 SET EU-REGION
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo-1106
api_key: os.environ/OPENAI_API_KEY
- model_name: gemini-pro
litellm_params:
model: vertex_ai/gemini-pro-1.5
vertex_project: adroit-crow-1234
vertex_location: us-east1 # 👈 AUTOMATICALLY INFERS 'region_name'
2. Start Proxy
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
3. Test it!
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.with_raw_response.create(
model="gpt-3.5-turbo",
messages = [{"role": "user", "content": "Who was Alexander?"}]
)
print(response)
print(f"response.headers.get('x-litellm-model-api-base')")
Setting Fallbacks for Wildcard Models
You can set fallbacks for wildcard models (e.g. azure/*) in your config file.
1. Setup config
model_list:
- model_name: "gpt-4o"
litellm_params:
model: "openai/gpt-4o"
api_key: os.environ/OPENAI_API_KEY
- model_name: "azure/*"
litellm_params:
model: "azure/*"
api_key: os.environ/AZURE_API_KEY
api_base: os.environ/AZURE_API_BASE
litellm_settings:
fallbacks: [{"gpt-4o": ["azure/gpt-4o"]}]
2. Start proxy
litellm --config /path/to/config.yaml
3. Test it!
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "what color is red"
}
]
}
],
"max_tokens": 300,
"mock_testing_fallbacks": true
}'
Disable Fallbacks (Per Request / Per Key)
- Per Request
- Per Key
You can disable fallbacks per request by setting disable_fallbacks: true in the request body.
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"messages": [
{
"role": "user",
"content": "List 5 important events in the XIX century"
}
],
"model": "gpt-3.5-turbo",
"disable_fallbacks": true # 👈 DISABLE FALLBACKS
}'
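The same flag can be sent from the OpenAI Python client via extra_body; a sketch, assuming the proxy above:
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List 5 important events in the XIX century"}],
    extra_body={"disable_fallbacks": True},  # disable fallbacks for this request only
)
print(response)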
You can disable fallbacks per key by setting disable_fallbacks: true in the key's metadata.
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"disable_fallbacks": true
}
}'