Proxy - Load Balancing

Load balance multiple instances of the same model

The proxy will handle routing requests (using LiteLLM's Router). Set rpm in the config if you want to maximize throughput.

Info

For more details on routing strategies / params, see Routing.

Quick Start - Load Balancing

Step 1 - Set deployments on config

Example config below. Here, requests with model=gpt-3.5-turbo will be routed across multiple instances of azure/gpt-3.5-turbo.
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>
      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
      rpm: 6
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-large
      api_base: https://openai-france-1234.openai.azure.com/
      api_key: <your-azure-api-key>
      rpm: 1440

router_settings:
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo`
  num_retries: 2
  timeout: 30 # 30 seconds
  redis_host: <your redis host> # set this when using multiple litellm proxy deployments, load balancing state stored in redis
  redis_password: <your redis password>
  redis_port: 1992
Step 2: Start Proxy with config
$ litellm --config /path/to/config.yaml
Test - Simple Call

Here, requests with model=gpt-3.5-turbo will be routed across multiple instances of azure/gpt-3.5-turbo.

👉 Key Change: model="gpt-3.5-turbo"

Check the model_id in the response headers to make sure the requests are being load balanced (see the sketch after the client examples below).
- OpenAI Python v1.0.0+
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)

print(response)
- Curl Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
- Langchain

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os

os.environ["OPENAI_API_KEY"] = "anything"

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000",
    model="gpt-3.5-turbo",
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that im using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)
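To verify the balancing, you can read the chosen deployment's model_id out of the response headers. A minimal sketch using the OpenAI Python client's raw-response interface; the x-litellm-model-id header name is an assumption based on current LiteLLM proxy behavior, so check it against your proxy version:

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# with_raw_response exposes the underlying HTTP response, including headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}]
)

# assumed header name - LiteLLM reports the deployment it picked here
print(raw.headers.get("x-litellm-model-id"))

print(raw.parse())  # the parsed ChatCompletion object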
Test - Load Balancing

In this request, the following will occur:

- A rate limit exception will be raised
- LiteLLM proxy will retry the request on the model group (default is 3 retries)
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Hi there!"}
    ],
    "mock_testing_rate_limit_error": true
}'
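The same test can be driven from the OpenAI Python client; a sketch that uses extra_body to forward the mock flag to the proxy:

import openai

client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000"
)

# extra_body forwards non-standard fields to the proxy, so the
# mock_testing_rate_limit_error flag reaches LiteLLM unchanged
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi there!"}],
    extra_body={"mock_testing_rate_limit_error": True}
)

print(response)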
Load Balancing using multiple LiteLLM instances (Kubernetes, Auto-scaling)

LiteLLM Proxy supports sharing rpm/tpm across multiple LiteLLM instances. Pass redis_host, redis_password and redis_port to enable this. (LiteLLM will use Redis to track rpm/tpm usage.)

Example config
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>
      rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
      rpm: 6

router_settings:
  redis_host: <your redis host>
  redis_password: <your redis password>
  redis_port: 1992
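Under the hood, each proxy instance builds a litellm.Router from this config. A rough sketch of the equivalent Router construction (values copied from the config above; this is illustrative, not the proxy's exact startup code):

from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/<your-deployment-name>",
            "api_base": "<your-azure-endpoint>",
            "api_key": "<your-azure-api-key>",
            "rpm": 6,
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-turbo-small-ca",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
            "api_key": "<your-azure-api-key>",
            "rpm": 6,
        },
    },
]

# every instance pointed at the same Redis shares the rpm/tpm counters,
# so rate limits are enforced across the whole fleet
router = Router(
    model_list=model_list,
    redis_host="<your redis host>",
    redis_password="<your redis password>",
    redis_port=1992,
)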
Router settings on config - routing_strategy, model_group_alias

Expose an 'alias' for a 'model_name' on the proxy server.
model_group_alias: {
    "gpt-4": "gpt-3.5-turbo"
}
These aliases are shown on /v1/models, /v1/model/info, and /v1/model_group/info by default.

litellm.Router() settings can be set under router_settings. You can set model_group_alias, routing_strategy, num_retries, timeout. See all Router supported params here.
Usage

Example config with router_settings:
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>

router_settings:
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo`
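With this config, a request for gpt-4 is served by the deployments registered under gpt-3.5-turbo. A quick sketch against the proxy:

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# "gpt-4" is only an alias here - the proxy resolves it to the
# deployments registered under "gpt-3.5-turbo"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "ping"}]
)

print(response)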
Hide Alias Models

Use this if you want to set an alias for:

- typos
- minor model version changes
- case-sensitivity changes between updates
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>

router_settings:
  model_group_alias:
    "GPT-3.5-turbo": # alias
      model: "gpt-3.5-turbo" # Actual model name in 'model_list'
      hidden: true # Exclude from `/v1/models`, `/v1/model/info`, `/v1/model_group/info`
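One way to check that the hidden flag took effect; a sketch that lists the model names the proxy advertises (the hidden alias should be absent, while requests using it still succeed):

import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# hidden: true keeps the alias out of the advertised model names ...
print([m.id for m in client.models.list()])

# ... but requests that use the alias are still routed to the real model
response = client.chat.completions.create(
    model="GPT-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}]
)

print(response)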
Complete Spec

model_group_alias: Optional[Dict[str, Union[str, RouterModelGroupAliasItem]]] = {}

class RouterModelGroupAliasItem(TypedDict):
    model: str
    hidden: bool # if 'True', don't return on `/v1/models`, `/v1/model/info`, `/v1/model_group/info`
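A sketch of the same spec used directly with litellm.Router, showing both the plain-string form and the RouterModelGroupAliasItem form (illustrative values; the single deployment is a placeholder):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "azure/<your-deployment-name>",
                "api_base": "<your-azure-endpoint>",
                "api_key": "<your-azure-api-key>",
            },
        },
    ],
    model_group_alias={
        "gpt-4": "gpt-3.5-turbo",  # plain string form
        "GPT-3.5-turbo": {         # RouterModelGroupAliasItem form
            "model": "gpt-3.5-turbo",
            "hidden": True,        # excluded from /v1/models etc.
        },
    },
)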