Overview

Set the model list, api_base, api_key, temperature, and proxy server settings (master-key) in the config.yaml.

Param Name	Description
`model_list`	List of supported models on the server, with model-specific configs
`router_settings`	litellm Router settings, e.g. `routing_strategy="least-busy"`
`litellm_settings`	litellm module settings, e.g. `litellm.drop_params=True`, `litellm.set_verbose=True`, `litellm.api_base`, `litellm.cache`
`general_settings`	Server settings, e.g. setting `master_key: sk-my_special_key`
`environment_variables`	Environment variables, e.g. `REDIS_HOST`, `REDIS_PORT`

Complete list: check the Swagger UI docs at <your-proxy-url>/#/config.yaml (e.g. http://0.0.0.0:4000/#/config.yaml) for everything you can pass in the config.yaml.

Quick Start

Set a model alias for your deployments.

In the config.yaml, the model_name parameter is the user-facing name for your deployment.

In the config below:

model_name: the name an external client passes to litellm
litellm_params.model: the model string passed to the litellm.completion() function

For example:

model=vllm-models will route to openai/facebook/opt-125m.
model=gpt-3.5-turbo will load balance between azure/gpt-turbo-small-eu and azure/gpt-turbo-small-ca.

model_list:
  - model_name: gpt-3.5-turbo ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.com.cn/docs/completion/input
      model: azure/gpt-turbo-small-eu ### MODEL NAME sent to `litellm.completion()` ###
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6      # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: bedrock-claude-v1 
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: anthropic-claude
    litellm_params: 
      model: bedrock/anthropic.claude-instant-v1
      ### [OPTIONAL] SET AWS REGION ###
      aws_region_name: us-east-1
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:4000/v1
      api_key: none
      rpm: 1440
    model_info: 
      version: 2
  
  # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  - model_name: "*" 
    litellm_params:
      model: "*"

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env

general_settings: 
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env
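As a rough mental model of the aliasing above, the Python sketch below groups deployments by `model_name` and lists the `litellm_params.model` strings a request could be routed to. It is not LiteLLM's router: the config is a hand-written dict mirroring part of the YAML, `build_model_groups` and `candidate_models` are hypothetical helpers, and the wildcard handling (pass the requested name through when only `"*"` matches) is an assumption based on the comment in the config.

```python
from collections import defaultdict

# Hand-written mirror of part of the model_list above (not parsed from YAML).
MODEL_LIST = [
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "azure/gpt-turbo-small-eu"}},
    {"model_name": "bedrock-claude-v1", "litellm_params": {"model": "bedrock/anthropic.claude-instant-v1"}},
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "azure/gpt-turbo-small-ca"}},
    {"model_name": "vllm-models", "litellm_params": {"model": "openai/facebook/opt-125m"}},
    {"model_name": "*", "litellm_params": {"model": "*"}},
]


def build_model_groups(model_list):
    """Group deployments by their client-facing model_name."""
    groups = defaultdict(list)
    for deployment in model_list:
        groups[deployment["model_name"]].append(deployment["litellm_params"]["model"])
    return dict(groups)


def candidate_models(groups, requested):
    """Return the litellm_params.model strings a request for `requested` could use."""
    if requested in groups:
        return groups[requested]
    if "*" in groups:
        # Assumption: the wildcard entry forwards the requested name unchanged.
        return [requested]
    raise KeyError(f"no deployment for model_name={requested!r}")


groups = build_model_groups(MODEL_LIST)
print(candidate_models(groups, "gpt-3.5-turbo"))  # ['azure/gpt-turbo-small-eu', 'azure/gpt-turbo-small-ca']
print(candidate_models(groups, "vllm-models"))    # ['openai/facebook/opt-125m']
```

Two entries sharing `model_name: gpt-3.5-turbo` form one group, which is what lets the proxy load balance between them.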

Info

For more provider-specific info, go here.

Step 2: Start the proxy with the config

$ litellm --config /path/to/config.yaml

Tip

Run with --detailed_debug if you need detailed debug logs.

$ litellm --config /path/to/config.yaml --detailed_debug

Step 3: Test it

Send a request to the model with model_name=gpt-3.5-turbo in the config.yaml.

If multiple models have model_name=gpt-3.5-turbo, requests are load balanced across them.

Langchain, OpenAI SDK Usage Examples

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
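For reference, the same request can be assembled with Python's standard library. This sketch only builds the `urllib.request.Request` object; `build_chat_request`, the `PROXY_URL` constant, and the optional `api_key` argument are illustrative names, not part of LiteLLM. Sending it requires a running proxy, so that step is left commented out.

```python
import json
import urllib.request

PROXY_URL = "http://0.0.0.0:4000"  # same address as the curl example


def build_chat_request(model, content, api_key=None):
    """Build (but do not send) a POST /chat/completions request for the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when general_settings.master_key is set
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{PROXY_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("gpt-3.5-turbo", "what llm are you")
# Sending requires a running proxy:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```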

LLM configs `model_list`

Model-specific params (API Base, Keys, Temperature, Max Tokens, Organization, Headers, etc.)

You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

All input params

Step 1: Create a config.yaml file

model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.com.cn/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      azure_ad_token: eyJ0eXAiOiJ
      seed: 12
      max_tokens: 20
  - model_name: gpt-4-team2
    litellm_params:
      model: azure/gpt-4
      api_key: sk-123
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
      temperature: 0.2
  - model_name: openai-gpt-3.5
    litellm_params:
      model: openai/gpt-3.5-turbo
      extra_headers: {"AI-Resource Group": "ishaan-resource"}
      api_key: sk-123
      organization: org-ikDc4ex8NB
      temperature: 0.2
  - model_name: mistral-7b
    litellm_params:
      model: ollama/mistral
      api_base: your_ollama_api_base

Step 2: Start the server with the config

$ litellm --config /path/to/config.yaml

Expected Logs

Look for this line in your console logs to confirm the config.yaml was loaded correctly.

LiteLLM: Proxy initialized with Config, Set models:

Embedding Models - Use Sagemaker, Bedrock, Azure, OpenAI, XInference

See supported Embedding Providers & Models here

model_list:
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-west-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-1"

Here's how to route between GPT-J embedding (Sagemaker endpoint), Amazon Titan embedding (Bedrock), and Azure OpenAI embedding on the proxy server:

model_list:
  - model_name: sagemaker-embeddings
    litellm_params: 
      model: "sagemaker/berri-benchmarking-gpt-j-6b-fp16"
  - model_name: amazon-embeddings
    litellm_params:
      model: "bedrock/amazon.titan-embed-text-v1"
  - model_name: azure-embeddings
    litellm_params: 
      model: "azure/azure-embedding-model"
      api_base: "os.environ/AZURE_API_BASE" # os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY" # os.getenv("AZURE_API_KEY")
      api_version: "2023-07-01-preview"

general_settings:
  master_key: sk-1234 # [OPTIONAL] if set all calls to proxy will require either this key or a valid generated token

LiteLLM Proxy supports all Feature-Extraction Embedding models.

model_list:
  - model_name: deployed-codebert-base
    litellm_params: 
      # send request to deployed hugging face inference endpoint
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS                            # api key for hugging face inference endpoint
      api_base: https://uysneno1wv2wd4lw.us-east-1.aws.endpoints.huggingface.cloud # your hf inference endpoint 
  - model_name: codebert-base
    litellm_params: 
      # no api_base set, sends request to hugging face free inference api https://api-inference.huggingface.co/models/
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS                            # api key for hugging face                     

model_list:
  - model_name: azure-embedding-model # model group
    litellm_params:
      model: azure/azure-embedding-model # model name for litellm.embedding(model=azure/azure-embedding-model) call
      api_base: your-azure-api-base
      api_key: your-api-key
      api_version: 2023-07-01-preview

model_list:
- model_name: text-embedding-ada-002 # model group
  litellm_params:
    model: text-embedding-ada-002 # model name for litellm.embedding(model=text-embedding-ada-002) 
    api_key: your-api-key-1
- model_name: text-embedding-ada-002 
  litellm_params:
    model: text-embedding-ada-002
    api_key: your-api-key-2

https://docs.litellm.com.cn/docs/providers/xinference

Note: add the xinference/ prefix to litellm_params: model so litellm knows to route to Xinference

model_list:
- model_name: embedding-model  # model group
  litellm_params:
    model: xinference/bge-base-en   # model name for litellm.embedding(model=xinference/bge-base-en) 
    api_base: http://0.0.0.0:9997/v1

Use this for calling /embeddings endpoints on OpenAI-compatible servers.

Note: add the openai/ prefix to litellm_params: model so litellm knows to route to OpenAI

model_list:
- model_name: text-embedding-ada-002  # model group
  litellm_params:
    model: openai/<your-model-name>   # model name for litellm.embedding(model=openai/<your-model-name>)
    api_base: <model-api-base>
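The prefix convention in the notes above reads as "everything before the first `/` names the provider; the rest is the model name". The sketch below illustrates only that reading. `split_provider` is a hypothetical helper and a simplification, not LiteLLM's provider detection, which also recognizes unprefixed names such as `text-embedding-ada-002`.

```python
from typing import Optional, Tuple


def split_provider(model: str) -> Tuple[Optional[str], str]:
    """Split a litellm_params.model string on its first "/".

    Unprefixed names return (None, name); LiteLLM infers the provider for
    those itself, which this sketch does not attempt.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name


for m in ["xinference/bge-base-en", "openai/facebook/opt-125m", "text-embedding-ada-002"]:
    print(m, "->", split_provider(m))
```

Splitting only on the first `/` matters for names like `openai/facebook/opt-125m`, where the model name itself contains a slash.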

Start Proxy

litellm --config config.yaml

Send Request

Send a request to bedrock-cohere

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data ' {
  "model": "bedrock-cohere",
  "messages": [
      {
      "role": "user",
      "content": "gm"
      }
  ]
}'
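The embedding model groups configured above are called through the proxy's OpenAI-compatible `/embeddings` route rather than `/chat/completions`. Here is a minimal standard-library sketch that builds (but does not send) such a request for the `azure-embeddings` group. `build_embedding_request` is a hypothetical helper, the base URL matches the curl example, and the input text is arbitrary.

```python
import json
import urllib.request


def build_embedding_request(model, inputs, base_url="http://0.0.0.0:4000"):
    """Build (not send) a POST /embeddings request for an embedding model group."""
    body = {"model": model, "input": inputs}
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_embedding_request("azure-embeddings", ["good morning from litellm"])
# Sending requires a running proxy:
# with urllib.request.urlopen(req) as resp:
#     print(len(json.load(resp)["data"][0]["embedding"]))
```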

Multiple OpenAI Organizations

Add all openai models across all OpenAI organizations with just one model definition

model_list:
  - model_name: "*"   # quoted: a bare * starts a YAML alias and fails to parse
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
      organization:
       - org-1 
       - org-2 
       - org-3

LiteLLM will automatically create a separate deployment for each organization.

Confirm this via:

curl --location 'http://0.0.0.0:4000/v1/model/info' \
--header "Authorization: Bearer ${LITELLM_KEY}"
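The per-organization expansion can be pictured as follows: one deployment entry whose `organization` is a list becomes one copy per organization. This illustrates the described behavior and is not LiteLLM's code; `expand_organizations` is a hypothetical helper.

```python
import copy


def expand_organizations(deployment):
    """Return one deployment per organization when `organization` is a list."""
    orgs = deployment["litellm_params"].get("organization")
    if not isinstance(orgs, list):
        return [deployment]
    expanded = []
    for org in orgs:
        clone = copy.deepcopy(deployment)  # leave the original entry untouched
        clone["litellm_params"]["organization"] = org
        expanded.append(clone)
    return expanded


deployment = {
    "model_name": "*",
    "litellm_params": {
        "model": "openai/*",
        "api_key": "os.environ/OPENAI_API_KEY",
        "organization": ["org-1", "org-2", "org-3"],
    },
}
for d in expand_organizations(deployment):
    print(d["litellm_params"]["organization"])
```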

Load Balancing

Info

For more on this, go to this page

Use this to call multiple instances of the same model and configure settings such as the routing strategy.

For optimal performance:

Set tpm/rpm for each model deployment. Weighted picks are then based on the configured tpm/rpm.
Select your optimal routing strategy in router_settings:routing_strategy.

LiteLLM supports

`["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"], default="simple-shuffle"`

When tpm/rpm is set and routing_strategy==simple-shuffle, litellm uses a weighted pick based on the configured tpm/rpm. In our load tests, setting tpm/rpm for all deployments and using routing_strategy==simple-shuffle maximized throughput.

When using multiple LiteLLM servers / Kubernetes, set the redis settings, e.g. router_settings:redis_host, etc.
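The weighted pick described above can be sketched with `random.choices`: each deployment's chance of being chosen is proportional to its `rpm`. This illustration uses the three `zephyr-beta` deployments from the config below and is not LiteLLM's simple-shuffle code. `weighted_pick` is hypothetical, and ignoring `tpm` and falling back to a uniform pick when no `rpm` is set are simplifying assumptions.

```python
import random
from collections import Counter


def weighted_pick(deployments, rng=random):
    """Pick one deployment, weighted by rpm; uniform if no deployment sets rpm.

    Simplification: a deployment without rpm gets weight 0 when others have one.
    """
    weights = [d.get("rpm", 0) for d in deployments]
    if not any(weights):
        return rng.choice(deployments)
    return rng.choices(deployments, weights=weights, k=1)[0]


ZEPHYR = [
    {"api_base": "http://0.0.0.0:8001", "rpm": 60},
    {"api_base": "http://0.0.0.0:8002", "rpm": 600},
    {"api_base": "http://0.0.0.0:8003", "rpm": 60000},
]
rng = random.Random(0)  # seeded so the run is reproducible
counts = Counter(weighted_pick(ZEPHYR, rng)["api_base"] for _ in range(10_000))
print(counts.most_common())
```

With these weights, roughly 99% of picks should land on the `:8003` deployment, mirroring its share of the total rpm.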

model_list:
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8001
        rpm: 60      # Optional[int]: When rpm/tpm set - litellm uses weighted pick for load balancing. rpm = Rate limit for this deployment: in requests per minute (rpm).
        tpm: 1000   # Optional[int]: tpm = Tokens Per Minute 
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8002
        rpm: 600      
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8003
        rpm: 60000      
  - model_name: gpt-3.5-turbo
    litellm_params:
        model: gpt-3.5-turbo
        api_key: <my-openai-key>
        rpm: 200      
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
        model: gpt-3.5-turbo-16k
        api_key: <my-openai-key>
        rpm: 100      

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout 
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries 
  context_window_fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo-16k"]}, {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]}] # fallback to gpt-3.5-turbo-16k if context window error
  allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute. 

router_settings: # router_settings are optional
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo`
  num_retries: 2
  timeout: 30                                  # 30 seconds
  redis_host: <your redis host>                # set this when using multiple litellm proxy deployments, load balancing state stored in redis
  redis_password: <your redis password>
  redis_port: 1992
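To make the interplay of `num_retries` and `fallbacks` concrete, here is a simplified sketch: retry the requested model group, then move through its fallback groups in order. It is an assumption-laden model, not LiteLLM's router. It treats `num_retries` as extra attempts on top of the first, retries fallback groups the same way, catches every exception, and ignores `context_window_fallbacks`, `allowed_fails` cooldowns, and timeouts. `call_with_fallbacks` and `flaky_call` are hypothetical.

```python
def call_with_fallbacks(model_group, call, num_retries=3, fallbacks=None):
    """Try model_group with retries, then each configured fallback group in order.

    `fallbacks` uses the config's shape, e.g. [{"zephyr-beta": ["gpt-3.5-turbo"]}].
    `call(group)` stands in for the real LLM request and raises on failure.
    Returns (group_that_succeeded, response).
    """
    chain = [model_group]
    for mapping in fallbacks or []:
        chain.extend(mapping.get(model_group, []))

    last_error = None
    for group in chain:
        for _ in range(1 + num_retries):  # first attempt + num_retries retries
            try:
                return group, call(group)
            except Exception as err:  # a real router would only retry retryable errors
                last_error = err
    raise last_error


attempts = []


def flaky_call(group):
    """Simulated backend: zephyr-beta always times out, other groups succeed."""
    attempts.append(group)
    if group == "zephyr-beta":
        raise TimeoutError("simulated timeout")
    return f"response from {group}"


result = call_with_fallbacks(
    "zephyr-beta", flaky_call, num_retries=3, fallbacks=[{"zephyr-beta": ["gpt-3.5-turbo"]}]
)
print(result)    # ('gpt-3.5-turbo', 'response from gpt-3.5-turbo')
print(attempts)  # ['zephyr-beta', 'zephyr-beta', 'zephyr-beta', 'zephyr-beta', 'gpt-3.5-turbo']
```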

Once you set up virtual keys or custom callbacks, you can view your spend.

Load API Keys / config values from Environment

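If you have secrets saved in your environment and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment. This works for ANY value in the config.yaml.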

os.environ/<YOUR-ENV-VAR> # runs os.getenv("YOUR-ENV-VAR")

model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.com.cn/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: os.environ/AZURE_NORTH_AMERICA_API_KEY # 👈 KEY CHANGE
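A minimal sketch of what the `os.environ/<YOUR-ENV-VAR>` convention amounts to: walk the config and swap each such string for `os.getenv(...)`. `resolve_env_refs` is a hypothetical helper, not LiteLLM's loader. Returning `None` for unset variables is this sketch's choice, and the key value is a placeholder.

```python
import os

ENV_PREFIX = "os.environ/"


def resolve_env_refs(value):
    """Recursively replace "os.environ/NAME" strings with os.getenv("NAME").

    Unset variables resolve to None in this sketch.
    """
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        return os.getenv(value[len(ENV_PREFIX):])
    if isinstance(value, dict):
        return {key: resolve_env_refs(val) for key, val in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item) for item in value]
    return value


os.environ["AZURE_NORTH_AMERICA_API_KEY"] = "sk-example"  # placeholder value for the demo
params = {
    "model": "azure/chatgpt-v-2",
    "api_base": "https://openai-gpt-4-test-v-1.openai.azure.com/",
    "api_version": "2023-05-15",
    "api_key": "os.environ/AZURE_NORTH_AMERICA_API_KEY",
}
resolved = resolve_env_refs(params)
print(resolved["api_key"])  # sk-example
```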


Thanks to @David Manouchehri for helping with this.

Centralized Credential Management

Define credentials once and reuse them across multiple models. This helps with:

Secret rotation
Reducing config duplication

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      litellm_credential_name: default_azure_credential  # Reference credential below

credential_list:
  - credential_name: default_azure_credential
    credential_values:
      api_key: os.environ/AZURE_API_KEY  # Load from environment
      api_base: os.environ/AZURE_API_BASE
      api_version: "2023-05-15"
    credential_info:
      description: "Production credentials for EU region"

Key Parameters

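credential_name: unique identifier for the credential set
credential_values: key-value pairs of credentials/secrets (supports the os.environ/ syntax)
credential_info: key-value pairs of user-provided information about the credential. No key-value pairs are required, but the dictionary must exist.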
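One way to picture how `litellm_credential_name` ties a model to an entry in `credential_list` is the merge sketched below. `apply_credentials` is hypothetical, not LiteLLM's implementation. Letting values set directly on the model override the shared credential is an assumption. The `credential_info` check follows the rule above that the dictionary must exist.

```python
def apply_credentials(model_list, credential_list):
    """Merge each referenced credential's credential_values into litellm_params."""
    by_name = {}
    for cred in credential_list:
        if "credential_info" not in cred:  # must exist, even if empty
            raise ValueError(f"{cred['credential_name']}: credential_info is required")
        by_name[cred["credential_name"]] = cred

    resolved = []
    for deployment in model_list:
        params = dict(deployment["litellm_params"])  # copy; the input stays unchanged
        name = params.pop("litellm_credential_name", None)
        if name is not None:
            # Assumption: values set directly on the model override the shared credential.
            params = {**by_name[name]["credential_values"], **params}
        resolved.append({**deployment, "litellm_params": params})
    return resolved


credential_list = [{
    "credential_name": "default_azure_credential",
    "credential_values": {
        "api_key": "os.environ/AZURE_API_KEY",
        "api_base": "os.environ/AZURE_API_BASE",
        "api_version": "2023-05-15",
    },
    "credential_info": {"description": "Production credentials for EU region"},
}]
model_list = [{
    "model_name": "gpt-4o",
    "litellm_params": {"model": "azure/gpt-4o", "litellm_credential_name": "default_azure_credential"},
}]
print(apply_credentials(model_list, credential_list)[0]["litellm_params"])
```

The `os.environ/` strings are left as-is here; they would still need resolving like any other config value.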

Load API Keys from Secret Managers (Azure Vault, etc.)

Using Secret Managers with LiteLLM Proxy

Set Supported Environments for a model - `production`, `staging`, `development`

Use this if you want to control which models are exposed in a specific litellm environment.

Supported Environments:

production
staging
development

Set LITELLM_ENVIRONMENT="<environment>" in your environment. It can be one of production, staging, or development.

For each model, set the list of supported environments in model_info.supported_environments.

model_list:
 - model_name: gpt-3.5-turbo
   litellm_params:
     model: openai/gpt-3.5-turbo
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["development", "production", "staging"]
 - model_name: gpt-4
   litellm_params:
     model: openai/gpt-4
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["production", "staging"]
 - model_name: gpt-4o
   litellm_params:
     model: openai/gpt-4o
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["production"]
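The filtering implied by `supported_environments` can be sketched as below, reusing the three models above. `visible_models` is hypothetical. Doing no filtering when `LITELLM_ENVIRONMENT` is unset, and exposing models that omit `supported_environments` everywhere, are both assumptions rather than documented behavior.

```python
import os

SUPPORTED = {"production", "staging", "development"}


def visible_models(model_list, environment=None):
    """Return the model_names exposed in `environment` (or LITELLM_ENVIRONMENT)."""
    env = environment or os.getenv("LITELLM_ENVIRONMENT")
    if env is None:
        return [m["model_name"] for m in model_list]  # assumption: no filtering
    if env not in SUPPORTED:
        raise ValueError(f"LITELLM_ENVIRONMENT must be one of {sorted(SUPPORTED)}, got {env!r}")
    visible = []
    for m in model_list:
        allowed = m.get("model_info", {}).get("supported_environments")
        if allowed is None or env in allowed:  # assumption: unset means "everywhere"
            visible.append(m["model_name"])
    return visible


MODEL_LIST = [
    {"model_name": "gpt-3.5-turbo",
     "model_info": {"supported_environments": ["development", "production", "staging"]}},
    {"model_name": "gpt-4", "model_info": {"supported_environments": ["production", "staging"]}},
    {"model_name": "gpt-4o", "model_info": {"supported_environments": ["production"]}},
]
for env in ["development", "staging", "production"]:
    print(env, visible_models(MODEL_LIST, env))
```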

Set Custom Prompt Templates

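By default, LiteLLM checks whether a model has a prompt template and applies it (e.g. if a huggingface model has a chat template saved in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the config.yaml.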

Step 1: Save your prompt template in a config.yaml

# Model-specific parameters
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1" 
      api_base: "<your-api-base>"
      api_key: "<your-api-key>" # [OPTIONAL] for hf inference endpoints
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: " "
      eos_token: " "
      max_tokens: 4096

Step 2: Start the server with the config

$ litellm --config /path/to/config.yaml
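To preview roughly what the template above produces, the sketch below concatenates the pieces in a plausible order: `bos_token`, `initial_prompt_value`, each message wrapped in its role's `pre_message`/`post_message`, then `final_prompt_value` and `eos_token`. This ordering is a simplification for illustration. LiteLLM's own template code may place `bos_token`/`eos_token` differently, and `render_prompt` is a hypothetical helper.

```python
ROLES = {
    "system": {"pre_message": "<|im_start|>system\n", "post_message": "<|im_end|>"},
    "assistant": {"pre_message": "<|im_start|>assistant\n", "post_message": "<|im_end|>"},
    "user": {"pre_message": "<|im_start|>user\n", "post_message": "<|im_end|>"},
}


def render_prompt(messages, roles, initial_prompt_value="", final_prompt_value="",
                  bos_token="", eos_token=""):
    """Concatenate template pieces into a single prompt string (simplified)."""
    parts = [bos_token, initial_prompt_value]
    for message in messages:
        role_cfg = roles.get(message["role"], {})
        parts.append(role_cfg.get("pre_message", ""))
        parts.append(message["content"])
        parts.append(role_cfg.get("post_message", ""))
    parts.extend([final_prompt_value, eos_token])
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    ROLES,
    initial_prompt_value="\n",
    final_prompt_value="\n",
    bos_token=" ",
    eos_token=" ",
)
print(repr(prompt))
```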

Set a custom tokenizer

If you're using the /utils/token_counter endpoint and want to set a custom huggingface tokenizer for a model, you can do so in the config.yaml.

model_list:
  - model_name: openai-deepseek
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      access_groups: ["restricted-models"]
      custom_tokenizer: 
        identifier: deepseek-ai/DeepSeek-V3-Base
        revision: main
        auth_token: os.environ/HUGGINGFACE_API_KEY

Spec

custom_tokenizer: 
  identifier: str # huggingface model identifier
  revision: str # huggingface model revision (usually 'main')
  auth_token: Optional[str] # huggingface auth token 
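The spec above maps naturally onto a small typed structure. In this sketch, `CustomTokenizer` and `from_config` are hypothetical names, and defaulting `revision` to `"main"` is an assumption based on the spec's comment. The `os.environ/` reference in `auth_token` is resolved the same way as other config values.

```python
import os
from dataclasses import dataclass
from typing import Optional

ENV_PREFIX = "os.environ/"


@dataclass(frozen=True)
class CustomTokenizer:
    identifier: str                   # huggingface model identifier
    revision: str = "main"            # huggingface model revision
    auth_token: Optional[str] = None  # huggingface auth token

    @classmethod
    def from_config(cls, cfg: dict) -> "CustomTokenizer":
        """Build from a custom_tokenizer mapping, resolving os.environ/ references."""
        token = cfg.get("auth_token")
        if isinstance(token, str) and token.startswith(ENV_PREFIX):
            token = os.getenv(token[len(ENV_PREFIX):])
        return cls(
            identifier=cfg["identifier"],
            revision=cfg.get("revision", "main"),
            auth_token=token,
        )


os.environ["HUGGINGFACE_API_KEY"] = "hf_example"  # placeholder value for the demo
tok = CustomTokenizer.from_config({
    "identifier": "deepseek-ai/DeepSeek-V3-Base",
    "revision": "main",
    "auth_token": "os.environ/HUGGINGFACE_API_KEY",
})
print(tok)
```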

General Settings `general_settings` (DB Connection, etc.)

Configure DB Pool Limits + Connection Timeouts

general_settings: 
  database_connection_pool_limit: 100 # sets connection pool for prisma client to postgres db at 100
  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db 

Extras

Disable Swagger UI

To disable the Swagger docs at the base URL, set

NO_DOCS="True"

in your environment, and restart the proxy.

Use CONFIG_FILE_PATH for the proxy (easier Azure container deployment)

Set up config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

Store the file path as an env var

CONFIG_FILE_PATH="/path/to/config.yaml"

Start Proxy

$ litellm 

# RUNNING on http://0.0.0.0:4000

Provide the LiteLLM config.yaml file as an s3 or GCS bucket object/URL

Use this if you cannot mount a config file on your deployment service (e.g. AWS Fargate, Railway, etc.)

LiteLLM Proxy will read your config.yaml from an s3 bucket or a GCS bucket.

GCS Bucket

Set the following .env vars

LITELLM_CONFIG_BUCKET_TYPE = "gcs"                              # set this to "gcs"         
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy"                    # your bucket name on GCS
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml"         # object key on GCS

Start the litellm proxy with these env vars - litellm will read your config from GCS

docker run --name litellm-proxy \
   -e DATABASE_URL=<database_url> \
   -e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
   -e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>" \
   -e LITELLM_CONFIG_BUCKET_TYPE="gcs" \
   -p 4000:4000 \
   ghcr.io/berriai/litellm-database:main-latest --detailed_debug

s3 Bucket

Set the following .env vars

LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy"                    # your bucket name on s3 
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "litellm_proxy_config.yaml"  # object key on s3

Start the litellm proxy with these env vars - litellm will read your config from s3

docker run --name litellm-proxy \
   -e DATABASE_URL=<database_url> \
   -e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
   -e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>" \
   -p 4000:4000 \
   ghcr.io/berriai/litellm-database:main-latest
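Before starting the container, it can help to check which object these variables point at. The helper below turns them into a `gs://` or `s3://` URI without downloading anything. `config_bucket_uri` is hypothetical and not part of LiteLLM. Treating an unset `LITELLM_CONFIG_BUCKET_TYPE` as s3 follows from the s3 instructions above, which do not set it.

```python
import os


def config_bucket_uri(env=None):
    """Describe which bucket object the env vars above point at.

    Returns e.g. "gs://litellm-proxy/proxy_config.yaml", or None if the bucket
    name and object key are not both set.
    """
    env = os.environ if env is None else env
    bucket = env.get("LITELLM_CONFIG_BUCKET_NAME")
    key = env.get("LITELLM_CONFIG_BUCKET_OBJECT_KEY")
    if not bucket or not key:
        return None
    # The s3 instructions set no bucket type, so a missing value is treated as s3.
    bucket_type = env.get("LITELLM_CONFIG_BUCKET_TYPE", "s3").lower()
    schemes = {"s3": "s3", "gcs": "gs"}
    if bucket_type not in schemes:
        raise ValueError(f"unsupported LITELLM_CONFIG_BUCKET_TYPE: {bucket_type!r}")
    return f"{schemes[bucket_type]}://{bucket}/{key}"


print(config_bucket_uri({
    "LITELLM_CONFIG_BUCKET_TYPE": "gcs",
    "LITELLM_CONFIG_BUCKET_NAME": "litellm-proxy",
    "LITELLM_CONFIG_BUCKET_OBJECT_KEY": "proxy_config.yaml",
}))  # gs://litellm-proxy/proxy_config.yaml
```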
