💰 Budgets, Rate Limits

Requirements

You need a Postgres database (e.g. Supabase, Neon, etc.). See Setup

Set Budgets

Global Proxy

Apply a budget across all calls on the proxy.

Step 1. Modify config.yaml

general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets max budget as $0 USD
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

Step 2. Start the proxy

litellm --config /path/to/config.yaml

Step 3. Send a test call

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'

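Team

You can

Add budgets to teams

Info

A step-by-step tutorial on setting and resetting team budgets (via the API or the Admin UI) is available here:

👉 https://docs.litellm.com.cn/docs/proxy/team_budgets

Add budgets to teams

curl --location 'http://0.0.0.0:4000/team/new' \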
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "rpm_limit": 99
}' 

See Swagger

Sample Response

{
    "team_alias": "my-new-team_4",
    "team_id": "13e83b19-f851-43fe-8e93-f96e21033100",
    "admins": [],
    "members": [],
    "members_with_roles": [
        {
            "role": "admin",
            "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"
        }
    ],
    "metadata": {},
    "tpm_limit": null,
    "rpm_limit": 99,
    "max_budget": null,
    "models": [],
    "spend": 0.0,
    "max_parallel_requests": null,
    "budget_duration": null,
    "budget_reset_at": null
}

Add budget duration to teams

budget_duration: The budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "budget_duration": "10s"
}'
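
To make the budget_duration format concrete, here is a minimal Python sketch that converts the four documented suffixes ("s", "m", "h", "d") into seconds. The function name duration_to_seconds is hypothetical and this is not LiteLLM's own parser; it only illustrates how such strings can be read, and it deliberately rejects anything outside those four suffixes.

```python
import re

# Suffixes described in the text: seconds, minutes, hours, days.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    """Convert a string such as "30s" or "30d" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for text in ("30s", "30m", "30h", "30d"):
        print(text, "->", duration_to_seconds(text), "seconds")
```

Checking a duration string this way before sending a request can catch mistakes such as an unquoted or unit-less value early.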

Team Members

Use this when you want to budget a user's spend within a team.

Step 1. Create a user

Create a user with user_id=ishaan

curl --location 'http://0.0.0.0:4000/user/new' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan"
}'

Step 2. Add the user to an existing team and set `max_budget_in_team`

Set max_budget_in_team when adding the user to the team. We use the same user_id that was set in step 1.

curl -X POST 'http://0.0.0.0:4000/team/member_add' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32", "max_budget_in_team": 0.000000000001, "member": {"role": "user", "user_id": "ishaan"}}'

Step 3. Create a key for the team member from step 1

Set user_id=ishaan from step 1

curl --location 'http://0.0.0.0:4000/key/generate' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan",
        "team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32"
}'

Response from /key/generate

We use the key from this response in step 4

{"key":"sk-RV-l2BJEZ_LYNChSx2EueQ", "models":[],"spend":0.0,"max_budget":null,"user_id":"ishaan","team_id":"e8d1460f-846c-45d7-9b43-55f3cc52ac32","max_parallel_requests":null,"metadata":{},"tpm_limit":null,"rpm_limit":null,"budget_duration":null,"allowed_cache_controls":[],"soft_budget":null,"key_alias":null,"duration":null,"aliases":{},"config":{},"permissions":{},"model_max_budget":{},"key_name":null,"expires":null,"token_id":null}

Step 4. Make /chat/completions requests for the team member

Use the key from step 3 for this request. After 2-3 requests, expect to see the following error: ExceededBudget: Crossed spend within team

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-RV-l2BJEZ_LYNChSx2EueQ' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "messages": [
        {
        "role": "user",
        "content": "tes4"
        }
    ]
}'

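Internal User

Apply a budget across all calls an internal user (key owner) can make on the proxy.

Info

For most use cases, we recommend setting team member budgets.

LiteLLM exposes a /user/new endpoint to create budgets for this.

You can

Add budgets to users
Add budget durations, to reset spend

By default, max_budget is set to null and is not checked for keys.

Add budgets to users

curl --location 'http://0.0.0.0:4000/user/new' \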
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}' 

See Swagger

Sample Response

{
    "key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
    "expires": "2023-12-22T09:53:13.861000Z",
    "user_id": "krrish3@berri.ai",
    "max_budget": 0.0
}

Add budget duration to users

team_id is optional in the request below.

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10,
  "budget_duration": "10s"
}'

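Create a new key for an existing user

Now you can call /key/generate with that user_id (e.g. krrish3@berri.ai), and:

Budget check: this key is checked against krrish3@berri.ai's budget (e.g. $10)
Spend tracking: spend on this key also updates krrish3@berri.ai's spend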

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'

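Virtual Key

Apply a budget on a key.

You can

Add budgets to keys
Add budget durations, to reset spend

Expected behaviour

Per-key costs are auto-populated in the LiteLLM_VerificationToken table
After a key crosses its max_budget, requests fail
If a duration is set, spend is reset at the end of the duration

By default, max_budget is set to null and is not checked for keys.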
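
The expected behaviour above can be sketched in a few lines of Python. This is an illustrative model only, not the proxy's implementation: the class KeyBudget and its fields are hypothetical names, time is passed in explicitly to keep the example deterministic, and it assumes a request is rejected only once recorded spend is strictly greater than max_budget (the exact comparison used by the proxy is not stated here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyBudget:
    """Hypothetical per-key budget state (not a LiteLLM class)."""

    max_budget: Optional[float] = None         # None: no budget check at all
    budget_duration_s: Optional[float] = None  # None: spend is never reset
    spend: float = 0.0
    window_start: float = 0.0

    def maybe_reset(self, now: float) -> None:
        # Reset spend once the configured duration has elapsed.
        if self.budget_duration_s is None:
            return
        if now - self.window_start >= self.budget_duration_s:
            self.spend = 0.0
            self.window_start = now

    def allows_request(self, now: float) -> bool:
        self.maybe_reset(now)
        if self.max_budget is None:
            return True
        # Assumption: reject only after spend has crossed max_budget.
        return self.spend <= self.max_budget

    def record_cost(self, cost: float) -> None:
        self.spend += cost


if __name__ == "__main__":
    budget = KeyBudget(max_budget=10.0, budget_duration_s=10.0)
    print(budget.allows_request(now=0.0))  # within budget
    budget.record_cost(11.0)
    print(budget.allows_request(now=5.0))  # over budget, duration not elapsed
    print(budget.allows_request(now=10.0))  # duration elapsed, spend reset
```

The same shape applies to teams and internal users: a spend counter, an optional cap, and an optional reset interval.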

Add budgets to keys

team_id is optional in the request below.

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10
}'

Example request to /chat/completions when the key has crossed its budget

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <generated-key>' \
  --data '{
  "model": "azure-gpt-3.5",
  "user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
  "messages": [
      {
      "role": "user",
      "content": "respond in 50 lines"
      }
  ]
}'

Expected response from /chat/completions when the key has crossed its budget

{
  "detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}   

Add budget duration to keys

team_id is optional in the request below.

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra",
  "max_budget": 10,
  "budget_duration": "10s"
}'

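✨ Virtual Key (Model Specific)

Apply model-specific budgets on a key. Example:

For key = "sk-12345", the budget for gpt-4o is $0.0000001 over a 1d time period
For key = "sk-12345", the budget for gpt-4o-mini is $10 over a 30d time period

Info

✨ This is an Enterprise-only feature. Get started with Enterprise

The spec for model_max_budget is Dict[str, GenericBudgetInfo]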

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model_max_budget": {"gpt-4o": {"budget_limit": "0.0000001", "time_period": "1d"}}
}'

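Make a test request

We expect the first request to succeed and the second request to fail, because the second one crosses the budget for gpt-4o on the virtual key.

See also: Langchain, OpenAI SDK usage examples

Successful call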

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <sk-generated-key>' \
--data ' {
      "model": "gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": "testing request"
        }
      ]
    }
'

Failed call

Expect this call to fail, because we have crossed the budget for model=gpt-4o on the virtual key.

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <sk-generated-key>' \
--data ' {
      "model": "gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": "testing request"
        }
      ]
    }
'

Expected response on the failed call

{
    "error": {
        "message": "LiteLLM Virtual Key: 9769f3f6768a199f76cc29xxxx, key_alias: None, exceeded budget for model=gpt-4o",
        "type": "budget_exceeded",
        "param": null,
        "code": "400"
    }
}

Customers

Use this to budget the user passed to /chat/completions, without needing to create a key for every user.

Step 1. Modify config.yaml to define litellm.max_end_user_budget

general_settings:
  master_key: sk-1234

litellm_settings:
  max_end_user_budget: 0.0001 # budget for 'user' passed to /chat/completions

Make a /chat/completions call and pass 'user'. The first call succeeds.

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Make a /chat/completions call and pass 'user'. This call fails because 'ishaan3' is over budget.

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Error

{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}
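
The budget errors shown so far come in more than one shape: a "detail" string for keys, an "error" object with type "budget_exceeded" for model-specific budgets, and an "error" object with type "auth_error" for customers. A client that wants to treat them uniformly could use a heuristic like the Python sketch below. The helper name is_budget_error is hypothetical, and the matching rules are inferred only from the sample responses in this document, so they may not cover every error the proxy can return.

```python
import json


def is_budget_error(body: str) -> bool:
    """Best-effort check: does this response body describe an exceeded budget?"""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return False

    error = payload.get("error")
    if isinstance(error, dict):
        # Shape used by the model-specific budget sample.
        if error.get("type") == "budget_exceeded":
            return True
        message = str(error.get("message", ""))
    else:
        # Shape used by the per-key sample: {"detail": "..."}.
        message = str(payload.get("detail", ""))

    lowered = message.lower()
    return "exceeded" in lowered and "budget" in lowered


if __name__ == "__main__":
    sample = '{"detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"}'
    print(is_budget_error(sample))
```

Matching on message text is brittle, so prefer the "type" field whenever the response provides one.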

Reset Budgets

Reset budgets across keys, internal users, teams, and customers. The key change in each request below is budget_duration.

Internal user

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Key

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Team

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Note: By default, the server checks for resets every 10 minutes, to minimize DB calls.

To change this, set proxy_budget_rescheduler_min_time and proxy_budget_rescheduler_max_time

E.g.: check every 1 second

general_settings: 
  proxy_budget_rescheduler_min_time: 1
  proxy_budget_rescheduler_max_time: 1

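Set Rate Limits

You can set

tpm limits (tokens per minute)
rpm limits (requests per minute)
max parallel requests
per-model rpm / tpm limits for a given key

Limits can be scoped per team, per internal user, per key, per API key and model, or for customers.

Per team

Use /team/new or /team/update to persist rate limits across multiple keys for a team.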

curl --location 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"team_id": "my-prod-team", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected Response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "team_id": "my-prod-team"
}

Per internal user

Use /user/new or /user/update to persist rate limits across multiple keys for an internal user.

curl --location 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"user_id": "krrish@berri.ai", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected Response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "user_id": "krrish@berri.ai"
}

Per key

Use /key/generate if you want rate limits to apply to just that key.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

Expected Response

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973",
    "user_id": "78c2c8fc-c233-43b9-b0c3-eb931da27b84"  // 👈 auto-generated
}

Per API key, per model

Set model_rpm_limit and model_tpm_limit to apply per-model rate limits to an API key.

Here, gpt-4 is the model_name set in the litellm config.yaml. Replace <tpm-limit> with the tokens-per-minute limit you want for that model.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"model_rpm_limit": {"gpt-4": 2}, "model_tpm_limit": {"gpt-4": <tpm-limit>}}'

Expected Response

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973"
}

Verify that the model rate limits are set correctly for this key

Make a /chat/completions request and check that x-litellm-key-remaining-requests-gpt-4 is returned.

curl -i http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ulGNRXWtv7M0lFnnsQk0wQ" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, Claude!ss eho ares"}
    ]
  }'

Expected headers

x-litellm-key-remaining-requests-gpt-4: 1
x-litellm-key-remaining-tokens-gpt-4: 179

These headers indicate:

Key sk-ulGNRXWtv7M0lFnnsQk0wQ has 1 request remaining for the GPT-4 model
Key sk-ulGNRXWtv7M0lFnnsQk0wQ has 179 tokens remaining for the GPT-4 model
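
A client can read these headers programmatically. The Python sketch below groups the two header families shown above by model name. The function name remaining_limits is hypothetical, and the sketch assumes only the two header prefixes that appear in this example; the proxy may send other headers that it ignores.

```python
from typing import Dict, Mapping

# Header prefixes taken from the example response above.
_PREFIXES = {
    "requests": "x-litellm-key-remaining-requests-",
    "tokens": "x-litellm-key-remaining-tokens-",
}


def remaining_limits(headers: Mapping[str, str]) -> Dict[str, Dict[str, int]]:
    """Return {model: {"requests": n, "tokens": m}} from response headers."""
    result: Dict[str, Dict[str, int]] = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        # Note that this also lowercases the model name in the result.
        lowered = name.lower()
        for kind, prefix in _PREFIXES.items():
            if lowered.startswith(prefix):
                model = lowered[len(prefix):]
                result.setdefault(model, {})[kind] = int(value)
    return result


if __name__ == "__main__":
    example = {
        "content-type": "application/json",
        "x-litellm-key-remaining-requests-gpt-4": "1",
        "x-litellm-key-remaining-tokens-gpt-4": "179",
    }
    print(remaining_limits(example))
```

A caller could use the returned counts to back off before the next request would be rejected.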

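For customers

Info

You can also create a budget id for a customer in the UI, under the 'Rate Limits' tab.

Use this to set rate limits for the user passed to /chat/completions, without needing to create a key for every user.

Step 1. Create a budget

Set a tpm_limit on the budget (you can also pass an rpm_limit if needed).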

curl --location 'http://0.0.0.0:4000/budget/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "budget_id" : "free-tier",
    "tpm_limit": 5
}'

Step 2. Create a `Customer` with the budget

We use budget_id="free-tier" from step 1 when creating this new customer.

curl --location 'http://0.0.0.0:4000/customer/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "user_id" : "palantir",
    "budget_id": "free-tier"
}'

Step 3. Pass `user_id` in the `/chat/completions` request

Pass the user_id from step 2 as user="palantir"

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "user": "palantir",
    "messages": [
        {
        "role": "user",
        "content": "gm"
        }
    ]
}'

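Set a default budget for all internal users

Use this to set a default budget for users who have been given a key.

This takes effect when a user has user_role="internal_user" (set via /user/new or /user/update).

This has no effect if the key has a team_id (the team budget applies instead). Tell us how we can improve this!

Define the max budget in your config.yaml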

model_list: 
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  max_internal_user_budget: 0 # amount in USD
  internal_user_budget_duration: "1mo" # reset every month

Create a key for the user

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'

Expected Response

{
  ...
  "key": "sk-X53RdxnDhzamRwjKXR4IHg"
}

Test it!

curl -L -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-X53RdxnDhzamRwjKXR4IHg' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hey, how's it going?"}]
}'

Expected Response

{
    "error": {
        "message": "ExceededBudget: User=<user_id> over budget. Spend=3.7e-05, Budget=0.0",
        "type": "budget_exceeded",
        "param": null,
        "code": "400"
    }
}

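[BETA] Multi-instance rate limiting

Enable multi-instance rate limiting with the environment variable EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True"

Changes

When updating current requests/tokens, this uses async_increment instead of async_set_cache.
The in-memory cache is synced with Redis every 0.01s, to avoid calling Redis on every request.
In testing, this was found to be 2x faster than the previous implementation, and it reduced the drift between expected and actual failures to at most 10 requests under high traffic (100 RPS across 3 instances).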
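
To illustrate why incrementing a shared counter behaves better than overwriting it when several instances are running, here is a small Python sketch. It is not LiteLLM's code: SharedStore is a plain dict standing in for Redis, InstanceCounter is a hypothetical per-instance cache, and sync() is called by hand instead of on a 0.01s timer.

```python
from collections import defaultdict
from typing import Dict


class SharedStore:
    """Stand-in for Redis: holds one counter per key."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def increment(self, key: str, amount: int) -> int:
        # Adds to whatever other instances have already written.
        self.counts[key] += amount
        return self.counts[key]


class InstanceCounter:
    """Per-instance cache that batches updates and pushes them on sync()."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store
        self.pending: Dict[str, int] = defaultdict(int)  # not yet pushed
        self.known_total: Dict[str, int] = {}  # last total seen in the store

    def record(self, key: str, amount: int = 1) -> None:
        self.pending[key] += amount

    def sync(self) -> None:
        for key, delta in list(self.pending.items()):
            self.known_total[key] = self.store.increment(key, delta)
        self.pending.clear()

    def current(self, key: str) -> int:
        return self.known_total.get(key, 0) + self.pending.get(key, 0)


if __name__ == "__main__":
    store = SharedStore()
    a, b = InstanceCounter(store), InstanceCounter(store)
    for _ in range(3):
        a.record("key-1")
        b.record("key-1")
    a.sync()
    b.sync()
    # Both instances' requests are counted. Had each instance overwritten
    # the shared value with its own local count, the store would hold 3.
    print(store.counts["key-1"])
```

Between syncs each instance sees a slightly stale total, which is one way a small drift between expected and actual failures can arise.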

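Grant access to new models

Use model access groups to give users access to select models, and add new models to the group over time (e.g. mistral, llama-2, etc.).

What is the difference between doing this with /key/generate and with /user/new? If you do it on /user/new, it persists across multiple keys generated for that user.

Step 1. Assign the model and access group in config.yaml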

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Step 2. Create a key with the access group

Pass the model access group name ("beta-models") in models.

curl --location 'http://0.0.0.0:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], "max_budget": 0}'

Create a new key for an existing internal user

Just include user_id in the /key/generate request.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'

API Specification

`GenericBudgetInfo`

A Pydantic model that defines budget information with a time period and a limit.

from pydantic import BaseModel

class GenericBudgetInfo(BaseModel):
    budget_limit: float  # The maximum budget amount in USD
    time_period: str    # Duration string like "1d", "30d", etc.

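Fields:

budget_limit (float): The maximum budget amount in USD
time_period (str): Duration string specifying the time period for the budget. Supported formats:
- Seconds: "30s"
- Minutes: "30m"
- Hours: "30h"
- Days: "30d"

Example:

{
  "budget_limit": 0.0001,
  "time_period": "1d"
}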
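
Before sending a model_max_budget payload, it can help to validate it locally against the spec above. The Python sketch below uses a standard-library dataclass as a stand-in for GenericBudgetInfo so that it runs without Pydantic. The names BudgetInfo and parse_model_max_budget are hypothetical, and the time_period check assumes only the four formats listed above.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class BudgetInfo:
    """Hypothetical stand-in mirroring the GenericBudgetInfo fields."""

    budget_limit: float  # maximum budget amount in USD
    time_period: str     # duration string such as "1d" or "30d"


def parse_model_max_budget(raw: Dict[str, Dict[str, Any]]) -> Dict[str, BudgetInfo]:
    """Validate a {model_name: {"budget_limit": ..., "time_period": ...}} dict."""
    parsed: Dict[str, BudgetInfo] = {}
    for model, entry in raw.items():
        period = str(entry["time_period"])
        if re.fullmatch(r"\d+[smhd]", period) is None:
            raise ValueError(f"{model}: unsupported time_period {period!r}")
        # float() accepts both numbers and numeric strings like "0.0000001".
        parsed[model] = BudgetInfo(float(entry["budget_limit"]), period)
    return parsed


if __name__ == "__main__":
    payload = {"gpt-4o": {"budget_limit": "0.0000001", "time_period": "1d"}}
    print(parse_model_max_budget(payload))
```

The numeric-string coercion mirrors the /key/generate example earlier, which passes budget_limit as a string.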
