Health Checks

Use this to run health checks on all LLMs defined in your config.yaml.

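Summary

The proxy exposes:

a /health endpoint, which returns the health of the LLM APIs
a /health/readiness endpoint, which returns whether the proxy is ready to accept requests
a /health/liveliness endpoint, which returns whether the proxy is alive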

`/health`

Request

Make a GET request to /health on the proxy.

Info

This endpoint makes an LLM API call to each model to check whether it is healthy.

curl --location 'http://0.0.0.0:4000/health' -H "Authorization: Bearer sk-1234"

You can also run litellm --health, which sends a GET request to http://0.0.0.0:4000/health for you.

litellm --health

Response

{
    "healthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"
        },
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"
        }
    ],
    "unhealthy_endpoints": [
        {
            "model": "azure/gpt-35-turbo",
            "api_base": "https://openai-france-1234.openai.azure.com/"
        }
    ]
}
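A monitoring script might reduce this response to a short summary. The following is a minimal sketch, assuming the response has the shape shown above; `summarize_health` is a hypothetical helper, not part of LiteLLM.

```python
import json


def summarize_health(payload: dict) -> dict:
    """Count endpoints and collect the api_base of each unhealthy one."""
    healthy = payload.get("healthy_endpoints", [])
    unhealthy = payload.get("unhealthy_endpoints", [])
    return {
        "healthy_count": len(healthy),
        "unhealthy_count": len(unhealthy),
        "unhealthy_api_bases": [e.get("api_base") for e in unhealthy],
    }


# Sample payload copied from the /health response in this document.
sample = json.loads("""
{
    "healthy_endpoints": [
        {"model": "azure/gpt-35-turbo",
         "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/"},
        {"model": "azure/gpt-35-turbo",
         "api_base": "https://my-endpoint-europe-berri-992.openai.azure.com/"}
    ],
    "unhealthy_endpoints": [
        {"model": "azure/gpt-35-turbo",
         "api_base": "https://openai-france-1234.openai.azure.com/"}
    ]
}
""")

summary = summarize_health(sample)
```

With the sample payload, `summary` reports two healthy endpoints and one unhealthy endpoint, which is the France deployment.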

Embedding Models

To run embedding health checks, set the mode to "embedding" for the relevant model in your config.

model_list:
  - model_name: azure-embedding-model
    litellm_params:
      model: azure/azure-embedding-model
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: embedding # 👈 ADD THIS

Image Generation Models

To run image generation health checks, set the mode to "image_generation" for the relevant model in your config.

model_list:
  - model_name: dall-e-3
    litellm_params:
      model: azure/dall-e-3
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: image_generation # 👈 ADD THIS

Text Completion Models

To run /completions health checks, set the mode to "completion" for the relevant model in your config.

model_list:
  - model_name: azure-text-completion
    litellm_params:
      model: azure/text-davinci-003
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      mode: completion # 👈 ADD THIS

Speech-to-Text Models

model_list:
  - model_name: whisper
    litellm_params:
      model: whisper-1
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      mode: audio_transcription

Text-to-Speech Models

model_list:
  # OpenAI Text to Speech Models
  - model_name: tts
    litellm_params:
      model: openai/tts-1
      api_key: "os.environ/OPENAI_API_KEY"
    model_info:
      mode: audio_speech

Rerank Models

To run rerank health checks, set the mode to "rerank" for the relevant model in your config.

model_list:
  - model_name: rerank-english-v3.0
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY
    model_info:
      mode: rerank

Batch Models (Azure Only)

For Azure models deployed in 'batch' mode, set mode: batch.

model_list:
  - model_name: "batch-gpt-4o-mini"
    litellm_params:
      model: "azure/batch-gpt-4o-mini"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
    model_info:
      mode: batch

Expected Response

{
    "healthy_endpoints": [
        {
            "api_base": "https://...",
            "model": "azure/gpt-4o-mini",
            "x-ms-region": "East US"
        }
    ],
    "unhealthy_endpoints": [],
    "healthy_count": 1,
    "unhealthy_count": 0
}

Realtime Models

To run realtime health checks, set the mode to "realtime" for the relevant model in your config.

model_list:
  - model_name: openai/gpt-4o-realtime-audio
    litellm_params:
      model: openai/gpt-4o-realtime-audio
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      mode: realtime
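A typo in `mode` is easy to miss in a long config. The following is a minimal sketch of a pre-flight check, assuming the config has already been loaded into Python dictionaries; `KNOWN_MODES` lists only the modes named in this document (the proxy may accept others), and `find_unknown_modes` is a hypothetical helper, not a LiteLLM function.

```python
from typing import List, Tuple

# Modes named in this document; this is not an authoritative list.
KNOWN_MODES = {
    "embedding",
    "image_generation",
    "completion",
    "audio_transcription",
    "audio_speech",
    "rerank",
    "batch",
    "realtime",
}


def find_unknown_modes(model_list: List[dict]) -> List[Tuple[str, str]]:
    """Return (model_name, mode) pairs whose mode is not in KNOWN_MODES.

    Entries with no mode set are skipped.
    """
    unknown = []
    for entry in model_list:
        info = entry.get("model_info") or {}
        mode = info.get("mode")
        if mode is not None and mode not in KNOWN_MODES:
            unknown.append((entry.get("model_name", "<unnamed>"), mode))
    return unknown


model_list = [
    {"model_name": "whisper", "model_info": {"mode": "audio_transcription"}},
    {"model_name": "dall-e-3", "model_info": {"mode": "image-generation"}},  # hyphen typo
    {"model_name": "openai/gpt-4o"},  # no mode set
]
problems = find_unknown_modes(model_list)
```

Here `problems` contains only the `dall-e-3` entry, because its mode uses a hyphen instead of an underscore.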

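Wildcard Routes

For wildcard routes, you can specify a health_check_model in your config.yaml. This model is used for health checks on that wildcard route.

In this example, when a health check runs against openai/*, it makes a /chat/completions request to openai/gpt-4o-mini.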

model_list:
  - model_name: openai/*
    litellm_params:
      model:  openai/*
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_model: openai/gpt-4o-mini
  - model_name: anthropic/*
    litellm_params:
      model: anthropic/*
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      health_check_model: anthropic/claude-3-5-sonnet-20240620
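The selection described above can be pictured as a simple lookup. This is an illustrative sketch only, not LiteLLM's internal code: `health_check_target` is a hypothetical function, and falling back to `litellm_params.model` when no `health_check_model` is set is an assumption.

```python
def health_check_target(entry: dict) -> str:
    """Pick the model name a health check would call for one model_list entry."""
    info = entry.get("model_info") or {}
    # Prefer the explicit health_check_model; otherwise use the configured model
    # (assumed fallback, not confirmed by this document).
    return info.get("health_check_model") or entry["litellm_params"]["model"]


wildcard_entry = {
    "model_name": "openai/*",
    "litellm_params": {"model": "openai/*"},
    "model_info": {"health_check_model": "openai/gpt-4o-mini"},
}
plain_entry = {
    "model_name": "openai/gpt-4o",
    "litellm_params": {"model": "openai/gpt-4o"},
}

wildcard_target = health_check_target(wildcard_entry)
plain_target = health_check_target(plain_entry)
```

For the wildcard entry the target is `openai/gpt-4o-mini`; for the plain entry it is the configured model itself.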

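Background Health Checks

You can run model health checks in the background so that each model is not queried too frequently via /health.

Info

This makes an LLM API call to each model to check whether it is healthy.

Here's how to use it:

Add to your config.yaml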

general_settings: 
  background_health_checks: True # enable background health checks
  health_check_interval: 300 # frequency of background health checks

Start the server

$ litellm --config /path/to/config.yaml

Query the health endpoint

curl --location 'http://0.0.0.0:4000/health'
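The idea behind background checks is that a periodic task refreshes a cached result, and /health serves the cache instead of calling every model on each request. The following is a conceptual sketch of that pattern, not the proxy's implementation; `background_health_loop`, `run_checks`, and `max_rounds` are hypothetical names, and the interval is assumed to be in seconds.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def background_health_loop(
    run_checks: Callable[[], Awaitable[dict]],
    cache: dict,
    interval_seconds: float,
    max_rounds: Optional[int] = None,
) -> None:
    """Refresh cache["latest"] every interval_seconds.

    max_rounds exists only so that a demo or test can stop the loop.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        cache["latest"] = await run_checks()
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            await asyncio.sleep(interval_seconds)


# Demo with a fake check that records how often it was called.
calls = []


async def fake_checks() -> dict:
    calls.append(1)
    return {"healthy_count": len(calls), "unhealthy_count": 0}


cache: dict = {}
asyncio.run(background_health_loop(fake_checks, cache, interval_seconds=0, max_rounds=2))
```

A request handler would then return `cache["latest"]` directly, so the cost of a /health call no longer depends on the number of models.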

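Hide Details

The health check response includes details such as the endpoint URL, error messages, and other LiteLLM params. This is useful for debugging, but it can be a problem when the proxy is exposed to a broad audience.

You can hide these details by setting health_check_details to False.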

general_settings: 
  health_check_details: False

Health Check Timeout

The health check timeout is set in litellm/constants.py and defaults to 60 seconds.

You can override it by setting health_check_timeout in the model_info section of config.yaml.

model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      health_check_timeout: 10 # 👈 OVERRIDE HEALTH CHECK TIMEOUT
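The override rule amounts to "use the per-model value if present, otherwise the default". A minimal sketch of that rule follows; `resolve_health_check_timeout` and the constant name are hypothetical and are not taken from litellm/constants.py.

```python
# Default stated in this document: 60 seconds.
DEFAULT_HEALTH_CHECK_TIMEOUT_SECONDS = 60


def resolve_health_check_timeout(model_entry: dict) -> float:
    """Return the per-model health_check_timeout, or the default if unset."""
    info = model_entry.get("model_info") or {}
    return info.get("health_check_timeout", DEFAULT_HEALTH_CHECK_TIMEOUT_SECONDS)


with_override = {
    "model_name": "openai/gpt-4o",
    "model_info": {"health_check_timeout": 10},
}
without_override = {"model_name": "openai/gpt-4o-mini"}

override_timeout = resolve_health_check_timeout(with_override)
default_timeout = resolve_health_check_timeout(without_override)
```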

`/health/readiness`

Unprotected endpoint for checking whether the proxy is ready to accept requests

Example Request

curl http://0.0.0.0:4000/health/readiness

Example Response

{
  "status": "connected",
  "db": "connected",
  "cache": null,
  "litellm_version": "1.40.21",
  "success_callbacks": [
    "langfuse",
    "_PROXY_track_cost_callback",
    "response_taking_too_long_callback",
    "_PROXY_MaxParallelRequestsHandler",
    "_PROXY_MaxBudgetLimiter",
    "_PROXY_CacheControlCheck",
    "ServiceLogging"
  ],
  "last_updated": "2024-07-10T18:59:10.616968"
}

If the proxy is not connected to a database, the "db" field will be "Not connected" instead of "connected", and the "last_updated" field will not be present.
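A deployment script could poll this endpoint and inspect the "db" field before sending traffic. The following is a minimal sketch using only the Python standard library; `fetch_readiness` and `database_connected` are hypothetical helpers, and treating any value other than "connected" (or a missing field) as not connected is an assumption.

```python
import json
import urllib.request


def fetch_readiness(base_url: str, timeout: float = 5.0) -> dict:
    """GET /health/readiness and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/health/readiness"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def database_connected(payload: dict) -> bool:
    """True only when the readiness payload reports "db": "connected"."""
    return payload.get("db") == "connected"


# Offline demo with payloads shaped like the example response.
connected = database_connected({"status": "connected", "db": "connected"})
not_connected = database_connected({"status": "connected", "db": "Not connected"})
```

Against a running proxy, `database_connected(fetch_readiness("http://0.0.0.0:4000"))` performs the same check on live data.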

`/health/liveliness`

Unprotected endpoint for checking whether the proxy is alive

Example Request

curl -X 'GET' \
  'http://0.0.0.0:4000/health/liveliness' \
  -H 'accept: application/json'

Example Response

"I'm alive!"

`/health/services`

Use this admin-only endpoint to check whether a connected service (datadog/slack/langfuse/etc.) is healthy.

curl -L -X GET 'http://0.0.0.0:4000/health/services?service=datadog'     -H 'Authorization: Bearer sk-1234'

API Reference

Advanced - Call Specific Models

To check the health of specific models, call them as follows:

1. Get the model ID via `/model/info`

curl -X GET 'http://0.0.0.0:4000/v1/model/info' \
--header 'Authorization: Bearer sk-1234'

Expected Response

{
    "model_name": "bedrock-anthropic-claude-3",
    "litellm_params": {
        "model": "anthropic.claude-3-sonnet-20240229-v1:0"
    },
    "model_info": {
        "id": "634b87c444.." # 👈 UNIQUE MODEL ID
    }
}

2. Call the specific model via `/chat/completions`

Pass the unique model ID from step 1 as "model".

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "634b87c444..",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ]
}'
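The same ping can be sent from a script. The following is a minimal sketch using only the Python standard library; `build_ping_request` is a hypothetical helper, and the base URL, key, and model ID are the placeholder values used in this document.

```python
import json
import urllib.request


def build_ping_request(base_url: str, api_key: str, model_id: str) -> urllib.request.Request:
    """Build a POST /chat/completions request that targets one model ID."""
    body = json.dumps(
        {
            "model": model_id,  # unique model ID from /model/info
            "messages": [{"role": "user", "content": "ping"}],
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_ping_request("http://0.0.0.0:4000", "sk-1234", "634b87c444..")
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=...)`; an HTTP error status raises `urllib.error.HTTPError`, which a caller could treat as an unhealthy model.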
