LiteLLM - Getting Started
https://github.com/BerriAI/litellm
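Call 100+ LLMs using the OpenAI input/output format
- Translate inputs to each provider's endpoints (/chat/completions, /responses, /embeddings, /images, /audio, /batches, and more)
- Consistent output: the response format is the same regardless of which provider you use
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI): Router
- Track spend and set budgets per project: LiteLLM Proxy Server
How to use LiteLLM
You can use LiteLLM through either the Proxy Server or the Python SDK. Both give you a unified interface to 100+ LLMs. Choose the option that best fits your needs.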
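| | LiteLLM Proxy Server | LiteLLM Python SDK |
|---|---|---|
| Use case | Central service (LLM gateway) for accessing multiple LLMs | Use LiteLLM directly in your Python code |
| Who uses it? | Gen AI enablement / ML platform teams | Developers building LLM projects |
| Key features | • Centralized API gateway with authentication and authorization • Multi-tenant cost tracking and spend management per project/user • Per-project customization (logging, guardrails, caching) • Virtual keys for secure access control • Admin dashboard UI for monitoring and management | • Direct Python library integration in your codebase • Router with retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) • Application-level load balancing and cost tracking • Exception handling with OpenAI-compatible errors • Observability callbacks (Lunary, MLflow, Langfuse, etc.) |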
LiteLLM Python SDK
Basic usage
pip install litellm
- OpenAI
- Anthropic
- xAI
- VertexAI
- NVIDIA
- HuggingFace
- Azure OpenAI
- Ollama
- Openrouter
- Novita AI
- Vercel AI Gateway
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
model="openai/gpt-5",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
response = completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
## set ENV variables
os.environ["XAI_API_KEY"] = "your-api-key"
response = completion(
model="xai/grok-2-latest",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
# auth: run 'gcloud auth application-default login'
os.environ["VERTEXAI_PROJECT"] = "hardy-device-386718"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
model="vertex_ai/gemini-1.5-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
## set ENV variables
os.environ["NVIDIA_NIM_API_KEY"] = "nvidia_api_key"
os.environ["NVIDIA_NIM_API_BASE"] = "nvidia_nim_endpoint_url"
response = completion(
model="nvidia_nim/<model_name>",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://my-endpoint.huggingface.cloud"
)
print(response)
from litellm import completion
import os
## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""
# azure call
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
response = completion(
model="ollama/llama2",
messages = [{ "content": "Hello, how are you?","role": "user"}],
api_base="http://localhost:11434" # default address of a local Ollama server
)
from litellm import completion
import os
## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
response = completion(
model="openrouter/google/palm-2-chat-bison",
messages = [{ "content": "Hello, how are you?","role": "user"}],
)
from litellm import completion
import os
## set ENV variables. Visit https://novita.ai/settings/key-management to get your API key
os.environ["NOVITA_API_KEY"] = "novita-api-key"
response = completion(
model="novita/deepseek/deepseek-r1",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
from litellm import completion
import os
## set ENV variables. Visit https://vercel.com/docs/ai-gateway#using-the-ai-gateway-with-an-api-key for instructions on obtaining a key
os.environ["VERCEL_AI_GATEWAY_API_KEY"] = "your-vercel-api-key"
response = completion(
model="vercel_ai_gateway/openai/gpt-5",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
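Every model string in the examples above starts with a provider prefix (`openai/`, `anthropic/`, `huggingface/`, ...) followed by the provider's own model name, which may itself contain slashes. The sketch below only illustrates that naming pattern. `split_model_string` is a hypothetical helper written for this guide, not a LiteLLM function, and it makes no claim about how LiteLLM resolves providers internally.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash (illustrative only)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


# Model strings copied from the examples above.
examples = [
    "openai/gpt-5",
    "huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
    "vercel_ai_gateway/openai/gpt-5",
]
for model in examples:
    print(split_model_string(model))
```

Only the first slash separates the provider, so everything after it is passed through as the model name.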
Response format (OpenAI Chat Completions format)
{
"id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
"created": 1734366691,
"model": "gpt-5",
"object": "chat.completion",
"system_fingerprint": null,
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
"role": "assistant",
"tool_calls": null,
"function_call": null
}
}
],
"usage": {
"completion_tokens": 43,
"prompt_tokens": 13,
"total_tokens": 56,
"completion_tokens_details": null,
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 0
},
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}
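As a rough illustration of how the fields above are usually consumed, the sketch below parses a trimmed copy of the sample response and pulls out the reply text, the finish reason, and the token usage. The message content is shortened here for readability. On the object returned by `completion()`, the same fields are typically reached with attribute access, for example `response.choices[0].message.content`.

```python
import json

# Trimmed copy of the sample response shown above (content shortened).
raw = """
{
  "model": "gpt-5",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"}
    }
  ],
  "usage": {"completion_tokens": 43, "prompt_tokens": 13, "total_tokens": 56}
}
"""

data = json.loads(raw)
first_choice = data["choices"][0]            # one entry per generated completion
reply_text = first_choice["message"]["content"]
finish_reason = first_choice["finish_reason"]
usage = data["usage"]

print(reply_text)
print(finish_reason)
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```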
Responses API
For advanced models that support reasoning content (such as GPT-5 and o3), use litellm.responses().
- OpenAI
- Anthropic (Claude)
- VertexAI
- Azure OpenAI
from litellm import responses
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = responses(
    model="gpt-5-mini",
    input="What is the capital of France?", # the Responses API takes `input`, not `messages`
    reasoning={"effort": "medium"}
)
print(response)
print(response.output) # output items (reasoning and message)
from litellm import responses
import os
## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
response = responses(
    model="anthropic/claude-sonnet-4-5-20250929",
    input="What is the capital of France?"
)
from litellm import responses
import os
# auth: run 'gcloud auth application-default login'
os.environ["VERTEXAI_PROJECT"] = "jr-smith-386718"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = responses(
    model="vertex_ai/gemini-1.5-pro",
    input="What is the capital of France?"
)
from litellm import responses
import os
## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""
# azure call
response = responses(
    model="azure/<your_deployment_name>",
    input="What is the capital of France?"
)
print(response)
Streaming
Set stream=True in the completion arguments.
- OpenAI
- Anthropic
- xAI
- VertexAI
- NVIDIA
- HuggingFace
- Azure OpenAI
- Ollama
- Openrouter
- Novita AI
- Vercel AI Gateway
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
model="openai/gpt-5",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
## set ENV variables
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
response = completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
## set ENV variables
os.environ["XAI_API_KEY"] = "your-api-key"
response = completion(
model="xai/grok-2-latest",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
# auth: run 'gcloud auth application-default login'
os.environ["VERTEXAI_PROJECT"] = "hardy-device-386718"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
model="vertex_ai/gemini-1.5-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
## set ENV variables
os.environ["NVIDIA_NIM_API_KEY"] = "nvidia_api_key"
os.environ["NVIDIA_NIM_API_BASE"] = "nvidia_nim_endpoint_url"
response = completion(
model="nvidia_nim/<model_name>",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
response = completion(
model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://my-endpoint.huggingface.cloud",
stream=True,
)
print(response)
from litellm import completion
import os
## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""
# azure call
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)
from litellm import completion
response = completion(
model="ollama/llama2",
messages = [{ "content": "Hello, how are you?","role": "user"}],
api_base="http://localhost:11434", # default address of a local Ollama server
stream=True,
)
from litellm import completion
import os
## set ENV variables
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
response = completion(
model="openrouter/google/palm-2-chat-bison",
messages = [{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
## set ENV variables. Visit https://novita.ai/settings/key-management to get your API key
os.environ["NOVITA_API_KEY"] = "novita_api_key"
response = completion(
model="novita/deepseek/deepseek-r1",
messages = [{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
from litellm import completion
import os
## set ENV variables. Visit https://vercel.com/docs/ai-gateway#using-the-ai-gateway-with-an-api-key for instructions on obtaining a key
os.environ["VERCEL_AI_GATEWAY_API_KEY"] = "your-vercel-api-key"
response = completion(
model="vercel_ai_gateway/openai/gpt-5",
messages = [{ "content": "Hello, how are you?","role": "user"}],
stream=True,
)
Streaming response format (OpenAI format)
{
"id": "chatcmpl-2be06597-eb60-4c70-9ec5-8cd2ab1b4697",
"created": 1734366925,
"model": "claude-sonnet-4-5-20250929",
"object": "chat.completion.chunk",
"system_fingerprint": null,
"choices": [
{
"finish_reason": null,
"index": 0,
"delta": {
"content": "Hello",
"role": "assistant",
"function_call": null,
"tool_calls": null,
"audio": null
},
"logprobs": null
}
]
}
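Each chunk carries only a fragment of the reply in `choices[0].delta.content`, so the full text has to be stitched together by the caller. The sketch below shows one way to do that. `collect_stream_text` and `fake_chunk` are hypothetical helpers for this guide, and the stand-in chunks only mimic the shape of the JSON above so the example runs without an API key.

```python
from types import SimpleNamespace


def collect_stream_text(chunks) -> str:
    """Concatenate delta.content across chunks, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content, finish_reason=None):
    """Build a stand-in chunk with the same shape as the JSON above."""
    delta = SimpleNamespace(content=content, role="assistant")
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason, index=0)
    return SimpleNamespace(choices=[choice], object="chat.completion.chunk")


# The final chunk usually has no content and only sets finish_reason.
fake_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None, "stop")]
text = collect_stream_text(fake_stream)
print(text)
```

With a live call, the iterator returned by `completion(..., stream=True)` would take the place of `fake_stream`.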
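Exception handling
LiteLLM maps exceptions from all supported providers to OpenAI exceptions. All of its exceptions inherit from OpenAI's exception types, so any error handling you have written for OpenAI should work out of the box with LiteLLM.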
import litellm
from litellm import completion
import os
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
try:
    completion(model="anthropic/claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
except litellm.AuthenticationError as e:
    # Thrown when the API key is invalid
    print(f"Authentication failed: {e}")
except litellm.RateLimitError as e:
    # Thrown when you've exceeded your rate limit
    print(f"Rate limited: {e}")
except litellm.APIError as e:
    # Thrown for general API errors
    print(f"API error: {e}")
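Because the exception types are uniform across providers, one retry policy can cover every backend. The sketch below is a minimal exponential-backoff wrapper under that assumption. `call_with_retries` is a hypothetical helper, not part of LiteLLM, and the demo uses a stand-in exception class so it runs offline.

```python
import time


def call_with_retries(call, retry_on, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on an exception in retry_on, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Offline demo: a stand-in error type and a call that fails twice, then succeeds.
class FakeRateLimitError(Exception):
    pass


attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError("slow down")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_call, retry_on=FakeRateLimitError, sleep=delays.append)
print(result, delays)
```

In real code, `call` would be something like `lambda: completion(model=..., messages=...)` and `retry_on` would be `litellm.RateLimitError`. Authentication errors are deliberately not retried, since a bad key will not fix itself.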
Logging and observability - log LLM input/output
LiteLLM exposes predefined callbacks that send data to MLflow, Lunary, Langfuse, Helicone, Promptlayer, Traceloop, Slack, and other platforms.
import os
import litellm
from litellm import completion
## set env variables for logging tools (API key set up is not required when using MLflow)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" # get your key at https://app.lunary.ai/settings
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["OPENAI_API_KEY"] = "your-api-key"
# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "helicone"] # log input/output to lunary, mlflow, langfuse, helicone
# openai call
response = completion(model="openai/gpt-5", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
Track cost, usage, and latency for streaming
Use a callback function for this. For more on custom callbacks, see https://docs.litellm.com.cn/docs/observability/custom_callback
import litellm
from litellm import completion
# track_cost_callback
def track_cost_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except Exception as e:
        # a logging failure should not break the request
        print(f"cost tracking failed: {e}")
# set callback
litellm.success_callback = [track_cost_callback] # set custom callback function
# litellm.completion() call
response = completion(
    model="openai/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "Hi 👋 - i'm openai"
        }
    ],
    stream=True
)
# consume the stream; the callback is expected to report the cost once streaming finishes
for chunk in response:
    pass
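The callback above prints the cost of a single call. To keep running totals, the same four-argument signature can be implemented as a method on a small accumulator, as sketched below. `SpendTracker` is a hypothetical class for this guide. The sketch assumes `start_time` and `end_time` arrive as `datetime` objects and that `response_cost` may be missing or `None`. The demo invokes the callback by hand with made-up values so it runs offline.

```python
from datetime import datetime, timedelta


class SpendTracker:
    """Accumulates cost and latency across calls (illustrative helper)."""

    def __init__(self):
        self.total_cost = 0.0
        self.latencies = []  # seconds per call

    def callback(self, kwargs, completion_response, start_time, end_time):
        # response_cost may be missing or None, so fall back to 0.
        self.total_cost += kwargs.get("response_cost") or 0.0
        # Assumption: start_time and end_time are datetime objects.
        self.latencies.append((end_time - start_time).total_seconds())


tracker = SpendTracker()

# Offline demo: call the callback by hand with made-up values.
t0 = datetime(2025, 1, 1, 12, 0, 0)
tracker.callback({"response_cost": 0.002}, None, t0, t0 + timedelta(seconds=1.5))
tracker.callback({"response_cost": None}, None, t0, t0 + timedelta(seconds=0.5))
print(tracker.total_cost, tracker.latencies)
```

Registering it would presumably look like `litellm.success_callback = [tracker.callback]`, on the assumption that a bound method is accepted wherever a plain function is.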
LiteLLM Proxy Server (LLM Gateway)
Track spend across multiple projects/people
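The proxy provides a centralized API gateway with authentication, per-project cost tracking, and virtual keys, as listed in the comparison table above.
📖 Proxy endpoints - Swagger docs
See the full tutorial on keys and rate limits in the LiteLLM docs.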
Quick start proxy - CLI
pip install 'litellm[proxy]'
Step 1: Start the LiteLLM proxy
- pip package
- Docker container
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:4000
Docker step 1: Create config.yaml
Example litellm_config.yaml
model_list:
  - model_name: gpt-5
    litellm_params:
      model: azure/<your-azure-model-deployment>
      api_base: os.environ/AZURE_API_BASE # runs os.getenv("AZURE_API_BASE")
      api_key: os.environ/AZURE_API_KEY # runs os.getenv("AZURE_API_KEY")
      api_version: "2023-07-01-preview"
general_settings: # master_key and database_url are proxy-level settings
  master_key: sk-1234
  database_url: postgres://
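The `os.environ/NAME` values in the config are references that, per the comments above, are resolved with `os.getenv("NAME")` instead of being stored as literal secrets. The sketch below mimics that convention on a plain dictionary. `resolve_env_refs` is a hypothetical helper, not the proxy's actual loader, and the URL is a placeholder set only for the demo.

```python
import os

ENV_PREFIX = "os.environ/"


def resolve_env_refs(value):
    """Recursively replace 'os.environ/NAME' strings with os.getenv('NAME')."""
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        return os.getenv(value[len(ENV_PREFIX):])
    if isinstance(value, dict):
        return {k: resolve_env_refs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v) for v in value]
    return value  # anything else is a literal and passes through


os.environ["AZURE_API_BASE"] = "https://example.invalid/"  # placeholder value for the demo
params = {
    "model": "azure/<your-azure-model-deployment>",
    "api_base": "os.environ/AZURE_API_BASE",
    "api_version": "2023-07-01-preview",
}
resolved = resolve_env_refs(params)
print(resolved["api_base"])
```

Keeping secrets in the environment means the YAML file can be committed or mounted into a container without exposing keys.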
Docker step 2: Run the Docker image
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-e AZURE_API_KEY=d6*********** \
-e AZURE_API_BASE=https://openai-***********/ \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug
Step 2: Send a ChatCompletions request to the proxy
- Chat Completions
- Responses API
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-5", messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
])
print(response)
from openai import OpenAI
client = OpenAI(
api_key="sk-1234",
base_url="http://0.0.0.0:4000"
)
response = client.responses.create(
model="gpt-5",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
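Since the proxy speaks the OpenAI wire format, any HTTP client can talk to it, not only the OpenAI SDK. The sketch below builds the same chat request with Python's standard library: a POST to `/chat/completions` under the base URL, with the key sent as a Bearer token, which is what the OpenAI client does. `build_chat_request` is a hypothetical helper. The request is built but not sent, so the example runs without a proxy.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) a POST request for the proxy's /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("http://0.0.0.0:4000", "sk-1234", "gpt-5", "this is a test request")
print(req.get_method(), req.full_url)

# To actually send it (requires a running proxy):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```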