
Create Pass Through Endpoints

Add pass through routes to the LiteLLM Proxy

Example: Add a route /v1/rerank that forwards requests through the LiteLLM Proxy to https://api.cohere.com/v1/rerank

💡 This allows you to make the following request to the LiteLLM Proxy

curl --request POST \
  --url http://localhost:4000/v1/rerank \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "top_n": 3,
    "documents": ["Carson City is the capital city of the American state of Nevada."]
  }'

Tutorial - Pass through Cohere Re-Rank endpoint

Step 1. Define the pass through route in your litellm config.yaml

general_settings:
  master_key: sk-1234
  pass_through_endpoints:
    - path: "/v1/rerank"                                  # route you want to add to LiteLLM Proxy Server
      target: "https://api.cohere.com/v1/rerank"          # URL this route should forward requests to
      headers:                                            # headers to forward to this URL
        Authorization: "bearer os.environ/COHERE_API_KEY" # (Optional) Auth Header to forward to your Endpoint
        content-type: application/json                    # (Optional) Extra Headers to pass to this endpoint
        accept: application/json
      forward_headers: True                               # (Optional) Forward all headers from the incoming request to the target endpoint

Step 2. Start the proxy server in detailed_debug mode

litellm --config config.yaml --detailed_debug

Step 3. Make a request to the pass through endpoint

Here http://localhost:4000 is your litellm proxy endpoint

curl --request POST \
  --url http://localhost:4000/v1/rerank \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "top_n": 3,
    "documents": ["Carson City is the capital city of the American state of Nevada.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
      "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
      "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
      "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."]
  }'

🎉 Expected Response

This request gets forwarded from the LiteLLM Proxy to the specified target URL (with headers)

{
  "id": "37103a5b-8cfb-48d3-87c7-da288bedd429",
  "results": [
    {
      "index": 2,
      "relevance_score": 0.999071
    },
    {
      "index": 4,
      "relevance_score": 0.7867867
    },
    {
      "index": 0,
      "relevance_score": 0.32713068
    }
  ],
  "meta": {
    "api_version": {
      "version": "1"
    },
    "billed_units": {
      "search_units": 1
    }
  }
}
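
Since the proxy now exposes the same /v1/rerank path as Cohere itself, you can also point the official cohere Python SDK at it. A minimal sketch, assuming a recent SDK version that accepts a base_url override; the api_key value is a throwaway placeholder because the proxy injects the real COHERE_API_KEY header:

import cohere

co = cohere.Client(
    api_key="anything",                # placeholder; the proxy attaches COHERE_API_KEY
    base_url="http://localhost:4000",  # your litellm proxy endpoint
)

# same request as the curl example above, via the SDK
results = co.rerank(
    model="rerank-english-v3.0",
    query="What is the capital of the United States?",
    documents=["Carson City is the capital city of the American state of Nevada."],
    top_n=1,
)
print(results)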

Tutorial - Pass through Langfuse Requests

Step 1. Define the pass through route in your litellm config.yaml

general_settings:
  master_key: sk-1234
  pass_through_endpoints:
    - path: "/api/public/ingestion"                                # route you want to add to LiteLLM Proxy Server
      target: "https://us.cloud.langfuse.com/api/public/ingestion" # URL this route should forward requests to
      headers:
        LANGFUSE_PUBLIC_KEY: "os.environ/LANGFUSE_DEV_PUBLIC_KEY"  # your langfuse account public key
        LANGFUSE_SECRET_KEY: "os.environ/LANGFUSE_DEV_SK_KEY"      # your langfuse account secret key

Step 2. Start the proxy server in detailed_debug mode

litellm --config config.yaml --detailed_debug

Step 3. Make a request to the pass through endpoint

Run this code to generate a sample trace

from langfuse import Langfuse

langfuse = Langfuse(
    host="http://localhost:4000",  # your litellm proxy endpoint
    public_key="anything",         # no key required since this is a pass through
    secret_key="anything",         # no key required since this is a pass through
)

print("sending langfuse trace request")
trace = langfuse.trace(name="test-trace-litellm-proxy-passthrough")
print("flushing langfuse request")
langfuse.flush()

print("flushed langfuse request")

🎉 Expected Response

On success, you will see the trace generated on your Langfuse dashboard.

You will also see the following endpoint called in your litellm proxy server logs:

POST /api/public/ingestion HTTP/1.1" 207 Multi-Status

[Enterprise] - Use LiteLLM keys/Authentication on Pass Through Endpoints

Use this if you want the pass through endpoints to honour LiteLLM keys/Authentication.

This also enforces the key's rpm limits on the pass-through endpoints.

Usage - set auth: true on the config

general_settings:
  master_key: sk-1234
  pass_through_endpoints:
    - path: "/v1/rerank"
      target: "https://api.cohere.com/v1/rerank"
      auth: true # 👈 Key change to use LiteLLM Auth / Keys
      headers:
        Authorization: "bearer os.environ/COHERE_API_KEY"
        content-type: application/json
        accept: application/json

Test the request with your LiteLLM key

curl --request POST \
  --url http://localhost:4000/v1/rerank \
  --header 'accept: application/json' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'content-type: application/json' \
  --data '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "top_n": 3,
    "documents": ["Carson City is the capital city of the American state of Nevada.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
      "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
      "Capitalization or capitalisation in English grammar is the use of a capital letter at the start of a word. English usage varies from capitalization in other languages.",
      "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."]
  }'
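
Because auth: true makes the route honour LiteLLM keys, a key's rpm limit is enforced on the pass-through route as well. A minimal sketch, assuming the proxy's standard /key/generate endpoint: create a key capped at 10 requests per minute, then call the pass-through route with it:

import requests

# generate a LiteLLM key with an rpm limit (authenticated with the master key)
key_resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},
    json={"rpm_limit": 10},
)
litellm_key = key_resp.json()["key"]

# use the generated key against the pass-through endpoint;
# requests beyond 10/minute should be rejected by the proxy
resp = requests.post(
    "http://localhost:4000/v1/rerank",
    headers={"Authorization": f"Bearer {litellm_key}"},
    json={
        "model": "rerank-english-v3.0",
        "query": "What is the capital of the United States?",
        "top_n": 1,
        "documents": ["Washington, D.C. is the capital of the United States."],
    },
)
print(resp.status_code, resp.json())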

Use the langfuse client SDK with a LiteLLM key

Usage

  1. Set up the yaml to pass through langfuse /api/public/ingestion
general_settings:
  master_key: sk-1234
  pass_through_endpoints:
    - path: "/api/public/ingestion"                                # route you want to add to LiteLLM Proxy Server
      target: "https://us.cloud.langfuse.com/api/public/ingestion" # URL this route should forward requests to
      auth: true                     # 👈 KEY CHANGE
      custom_auth_parser: "langfuse" # 👈 KEY CHANGE
      headers:
        LANGFUSE_PUBLIC_KEY: "os.environ/LANGFUSE_DEV_PUBLIC_KEY"  # your langfuse account public key
        LANGFUSE_SECRET_KEY: "os.environ/LANGFUSE_DEV_SK_KEY"      # your langfuse account secret key
  2. Start the proxy
litellm --config /path/to/config.yaml
  3. Test with the langfuse SDK

from langfuse import Langfuse

langfuse = Langfuse(
    host="http://localhost:4000",  # your litellm proxy endpoint
    public_key="sk-1234",          # your litellm proxy api key
    secret_key="anything",         # no key required since this is a pass through
)

print("sending langfuse trace request")
trace = langfuse.trace(name="test-trace-litellm-proxy-passthrough")
print("flushing langfuse request")
langfuse.flush()

print("flushed langfuse request")

pass_through_endpoints Spec on config.yaml

All possible values for pass_through_endpoints and what they mean

Example config

general_settings:
  pass_through_endpoints:
    - path: "/v1/rerank"                                  # route you want to add to LiteLLM Proxy Server
      target: "https://api.cohere.com/v1/rerank"          # URL this route should forward requests to
      headers:                                            # headers to forward to this URL
        Authorization: "bearer os.environ/COHERE_API_KEY" # (Optional) Auth Header to forward to your Endpoint
        content-type: application/json                    # (Optional) Extra Headers to pass to this endpoint
        accept: application/json

Spec

  • pass_through_endpoints list: A collection of endpoint configurations for request forwarding.
    • path string: The route to add to the LiteLLM Proxy Server.
    • target string: The URL this route should forward requests to.
    • headers object: Key-value pairs of headers to forward with the request. You can set any key-value pair here and it will be forwarded to your target endpoint.
      • Authorization string: The authentication header for the target API.
      • content-type string: The format specification for the request body.
      • accept string: The response format expected from the server.
      • LANGFUSE_PUBLIC_KEY string: Your Langfuse account public key - only set this if you are forwarding to Langfuse.
      • LANGFUSE_SECRET_KEY string: Your Langfuse account secret key - only set this if you are forwarding to Langfuse.
      • <your-custom-header> string: Pass any custom header key/value pair.
    • forward_headers Optional(boolean): If true, all headers from the incoming request are forwarded to the target endpoint. Defaults to False (see the sketch after this list).
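
To make forward_headers concrete, here is a minimal sketch (not from the docs) against the /v1/rerank route configured above. The x-request-source header is a hypothetical custom header; it only reaches the target endpoint when forward_headers: True is set:

import requests

resp = requests.post(
    "http://localhost:4000/v1/rerank",
    headers={
        "content-type": "application/json",
        # hypothetical custom header, forwarded only if forward_headers: True
        "x-request-source": "my-batch-job",
    },
    json={
        "model": "rerank-english-v3.0",
        "query": "What is the capital of the United States?",
        "top_n": 1,
        "documents": ["Washington, D.C. is the capital of the United States."],
    },
)
print(resp.json())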

Custom Chat Endpoints (Anthropic/Bedrock/Vertex)

Allow developers to call the proxy with Anthropic/boto3/etc. client SDKs.

Refer to our tested Anthropic adapter for a reference implementation (code).

1. Write an Adapter

Translate the request/response from your custom API schema to the OpenAI schema (used by litellm.completion()) and back.

For provider-specific params 👉 Provider-Specific Params

from litellm import adapter_completion
import litellm
from litellm import ChatCompletionRequest, verbose_logger
from litellm.integrations.custom_logger import CustomLogger
from litellm.types.llms.anthropic import AnthropicMessagesRequest, AnthropicResponse
import os

# What is this?
## Translates OpenAI call to Anthropic `/v1/messages` format
import json
import os
import traceback
import uuid
from typing import Literal, Optional

import dotenv
import httpx
from pydantic import BaseModel


###################
# CUSTOM ADAPTER ##
###################


class AnthropicAdapter(CustomLogger):
    def __init__(self) -> None:
        super().__init__()

    def translate_completion_input_params(
        self, kwargs
    ) -> Optional[ChatCompletionRequest]:
        """
        - translate params, where needed
        - pass rest, as is
        """
        request_body = AnthropicMessagesRequest(**kwargs)  # type: ignore

        translated_body = litellm.AnthropicConfig().translate_anthropic_to_openai(
            anthropic_message_request=request_body
        )

        return translated_body

    def translate_completion_output_params(
        self, response: litellm.ModelResponse
    ) -> Optional[AnthropicResponse]:
        return litellm.AnthropicConfig().translate_openai_response_to_anthropic(
            response=response
        )

    def translate_completion_output_params_streaming(self) -> Optional[BaseModel]:
        return super().translate_completion_output_params_streaming()


anthropic_adapter = AnthropicAdapter()

###########
# TEST IT #
###########

## register CUSTOM ADAPTER
litellm.adapters = [{"id": "anthropic", "adapter": anthropic_adapter}]

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = adapter_completion(model="gpt-3.5-turbo", messages=messages, adapter_id="anthropic")

# cohere call
response = adapter_completion(model="command-nightly", messages=messages, adapter_id="anthropic")
print(response)

2. Create a new endpoint

Pass the custom adapter instance defined in Step 1 to the config.yaml by setting the route's target to python_filename.logger_instance_name.

In the config below, we pass

python_filename: custom_callbacks.py
logger_instance_name: anthropic_adapter (defined in Step 1)

i.e. target: custom_callbacks.anthropic_adapter

model_list:
  - model_name: my-fake-claude-endpoint
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY


general_settings:
  master_key: sk-1234
  pass_through_endpoints:
    - path: "/v1/messages"                       # route you want to add to LiteLLM Proxy Server
      target: custom_callbacks.anthropic_adapter # Adapter to use for this route
      headers:
        litellm_user_api_key: "x-api-key"        # Field in headers, containing LiteLLM Key

3. Test it!

Start the proxy

litellm --config /path/to/config.yaml

Curl

Note: the anthropic-version header below is accepted but ignored by this route.

curl --location 'http://0.0.0.0:4000/v1/messages' \
  -H 'x-api-key: sk-1234' \
  -H 'anthropic-version: 2023-06-01' \
  -H 'content-type: application/json' \
  -d '{
    "model": "my-fake-claude-endpoint",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
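
Since this route speaks the Anthropic /v1/messages schema, the official anthropic Python SDK should also work when pointed at the proxy. A sketch under that assumption; the SDK sends its api_key as the x-api-key header, which is where the config above reads the LiteLLM key from:

import anthropic

client = anthropic.Anthropic(
    base_url="http://0.0.0.0:4000",  # LiteLLM proxy endpoint
    api_key="sk-1234",               # LiteLLM key, sent as the x-api-key header
)

# same request as the curl example above, via the SDK
message = client.messages.create(
    model="my-fake-claude-endpoint",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, world"}],
)
print(message.content)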