Infinity
Property | Details |
---|---|
Description | Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP. |
Provider Route on LiteLLM | infinity/ |
Supported Operations | /rerank, /embeddings |
Link to Provider Doc | Infinity ↗ |
Usage - LiteLLM Python SDK
```python
from litellm import rerank, embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
)
```
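The response follows the Cohere-style rerank schema. As a minimal sketch, assuming each entry in the results list carries the original document index and its relevance_score, the ranked order can be read back like this:

```python
# Sketch: read the Cohere-style rerank results returned above.
# Assumes each result entry exposes "index" and "relevance_score".
for result in response.results:
    print(result["index"], result["relevance_score"])
```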
Usage - LiteLLM Proxy
LiteLLM provides a Cohere API compatible /rerank endpoint for rerank calls.
Setup
Add this to your litellm proxy config.yaml
```yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```
Start litellm
```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```
Test request:
Rerank
```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'
```
Supported Cohere Rerank API Params
Param | Type | Description |
---|---|---|
query | str | The query to rerank the documents against |
documents | list[str] | The documents to rerank |
top_n | int | The number of documents to return |
return_documents | bool | Whether to return the documents in the response |
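For instance, here is a minimal SDK sketch combining top_n and return_documents; the model name and api base simply mirror the examples above:

```python
from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

# Sketch: keep only the top 2 matches and include the document text in the response.
response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    top_n=2,
    return_documents=True,
)
```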
Usage - Return Documents
- SDK
- Proxy
```python
response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    return_documents=True,
)
```
```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of France?",
    "documents": [
        "Paris",
        "London",
        "Berlin",
        "Madrid"
    ],
    "return_documents": true
  }'
```
Pass Provider-specific Params
Any unmapped params are passed to the provider as-is.
- SDK
- Proxy
```python
from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    raw_scores=True,  # 👈 PROVIDER-SPECIFIC PARAM
)
```
- Setup config.yaml
```yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://localhost:8080
      raw_scores: True  # 👈 EITHER SET PROVIDER-SPECIFIC PARAMS HERE OR IN REQUEST BODY
```
- Start litellm
```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```
- Test it!
```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "raw_scores": true
  }'
```
Embeddings
LiteLLM provides an OpenAI API compatible /embeddings endpoint for embedding calls.
Setup
Add this to your litellm proxy config.yaml
```yaml
model_list:
  - model_name: custom-infinity-embedding
    litellm_params:
      model: infinity/provider/custom-embedding-v1
      api_base: http://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```
Test request:
```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'
```
Supported Embedding API Params
Param | Type | Description |
---|---|---|
model | str | The embedding model to use |
input | list[str] | The text inputs to generate embeddings for |
encoding_format | str | The format to return the embeddings in (e.g. "float", "base64") |
modality | str | The type of input (e.g. "text", "image", "audio") |
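As a hedged sketch of how these map onto the SDK call, encoding_format and modality are passed as keyword arguments; whether a given modality is accepted depends on the model your Infinity deployment serves:

```python
from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

# Sketch: request base64-encoded vectors and state the input modality explicitly.
response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"],
    encoding_format="base64",
    modality="text",
)
```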
Usage - Basic Example
- SDK
- Proxy
```python
from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"]
)

print(response.data[0]['embedding'])
```
```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'
```
Usage - OpenAI Client
- SDK
- Proxy
```python
from openai import OpenAI

client = OpenAI(
    api_key="<LITELLM_MASTER_KEY>",
    base_url="<LITELLM_URL>"
)

response = client.embeddings.create(
    model="bge-small",
    input=["The food was delicious and the waiter..."],
    encoding_format="float"
)

print(response.data[0].embedding)
```
```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-small",
    "input": ["The food was delicious and the waiter..."],
    "encoding_format": "float"
  }'
```