[Old proxy 👉 new proxy here] Local LiteLLM Proxy Server
A fast, lightweight OpenAI-compatible server to call 100+ LLM APIs.
Docs outdated. New docs 👉 here
Usage
pip install 'litellm[proxy]'
$ litellm --model ollama/codellama
#INFO: Ollama running on http://0.0.0.0:8000
Test
Run this in a new shell:
$ litellm --test
Replace your OpenAI base
import openai
openai.api_base = "http://0.0.0.0:8000"
print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
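The proxy mirrors the OpenAI API, so streaming should work the same way as against api.openai.com. A minimal sketch (not from the original docs), assuming the quickstart proxy above is running on port 8000 and forwards OpenAI-style stream chunks:

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "any-string-here"  # the proxy doesn't validate this

# stream=True is assumed to yield OpenAI-style delta chunks
response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Hey!"}],
    stream=True,
)
for chunk in response:
    print(chunk)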
Other supported models:
- VLLM
- OpenAI 兼容服务器
- Huggingface
- Anthropic
- TogetherAI
- Replicate
- Petals
- Palm
- Azure OpenAI
- AI21
- Cohere
$ litellm --model vllm/facebook/opt-125m
$ litellm --model openai/<model_name> --api_base <your-api-base>
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder
$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1
$ export TOGETHERAI_API_KEY=my-api-key
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k
$ export REPLICATE_API_KEY=my-api-key
$ litellm \
--model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf
$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
$ litellm --model azure/my-deployment-name
$ export AI21_API_KEY=my-api-key
$ litellm --model j2-light
$ export COHERE_API_KEY=my-api-key
$ litellm --model command-nightly
Tutorial: Use with Multiple LLMs + LibreChat/Chatbot-UI/AutoGen/ChatDev/Langroid etc.
- Multiple LLMs
- LibreChat
- SmartChatbotUI
- AutoGen
- AutoGen Multi-LLM
- ChatDev
- Langroid
Replace your OpenAI base
import openai
openai.api_key = "any-string-here"
openai.api_base = "http://0.0.0.0:8000" # your proxy url
# call openai
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
# call cohere
response = openai.ChatCompletion.create(model="command-nightly", messages=[{"role": "user", "content": "Hey"}])
print(response)
1. Clone the repo
git clone https://github.com/danny-avila/LibreChat.git
2. Modify docker-compose.yml
OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions
3. Save a fake OpenAI key in your .env file
OPENAI_API_KEY=sk-1234
4. Run LibreChat:
docker compose up
1. Clone the repo
git clone https://github.com/dotneet/smart-chatbot-ui.git
2. Install dependencies
npm i
3. Create your env
cp .env.local.example .env.local
4. Set the API key and base
OPENAI_API_KEY="my-fake-key"
OPENAI_API_HOST="http://0.0.0.0:8000"
5. Run with docker compose
docker compose up -d
pip install pyautogen
from autogen import AssistantAgent, UserProxyAgent, oai
config_list=[
{
"model": "my-fake-model",
"api_base": "http://0.0.0.0:8000", #litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response) # works fine
llm_config={
"config_list": config_list,
}
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)
Credits @victordibia for this tutorial.
from autogen import AssistantAgent, GroupChatManager, UserProxyAgent
from autogen.agentchat import GroupChat
config_list = [
{
"model": "ollama/mistralorca",
"api_base": "http://0.0.0.0:8000", # litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
llm_config = {"config_list": config_list, "seed": 42}
code_config_list = [
{
"model": "ollama/phind-code",
"api_base": "http://0.0.0.0:8000", # litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
code_config = {"config_list": code_config_list, "seed": 42}
admin = UserProxyAgent(
name="Admin",
system_message="A human admin. Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.",
llm_config=llm_config,
code_execution_config=False,
)
engineer = AssistantAgent(
name="Engineer",
llm_config=code_config,
system_message="""Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
""",
)
planner = AssistantAgent(
name="Planner",
system_message="""Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
The plan may involve an engineer who can write code and a scientist who doesn't write code.
Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist.
""",
llm_config=llm_config,
)
executor = UserProxyAgent(
name="Executor",
system_message="Executor. Execute the code written by the engineer and report the result.",
human_input_mode="NEVER",
llm_config=llm_config,
code_execution_config={"last_n_messages": 3, "work_dir": "paper"},
)
critic = AssistantAgent(
name="Critic",
system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.",
llm_config=llm_config,
)
groupchat = GroupChat(
agents=[admin, engineer, planner, executor, critic],
messages=[],
max_round=50,
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)
admin.initiate_chat(
manager,
message="""
""",
)
Credits @Nathan for this tutorial.
Setup ChatDev (docs)
git clone https://github.com/OpenBMB/ChatDev.git
cd ChatDev
conda create -n ChatDev_conda_env python=3.9 -y
conda activate ChatDev_conda_env
pip install -r requirements.txt
Run ChatDev with the proxy
export OPENAI_API_KEY="sk-1234"
export OPENAI_BASE_URL="http://0.0.0.0:8000"
python3 run.py --task "a script that says hello world" --name "hello world"
pip install langroid
from langroid.language_models.openai_gpt import OpenAIGPTConfig, OpenAIGPT
# configure the LLM
my_llm_config = OpenAIGPTConfig(
# where proxy server is listening
api_base="http://0.0.0.0:8000",
)
# create llm, one-off interaction
llm = OpenAIGPT(my_llm_config)
response = llm.chat("What is the capital of China?", max_tokens=50)
# Create an Agent with this LLM, wrap it in a Task, and
# run it as an interactive chat app:
from langroid.agent.base import ChatAgent, ChatAgentConfig
from langroid.agent.task import Task
agent_config = ChatAgentConfig(llm=my_llm_config, name="my-llm-agent")
agent = ChatAgent(agent_config)
task = Task(agent, name="my-llm-task")
task.run()
Credits @pchalasani and Langroid for this tutorial.
Local Proxy
Here's how to use the local proxy to test codellama/mistral/etc. models from different github repos
pip install litellm
$ ollama pull codellama # OUR Local CodeLlama
$ litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048
Tutorial: Use with Multiple LLMs + Aider/AutoGen/Langroid etc.
- Multiple LLMs
- ContinueDev
- Aider
- AutoGen
- AutoGen Multi-LLM
- ChatDev
- Langroid
- GPT-Pilot
- guidance
$ litellm
#INFO: litellm proxy running on http://0.0.0.0:8000
Send a request to your proxy
import openai
openai.api_key = "any-string-here"
openai.api_base = "http://0.0.0.0:8000" # your proxy url
# call gpt-3.5-turbo
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
# call ollama/llama2
response = openai.ChatCompletion.create(model="ollama/llama2", messages=[{"role": "user", "content": "Hey"}])
print(response)
Continue-Dev brings ChatGPT to VSCode. See how to install it here.
In config.py, set this as your default model.
default=OpenAI(
api_key="IGNORED",
model="fake-model-name",
context_length=2048, # customize if needed for your model
api_base="http://localhost:8000" # your proxy server url
),
Credits @vividfog for this tutorial.
$ pip install aider
$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key
pip install pyautogen
from autogen import AssistantAgent, UserProxyAgent, oai
config_list=[
{
"model": "my-fake-model",
"api_base": "http://localhost:8000", #litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response) # works fine
llm_config={
"config_list": config_list,
}
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)
Credits @victordibia for this tutorial.
from autogen import AssistantAgent, GroupChatManager, UserProxyAgent
from autogen.agentchat import GroupChat
config_list = [
{
"model": "ollama/mistralorca",
"api_base": "http://localhost:8000", # litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
llm_config = {"config_list": config_list, "seed": 42}
code_config_list = [
{
"model": "ollama/phind-code",
"api_base": "http://localhost:8000", # litellm compatible endpoint
"api_type": "open_ai",
"api_key": "NULL", # just a placeholder
}
]
code_config = {"config_list": code_config_list, "seed": 42}
admin = UserProxyAgent(
name="Admin",
system_message="A human admin. Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.",
llm_config=llm_config,
code_execution_config=False,
)
engineer = AssistantAgent(
name="Engineer",
llm_config=code_config,
system_message="""Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
""",
)
planner = AssistantAgent(
name="Planner",
system_message="""Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
The plan may involve an engineer who can write code and a scientist who doesn't write code.
Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist.
""",
llm_config=llm_config,
)
executor = UserProxyAgent(
name="Executor",
system_message="Executor. Execute the code written by the engineer and report the result.",
human_input_mode="NEVER",
llm_config=llm_config,
code_execution_config={"last_n_messages": 3, "work_dir": "paper"},
)
critic = AssistantAgent(
name="Critic",
system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.",
llm_config=llm_config,
)
groupchat = GroupChat(
agents=[admin, engineer, planner, executor, critic],
messages=[],
max_round=50,
)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)
admin.initiate_chat(
manager,
message="""
""",
)
Credits @Nathan for this tutorial.
Setup ChatDev (docs)
git clone https://github.com/OpenBMB/ChatDev.git
cd ChatDev
conda create -n ChatDev_conda_env python=3.9 -y
conda activate ChatDev_conda_env
pip install -r requirements.txt
Run ChatDev with the proxy
export OPENAI_API_KEY="sk-1234"
export OPENAI_BASE_URL="http://0.0.0.0:8000"
python3 run.py --task "a script that says hello world" --name "hello world"
pip install langroid
from langroid.language_models.openai_gpt import OpenAIGPTConfig, OpenAIGPT
# configure the LLM
my_llm_config = OpenAIGPTConfig(
#format: "local/[URL where LiteLLM proxy is listening]
chat_model="local/localhost:8000",
chat_context_length=2048, # adjust based on model
)
# create llm, one-off interaction
llm = OpenAIGPT(my_llm_config)
response = llm.chat("What is the capital of China?", max_tokens=50)
# Create an Agent with this LLM, wrap it in a Task, and
# run it as an interactive chat app:
from langroid.agent.base import ChatAgent, ChatAgentConfig
from langroid.agent.task import Task
agent_config = ChatAgentConfig(llm=my_llm_config, name="my-llm-agent")
agent = ChatAgent(agent_config)
task = Task(agent, name="my-llm-task")
task.run()
Credits @pchalasani and Langroid for this tutorial.
In your .env file, set the openai endpoint to your local server.
OPENAI_ENDPOINT=http://0.0.0.0:8000
OPENAI_API_KEY=my-fake-key
NOTE: Guidance sends additional params like stop_sequences, which can cause some models to fail if they don't support them.
Fix: Start your proxy with the --drop_params flag
litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048 --drop_params
import guidance
# set api_base to your proxy
# set api_key to anything
gpt4 = guidance.llms.OpenAI("gpt-4", api_base="http://0.0.0.0:8000", api_key="anything")
experts = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}
{{#user~}}
I want a response to the following question:
{{query}}
Name 3 world-class experts (past or present) who would be great at answering this?
Don't answer the question yet.
{{~/user}}
{{#assistant~}}
{{gen 'expert_names' temperature=0 max_tokens=300}}
{{~/assistant}}
''', llm=gpt4)
result = experts(query='How can I be more productive?')
print(result)
Contributing: Using this server with a project? Contribute your tutorial here!
Advanced
Logs
$ litellm --logs
This returns the most recent logs (the calls sent to the LLM API + the responses received). All logs are saved to a file called api_logs.json in the current directory.
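Since the file is plain JSON, you can inspect it programmatically. A minimal sketch, assuming api_logs.json sits in the directory you started the proxy from (the per-entry schema isn't documented here, so this just pretty-prints whatever is stored):

import json
from pathlib import Path

# assumes you run this from the directory where the proxy was started
log_file = Path("api_logs.json")
logs = json.loads(log_file.read_text())

# pretty-print the raw log contents
print(json.dumps(logs, indent=2, ensure_ascii=False))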
Configure Proxy
If you need to:
- save API keys
- set litellm params (e.g. drop unmapped params, set fallback models, etc.)
- set model-specific params (max tokens, temperature, api base, prompt template)
You can set these for just the current session (via the cli), or have them persist across restarts (via a config file).
Save API Keys
$ litellm --api_key OPENAI_API_KEY=sk-...
LiteLLM saves this to a locally stored config file, so it persists across sessions.
The LiteLLM proxy supports all litellm-supported API keys. To add a key for a specific provider, check this list:
- Huggingface
- Anthropic
- PerplexityAI
- TogetherAI
- Replicate
- Bedrock
- Palm
- Azure OpenAI
- AI21
- Cohere
$ litellm --add_key HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --add_key ANTHROPIC_API_KEY=my-api-key
$ litellm --add_key PERPLEXITYAI_API_KEY=my-api-key
$ litellm --add_key TOGETHERAI_API_KEY=my-api-key
$ litellm --add_key REPLICATE_API_KEY=my-api-key
$ litellm --add_key AWS_ACCESS_KEY_ID=my-key-id
$ litellm --add_key AWS_SECRET_ACCESS_KEY=my-secret-access-key
$ litellm --add_key PALM_API_KEY=my-palm-key
$ litellm --add_key AZURE_API_KEY=my-api-key
$ litellm --add_key AZURE_API_BASE=my-api-base
$ litellm --add_key AI21_API_KEY=my-api-key
$ litellm --add_key COHERE_API_KEY=my-api-key
E.g.: Set the api base, max tokens, and temperature.
For that session only:
litellm --model ollama/llama2 \
--api_base http://localhost:11434 \
--max_tokens 250 \
--temperature 0.5
# OpenAI-compatible server running on http://0.0.0.0:8000
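With those defaults set on the server, clients can omit max_tokens and temperature entirely. A minimal sketch (hypothetical client call, not from the original docs) of a request that inherits the session settings above:

import openai

openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "any-string-here"

# no max_tokens/temperature here: the proxy applies the values it was
# started with (max_tokens 250, temperature 0.5)
response = openai.ChatCompletion.create(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hey"}],
)
print(response)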
Performance
We load-tested the FastAPI server with 500,000 HTTP connections over 1 minute, using wrk.
Here are our results:
Thread Stats Avg Stdev Max +/- Stdev
Latency 156.38ms 25.52ms 361.91ms 84.73%
Req/Sec 13.61 5.13 40.00 57.50%
383625 requests in 1.00m, 391.10MB read
Socket errors: connect 0, read 1632, write 1, timeout 0
Support / Talk with founders
- Schedule a demo 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai