Gradio Chatbot + LiteLLM Tutorial
A simple tutorial for wiring a streaming Gradio chatbot demo to LiteLLM completion calls
Install & import dependencies
!pip install gradio litellm
import gradio as gr
import litellm
Define the inference function
Remember to set model and api_base to match what the server hosting the LLM expects.
model_name = "huggingface/meta-llama/Llama-2-7b-chat-hf"  # litellm model identifier

def inference(message, history):
    try:
        # Flatten Gradio's [user, assistant] history pairs into a single string
        flattened_history = [item for sublist in history for item in sublist]
        full_message = " ".join(flattened_history + [message])
        messages_litellm = [{"role": "user", "content": full_message}]  # litellm message format
        partial_message = ""
        for chunk in litellm.completion(model=model_name,
                                        api_base="x.x.x.x:xxxx",
                                        messages=messages_litellm,
                                        max_new_tokens=512,
                                        temperature=.7,
                                        top_k=100,
                                        top_p=.9,
                                        repetition_penalty=1.18,
                                        stream=True):
            partial_message += chunk['choices'][0]['delta']['content']  # extract text from streamed litellm chunks
            yield partial_message
    except Exception as e:
        print("Exception encountered:", str(e))
        yield "An error occurred. Please 'Clear' the error and try your question again."
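The function above flattens the whole history into one user message, which discards the turn structure. A minimal sketch of an alternative (assuming the pair-style `history` format that `gr.ChatInterface` passes to the function, i.e. a list of `[user, assistant]` pairs) that builds a proper alternating message list instead:

```python
def history_to_messages(message, history):
    """Convert Gradio-style [user, assistant] history pairs into an
    alternating litellm/OpenAI-format message list, ending with the
    new user message."""
    messages = []
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": message})
    return messages
```

Chat-tuned models generally receive this alternating format through their prompt template, so it tends to preserve context better than a single concatenated string.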
Define the chat interface
gr.ChatInterface(
    inference,
    chatbot=gr.Chatbot(height=400),
    textbox=gr.Textbox(placeholder="Enter text here...", container=False, scale=5),
    description=f"""
    CURRENT PROMPT TEMPLATE: {model_name}.
    An incorrect prompt template will cause performance to suffer.
    Check the API specifications to ensure this format matches the target LLM.""",
    title="Simple Chatbot Test Application",
    examples=["Define 'deep learning' in one sentence."],
    retry_btn="Retry",
    undo_btn="Undo",
    clear_btn="Clear",
).queue().launch()
Launch the Gradio app
- From the command line: python app.py or gradio app.py (the latter enables live reloading during development).
- Open the link printed to the console in your browser.
- Enjoy prompt-agnostic interaction with the remote LLM server.
Recommended extensions:
- Add command-line arguments to define the target model & inference endpoint
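The suggested extension could be sketched with the standard-library argparse module; the flag names and defaults here are illustrative, not part of the tutorial:

```python
import argparse

def build_parser():
    """Hypothetical CLI for the app: pick the target model and endpoint
    at launch time instead of hard-coding them."""
    parser = argparse.ArgumentParser(description="Simple chatbot test application")
    parser.add_argument("--model",
                        default="huggingface/meta-llama/Llama-2-7b-chat-hf",
                        help="litellm model identifier")
    parser.add_argument("--api-base",
                        default=None,
                        help="base URL of the server hosting the LLM")
    return parser
```

The parsed values could then be passed through to litellm.completion(model=args.model, api_base=args.api_base, ...) inside the inference function.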
Thanks to ZQ for this tutorial.