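Llama2 - Huggingface Tutorial

Huggingface is an open-source platform for deploying machine learning models.

Calling Llama2 with Huggingface Inference Endpoints

LiteLLM makes it easy to call your public, private, or default Huggingface endpoints.

In this tutorial, let's try calling 3 models: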

Model	Endpoint type
deepset/deberta-v3-large-squad2	Default Huggingface endpoint
meta-llama/Llama-2-7b-hf	Public endpoint
meta-llama/Llama-2-7b-chat-hf	Private endpoint

Case 1: Calling the default Huggingface endpoint

Here is the complete example:

from litellm import completion 

model = "deepset/deberta-v3-large-squad2"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface")

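What's happening?

model: The name of the model as deployed on Huggingface.
messages: The input. We accept the OpenAI chat format. For Huggingface, by default we iterate through the list and append each message's "content" to the prompt.
custom_llm_provider: Optional. This flag is only needed for Azure, Replicate, Huggingface, and Together-ai (platforms where you deploy your own models). It lets LiteLLM route the request to the correct provider for your model.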
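The description of messages above can be illustrated with a small sketch. This is not LiteLLM's actual code: messages_to_prompt is a hypothetical helper, and it only assumes that each message's "content" is appended to the prompt in list order, with roles ignored.

```python
from typing import Dict, List


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper (not part of LiteLLM): build a single prompt string
    by appending each message's "content" in list order."""
    prompt = ""
    for message in messages:
        # Roles are ignored in this sketch; only the text content is kept.
        prompt += message.get("content", "")
    return prompt


messages = [{"role": "user", "content": "Hey, how's it going?"}]
prompt = messages_to_prompt(messages)
print(prompt)
```

A real provider integration may add separators or a chat template between messages; this sketch deliberately leaves that out.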

Case 2: Calling the Llama2 public Huggingface endpoint

We have deployed meta-llama/Llama-2-7b-hf behind a public endpoint: https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud

Let's try it out:

from litellm import completion 

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)

What's happening?

api_base: Optional. Because this call uses a deployed endpoint (not the default Huggingface inference endpoint), we pass its URL to LiteLLM.
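To make the role of api_base concrete, here is a minimal sketch of how a client could choose between a deployed endpoint and the default one. Both resolve_endpoint and DEFAULT_HF_BASE are hypothetical names introduced for illustration, and the default URL pattern is an assumption, not something taken from this tutorial or from LiteLLM's source.

```python
from typing import Optional

# Assumption: base URL of the hosted Huggingface inference API.
# This value is illustrative and is not stated in the tutorial.
DEFAULT_HF_BASE = "https://api-inference.huggingface.co/models"


def resolve_endpoint(model: str, api_base: Optional[str] = None) -> str:
    """Hypothetical helper: return the URL a request would be sent to."""
    if api_base:
        # A deployed (public or private) endpoint takes priority when given.
        return api_base.rstrip("/")
    # Otherwise fall back to the default endpoint for the named model.
    return f"{DEFAULT_HF_BASE}/{model}"


print(resolve_endpoint("deepset/deberta-v3-large-squad2"))
print(resolve_endpoint(
    "meta-llama/Llama-2-7b-hf",
    api_base="https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud",
))
```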

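Case 3: Calling the Llama2 private Huggingface endpoint

The only difference from the public endpoint is that you need an api_key to make the call.

LiteLLM gives you 3 ways to pass in an api_key.

You can set it as an environment variable, set it as a package variable, or pass it in when calling completion().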
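The three options can be pictured as a lookup chain. The sketch below is illustrative only: resolve_api_key is a hypothetical helper, and the precedence it uses (explicit argument first, then the package variable, then the HF_TOKEN environment variable) is an assumption rather than documented LiteLLM behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    package_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the key to use for a private endpoint.

    Assumed order: value passed to the call, then the package-level
    variable (standing in for litellm.huggingface_key), then HF_TOKEN.
    """
    env = os.environ if env is None else env
    return api_key or package_key or env.get("HF_TOKEN")


# Usage with placeholder values only; no real credentials are involved.
fake_env = {"HF_TOKEN": "env-token"}
print(resolve_api_key(env=fake_env))
print(resolve_api_key(package_key="package-token", env=fake_env))
print(resolve_api_key(api_key="call-token", package_key="package-token", env=fake_env))
```

Passing env explicitly keeps the sketch easy to try without touching the real process environment.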

Set via environment variable
Here is the 1 line of code you need to add:

os.environ["HF_TOKEN"] = "..."

Here is the complete code:

import os
from litellm import completion

os.environ["HF_TOKEN"] = "..."

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)

Set as a package variable
Here is the 1 line of code you need to add:

litellm.huggingface_key = "..."

Here is the complete code:

import litellm
from litellm import completion 

litellm.huggingface_key = "..."

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)

Pass it in during the completion call

completion(..., api_key="...")

Here is the complete code:

from litellm import completion 

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base, api_key="...")
