AWS Sagemaker

LiteLLM supports all Sagemaker Huggingface Jumpstart models.

Tip

We support ALL Sagemaker models. Just set model=sagemaker/<your-sagemaker-model-name> as a prefix when sending litellm requests.

API KEY

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

Usage

import os 
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/<your-endpoint-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)

Usage - Streaming

Sagemaker currently does not support streaming - LiteLLM fakes streaming by returning chunks of the response string.

import os 
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
    stream=True,
)

for chunk in response:
    print(chunk)
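
For async code, litellm.acompletion accepts the same stream=True flag and yields chunks from an async iterator. A minimal sketch, reusing the Llama 2 endpoint name from above:

import asyncio
import os
from litellm import acompletion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

async def main():
    # acompletion is the async counterpart of completion
    response = await acompletion(
        model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
        messages=[{"content": "Hello, how are you?", "role": "user"}],
        stream=True,
    )
    async for chunk in response:
        print(chunk)

asyncio.run(main())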

LiteLLM Proxy Usage

Here's how to call Sagemaker with the LiteLLM Proxy Server.

1. Setup config.yaml

model_list:
  - model_name: jumpstart-model
    litellm_params:
      model: sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614
      aws_access_key_id: os.environ/CUSTOM_AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/CUSTOM_AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/CUSTOM_AWS_REGION_NAME

All possible auth params:

aws_access_key_id: Optional[str],
aws_secret_access_key: Optional[str],
aws_session_token: Optional[str],
aws_region_name: Optional[str],
aws_session_name: Optional[str],
aws_profile_name: Optional[str],
aws_role_name: Optional[str],
aws_web_identity_token: Optional[str],

2. Start the proxy

litellm --config /path/to/config.yaml

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "jumpstart-model",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
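
Since the proxy exposes an OpenAI-compatible API, the same request can also be made with the OpenAI Python SDK (a sketch; the api_key value is a placeholder unless you have configured virtual keys on the proxy):

import openai

# Point the OpenAI client at the LiteLLM proxy
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="jumpstart-model",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response)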

Set temperature, top p, etc.

import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.7,
    top_p=1,
)

Allow setting temperature=0 for Sagemaker

By default, when a request sends temperature=0 to LiteLLM, LiteLLM rounds it up to temperature=0.1, since Sagemaker fails most requests at temperature=0.

If you want to send temperature=0 for your model, here's how to set it up (since Sagemaker can host any kind of model, some models allow zero temperature):

import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0,
    aws_sagemaker_allow_zero_temp=True,
)

Pass provider-specific params

If you pass a non-openai param to litellm, we'll assume it's provider-specific and send it as a kwarg in the request body. See more.

import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    top_k=1,  # 👈 PROVIDER-SPECIFIC PARAM
)

Pass inference component name

If multiple models are deployed behind a single endpoint, you'll need to specify the individual model name; do this via model_id.

import os 
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/<your-endpoint-name>",
    model_id="<your-model-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)

Passing credentials as parameters - Completion()

Pass AWS credentials as parameters to the litellm.completion function.

import os 
from litellm import completion

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_region_name="",
)

Applying Prompt Templates

To apply the correct prompt template to your sagemaker deployment, pass in its hf model name as well.

import os 
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=messages,
    temperature=0.2,
    max_tokens=80,
    hf_model_name="meta-llama/Llama-2-7b",
)

You can also pass in your own custom prompt template.
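
As a sketch, a custom template can be registered with litellm.register_prompt_template before calling completion. The role markers below are illustrative ChatML-style assumptions; use whatever format your model was actually trained on:

import litellm

litellm.register_prompt_template(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    roles={
        "system": {"pre_message": "<|im_start|>system\n", "post_message": "<|im_end|>\n"},
        "user": {"pre_message": "<|im_start|>user\n", "post_message": "<|im_end|>\n"},
        "assistant": {"pre_message": "<|im_start|>assistant\n", "post_message": "<|im_end|>\n"},
    },
)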

Sagemaker Messages API

Use the route sagemaker_chat/* to route to the Sagemaker Messages API.

model: sagemaker_chat/<your-endpoint-name>

import os
import litellm
from litellm import completion

litellm.set_verbose = True # 👈 SEE RAW REQUEST

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker_chat/<your-endpoint-name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
)
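
The same endpoint can also be exposed through the LiteLLM proxy by using the sagemaker_chat/ prefix in config.yaml. A sketch, reusing the config conventions from the proxy section above (the model_name alias is illustrative):

model_list:
  - model_name: sagemaker-chat-model
    litellm_params:
      model: sagemaker_chat/<your-endpoint-name>
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME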

Completion Models

Tip

We support ALL Sagemaker models. Just set model=sagemaker/<your-sagemaker-model-name> as a prefix when sending litellm requests.

Here's an example of using a sagemaker model with LiteLLM:

| Model Name | Function Call |
|---|---|
| Your Custom Huggingface Model | completion(model='sagemaker/<your-deployment-name>', messages=messages) |
| Meta Llama 2 7B | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b', messages=messages) |
| Meta Llama 2 7B (Chat/Fine-tuned) | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b-f', messages=messages) |
| Meta Llama 2 13B | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b', messages=messages) |
| Meta Llama 2 13B (Chat/Fine-tuned) | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-13b-f', messages=messages) |
| Meta Llama 2 70B | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b', messages=messages) |
| Meta Llama 2 70B (Chat/Fine-tuned) | completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-70b-b-f', messages=messages) |

Embedding Models

LiteLLM supports all Sagemaker Jumpstart Huggingface Embedding models. Here's how to call them:

import os
import litellm

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = litellm.embedding(
    model="sagemaker/<your-deployment-name>",
    input=["good morning from litellm", "this is another item"],
)
print(f"response: {response}")