
Using Vision Models

Quick Start

Example of passing images to a model

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# openai call
response = completion(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
)
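
The completion call returns an OpenAI-compatible response object, so the model's reply can be read the same way as with the OpenAI SDK:

# Print the assistant's text reply from the response object.
print(response.choices[0].message.content)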

Checking if a model supports vision

Use litellm.supports_vision(model="") -> returns True if the model supports vision, and False if it does not.

import litellm

assert litellm.supports_vision(model="openai/gpt-4-vision-preview") == True
assert litellm.supports_vision(model="vertex_ai/gemini-1.0-pro-vision") == True
assert litellm.supports_vision(model="openai/gpt-3.5-turbo") == False
assert litellm.supports_vision(model="xai/grok-2-vision-latest") == True
assert litellm.supports_vision(model="xai/grok-2-latest") == False

Explicitly specify image type

If your image doesn't have a mime-type, or litellm is incorrectly inferring your image's mime-type (e.g. when calling gs:// URLs with vertex ai), you can set it explicitly via the format param.

"image_url": {
"url": "gs://my-gs-image",
"format": "image/jpeg"
}

LiteLLM will use this on any API endpoint that supports specifying the mime-type (e.g. anthropic/bedrock/vertex ai).

For others (e.g. openai), it will be ignored.
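
For instance, a hedged sketch of a Vertex AI call with a gs:// image and an explicit mime-type (the model name and bucket path are placeholders, and Vertex AI credentials/project are assumed to already be configured in your environment):

from litellm import completion

# Placeholder model and bucket path - adjust to your own project.
response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "gs://my-bucket/my-image.jpg",
                        "format": "image/jpeg",  # explicit mime-type for the gs:// object
                    },
                },
            ],
        }
    ],
)

The Anthropic example below sets the same format field on an https URL: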

import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

# anthropic call
response = completion(
    model="claude-3-7-sonnet-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "format": "image/jpeg"
                    },
                },
            ],
        }
    ],
)

Spec

"image_url": str

OR

"image_url": {
"url": "url OR base64 encoded str",
"detail": "openai-only param",
"format": "specify mime-type of image"
}
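
As a hedged sketch of the base64 form (the file path is a placeholder), a local image can be sent as a data URL:

import base64
from litellm import completion

# Read a local image and encode it as a base64 data URL (path is a placeholder).
with open("path/to/local-image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = completion(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                },
            ],
        }
    ],
)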