VertexAI[Anthropic, Gemini, Model Garden]

概述

属性	详情
描述	Vertex AI 是一个用于构建和使用生成式 AI 的全托管 AI 开发平台。
LiteLLM 上的提供商路由	`vertex_ai/`
提供商文档链接	Vertex AI ↗
Base URL	https://{vertex_location}-aiplatform.googleapis.com/
支持的操作	`/chat/completions`, `/completions`, `/embeddings`, `/audio/speech`, `/fine_tuning`, `/batches`, `/files`, `/images`

`vertex_ai/` 路由

vertex_ai/ 路由使用 VertexAI 的 REST API。

from litellm import completion
import json 

## GET CREDENTIALS 
## RUN ## 
# !gcloud auth application-default login - run this to add vertex credentials to your env
## OR ## 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

## COMPLETION CALL 
response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  vertex_credentials=vertex_credentials_json
)

系统消息

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}], 
  vertex_credentials=vertex_credentials_json
)

函数调用

使用 tool_choice="required" 强制 Gemini 进行工具调用。

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


messages = [
    {
        "role": "system",
        "content": "Your name is Litellm Bot, you are a helpful assistant",
    },
    # User asks for their name and weather in San Francisco
    {
        "role": "user",
        "content": "Hello, what is your name and can you tell me the weather?",
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

data = {
    "model": "vertex_ai/gemini-1.5-pro-preview-0514"),
    "messages": messages,
    "tools": tools,
    "tool_choice": "required",
    "vertex_credentials": vertex_credentials_json
}

## COMPLETION CALL 
print(completion(**data))

JSON Schema

从 v1.40.1+ 开始，LiteLLM 支持将 response_schema 作为参数发送给 Vertex AI 上的 Gemini-1.5-Pro。对于其他模型（例如 gemini-1.5-flash 或 claude-3-5-sonnet），LiteLLM 会将 schema 添加到消息列表中，并带有用户控制的 prompt。

响应 Schema

SDK
代理

from litellm import completion 
import json 

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response_schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string",
                },
            },
            "required": ["recipe_name"],
        },
    }


completion(
    model="vertex_ai/gemini-1.5-pro", 
    messages=messages, 
    response_format={"type": "json_object", "response_schema": response_schema} # 👈 KEY CHANGE
    )

print(json.loads(completion.choices[0].message.content))

将模型添加到 config.yaml

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env

启动代理

$ litellm --config /path/to/config.yaml

发送请求！

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-D '{
  "model": "gemini-pro",
  "messages": [
        {"role": "user", "content": "List 5 popular cookie recipes."}
    ],
  "response_format": {"type": "json_object", "response_schema": { 
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string",
                },
            },
            "required": ["recipe_name"],
        },
    }}
}
'

验证 Schema

要验证 response_schema，请设置 enforce_validation: true。

SDK
代理

from litellm import completion, JSONSchemaValidationError
try: 
    completion(
    model="vertex_ai/gemini-1.5-pro", 
    messages=messages, 
    response_format={
        "type": "json_object", 
        "response_schema": response_schema,
        "enforce_validation": true # 👈 KEY CHANGE
    }
    )
except JSONSchemaValidationError as e: 
    print("Raw Response: {}".format(e.raw_response))
    raise e

将模型添加到 config.yaml

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env

启动代理

$ litellm --config /path/to/config.yaml

发送请求！

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-D '{
  "model": "gemini-pro",
  "messages": [
        {"role": "user", "content": "List 5 popular cookie recipes."}
    ],
  "response_format": {"type": "json_object", "response_schema": { 
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string",
                },
            },
            "required": ["recipe_name"],
        },
    }, 
    "enforce_validation": true
    }
}
'

LiteLLM 将根据 schema 验证响应，如果响应与 schema 不匹配，则会引发 JSONSchemaValidationError。

JSONSchemaValidationError 继承自 openai.APIError

使用 e.raw_response 访问原始响应

自己添加到 prompt

from litellm import completion 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

messages = [
    {
        "role": "user",
        "content": """
List 5 popular cookie recipes.

Using this JSON schema:

    Recipe = {"recipe_name": str}

Return a `list[Recipe]`
        """
    }
]

completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })

基础（Grounding）- 网页搜索

将 Google 搜索结果 grounding 添加到 Vertex AI 调用。

相关的 VertexAI 文档

使用 response_obj._hidden_params["vertex_ai_grounding_metadata"] 查看 grounding 元数据

SDK
代理

from litellm import completion 

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                )

print(resp)

OpenAI Python SDK
cURL

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearch": {}}],
)

print(response)

curl https://:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
      {"role": "user", "content": "Who won the world cup?"}
    ],
   "tools": [
        {
            "googleSearch": {} 
        }
    ]
  }'

您还可以使用 enterpriseWebSearch 工具进行企业合规搜索。

SDK
代理

from litellm import completion 

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                )

print(resp)

OpenAI Python SDK
cURL

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"enterpriseWebSearch": {}}],
)

print(response)

curl https://:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
      {"role": "user", "content": "Who won the world cup?"}
    ],
   "tools": [
        {
            "enterpriseWebSearch": {} 
        }
    ]
  }'

从 Vertex AI SDK 迁移到 LiteLLM (GROUNDING)

如果这是您最初的 VertexAI Grounding 代码，

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Tool, grounding


vertexai.init(project=project_id, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# Use Google Search for grounding
tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

prompt = "When is the next total solar eclipse in US?"
response = model.generate_content(
    prompt,
    tools=[tool],
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)

print(response)

那么，现在看起来是这样

from litellm import completion


# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearch": {"disable_attributon": False}}] # 👈 ADD GOOGLE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                    vertex_project="project-id"
                )

print(resp)

思考 / `reasoning_content`

LiteLLM 将 OpenAI 的 reasoning_effort 转换为 Gemini 的 thinking 参数。代码

映射

reasoning_effort	thinking
"低"	"budget_tokens": 1024
"中"	"budget_tokens": 2048
"高"	"budget_tokens": 4096

SDK
代理

from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)

设置 config.yaml

- model_name: gemini-2.5-flash
  litellm_params:
    model: vertex_ai/gemini-2.5-flash-preview-04-17
    vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
    vertex_project: "project-id"
    vertex_location: "us-central1"

启动代理

litellm --config /path/to/config.yaml

测试一下！

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'

预期响应

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

将 `thinking` 参数传递给 Gemini 模型

您也可以将 thinking 参数传递给 Gemini 模型。

这被转换为 Gemini 的thinkingConfig 参数。

SDK
代理

from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

response = litellm.completion(
  model="vertex_ai/gemini-2.5-flash-preview-04-17",
  messages=[{"role": "user", "content": "What is the capital of France?"}],
  thinking={"type": "enabled", "budget_tokens": 1024},
  vertex_project="project-id",
  vertex_location="us-central1"
)

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'

上下文缓存

通过直接调用提供商 API 支持 Vertex AI 上下文缓存。（统一端点支持即将推出。）

直接访问提供商

先决条件

pip install google-cloud-aiplatform（已预装在代理 docker 镜像中）
身份验证
- 运行 gcloud auth application-default login。请参阅Google Cloud 文档
- 或者，您可以设置 GOOGLE_APPLICATION_CREDENTIALS
  方法如下：跳到代码
  - 在 GCP 上创建一个服务帐号
  - 将凭据导出为 json
  - 加载 json 并将 json.dump 的结果作为字符串
  - 将 json 字符串存储在您的环境变量 GOOGLE_APPLICATION_CREDENTIALS 中

示例用法

import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

LiteLLM 代理服务器用法

以下是如何将 Vertex AI 与 LiteLLM 代理服务器一起使用的方法

修改 config.yaml

每个模型的不同位置
一个位置用于所有 Vertex 模型

当您需要为每个 Vertex 模型设置不同位置时使用此选项

model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east"

当您有一个 Vertex 位置用于所有模型时使用此选项

litellm_settings: 
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location

model_list: 
  -model_name: team1-gemini-pro
  litellm_params: 
    model: gemini-pro

启动代理
```
$ litellm --config /path/to/config.yaml
```

发送请求到 LiteLLM 代理服务器

OpenAI Python v1.0.0+
curl

import openai
client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages = [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "team1-gemini-pro",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ],
}'

身份验证 - vertex_project, vertex_location 等。

通过以下方式设置您的 Vertex 凭据

动态参数或
环境变量

动态参数

您可以设置

vertex_credentials (str) - 可以是 json 字符串或指向您的 vertex ai 服务帐号 .json 文件的路径
vertex_location (str) - Vertex 模型部署的位置 (us-central1, asia-southeast1 等)
vertex_project (可选)[str]- 如果 Vertex 项目与 vertex_credentials 中的不同，则使用此项

作为 litellm.completion 调用的动态参数。

SDK
代理

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}], 
  vertex_credentials=vertex_credentials_json,
  vertex_project="my-special-project", 
  vertex_location="my-special-location"
)

model_list:
    - model_name: gemini-1.5-pro
      litellm_params:
        model: gemini-1.5-pro
        vertex_credentials: os.environ/VERTEX_FILE_PATH_ENV_VAR # os.environ["VERTEX_FILE_PATH_ENV_VAR"] = "/path/to/service_account.json" 
        vertex_project: "my-special-project"
        vertex_location: "my-special-location:

环境变量

您可以设置

GOOGLE_APPLICATION_CREDENTIALS - 在此处存储您的 service_account.json 文件路径（Vertex SDK 直接使用）。
VERTEXAI_LOCATION - Vertex 模型部署的位置 (us-central1, asia-southeast1 等)
VERTEXAI_PROJECT - 可选[str]- 如果 Vertex 项目与 vertex_credentials 中的不同，则使用此项

GOOGLE_APPLICATION_CREDENTIALS

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"

VERTEXAI_LOCATION

export VERTEXAI_LOCATION="us-central1" # can be any vertex location

VERTEXAI_PROJECT

export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project

指定安全设置

在某些用例中，您可能需要调用模型并传递与默认设置不同的安全设置。为此，只需将 safety_settings 参数传递给 completion 或 acompletion。例如

按模型/请求设置

SDK
代理

response = completion(
    model="vertex_ai/gemini-pro", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)

选项 1：在 config 中设置

model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1
      safety_settings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_HATE_SPEECH
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_NONE

选项 2：在调用时设置

response = client.chat.completions.create(
    model="gemini-experimental",
    messages=[
        {
            "role": "user",
            "content": "Can you write exploits?",
        }
    ],
    max_tokens=8192,
    stream=False,
    temperature=0.0,

    extra_body={
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "threshold": "BLOCK_NONE",
            },
        ],
    }
)

全局设置

SDK
代理

import litellm 

litellm.set_verbose = True 👈 See RAW REQUEST/RESPONSE 

litellm.vertex_ai_safety_settings = [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
response = completion(
    model="vertex_ai/gemini-pro", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1

litellm_settings:
    vertex_ai_safety_settings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_HATE_SPEECH
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_NONE

设置 Vertex 项目和 Vertex 位置

所有使用 Vertex AI 的调用都需要以下参数

您的项目 ID

import os, litellm 

# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID`

### OR ###

# set directly on module 
litellm.vertex_project = "hardy-device-38811" # Your Project ID`

您的项目位置

import os, litellm 

# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1 # Your Location

### OR ###

# set directly on module 
litellm.vertex_location = "us-central1 # Your Location

Anthropic

模型名称	函数调用
claude-3-opus@20240229	`completion('vertex_ai/claude-3-opus@20240229', messages)`
claude-3-5-sonnet@20240620	`completion('vertex_ai/claude-3-5-sonnet@20240620', messages)`
claude-3-sonnet@20240229	`completion('vertex_ai/claude-3-sonnet@20240229', messages)`
claude-3-haiku@20240307	`completion('vertex_ai/claude-3-haiku@20240307', messages)`
claude-3-7-sonnet@20250219	`completion('vertex_ai/claude-3-7-sonnet@20250219', messages)`

用法

SDK
代理

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "claude-3-sonnet@20240229"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    temperature=0.7,
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. 添加到 config

model_list:
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "anthropic-vertex", # 👈 the 'model_name' in config
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

用法 - `thinking` / `reasoning_content`

SDK
代理

from litellm import completion

resp = completion(
    model="vertex_ai/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

设置 config.yaml

- model_name: claude-3-7-sonnet-20250219
  litellm_params:
    model: vertex_ai/claude-3-7-sonnet-20250219
    vertex_ai_project: "my-test-project"
    vertex_ai_location: "us-west-1"

启动代理

litellm --config /path/to/config.yaml

测试一下！

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "claude-3-7-sonnet-20250219",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'

预期响应

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                provider_specific_fields={
                    'citations': None,
                    'thinking_blocks': [
                        {
                            'type': 'thinking',
                            'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                            'signature': 'EuYBCkQYAiJAy6...'
                        }
                    ]
                }
            ),
            thinking_blocks=[
                {
                    'type': 'thinking',
                    'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                    'signature': 'EuYBCkQYAiJAy6AGB...'
                }
            ],
            reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Meta/Llama API

模型名称	函数调用
meta/llama-3.2-90b-vision-instruct-maas	`completion('vertex_ai/meta/llama-3.2-90b-vision-instruct-maas', messages)`
meta/llama3-8b-instruct-maas	`completion('vertex_ai/meta/llama3-8b-instruct-maas', messages)`
meta/llama3-70b-instruct-maas	`completion('vertex_ai/meta/llama3-70b-instruct-maas', messages)`
meta/llama3-405b-instruct-maas	`completion('vertex_ai/meta/llama3-405b-instruct-maas', messages)`
meta/llama-4-scout-17b-16e-instruct-maas	`completion('vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas', messages)`
meta/llama-4-scout-17-128e-instruct-maas	`completion('vertex_ai/meta/llama-4-scout-128b-16e-instruct-maas', messages)`
meta/llama-4-maverick-17b-128e-instruct-maas	`completion('vertex_ai/meta/llama-4-maverick-17b-128e-instruct-maas',messages)`
meta/llama-4-maverick-17b-16e-instruct-maas	`completion('vertex_ai/meta/llama-4-maverick-17b-16e-instruct-maas',messages)`

用法

SDK
代理

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "meta/llama3-405b-instruct-maas"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. 添加到 config

model_list:
    - model_name: anthropic-llama
      litellm_params:
        model: vertex_ai/meta/llama3-405b-instruct-maas
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: anthropic-llama
      litellm_params:
        model: vertex_ai/meta/llama3-405b-instruct-maas
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "anthropic-llama", # 👈 the 'model_name' in config
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

Mistral API

支持的 OpenAI 参数

模型名称	函数调用
mistral-large@latest	`completion('vertex_ai/mistral-large@latest', messages)`
mistral-large@2407	`completion('vertex_ai/mistral-large@2407', messages)`
mistral-nemo@latest	`completion('vertex_ai/mistral-nemo@latest', messages)`
codestral@latest	`completion('vertex_ai/codestral@latest', messages)`
codestral@@2405	`completion('vertex_ai/codestral@2405', messages)`

用法

SDK
代理

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "mistral-large@2407"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. 添加到 config

model_list:
    - model_name: vertex-mistral
      litellm_params:
        model: vertex_ai/mistral-large@2407
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: vertex-mistral
      litellm_params:
        model: vertex_ai/mistral-large@2407
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "vertex-mistral", # 👈 the 'model_name' in config
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

用法 - Codestral FIM

通过 OpenAI /v1/completion 端点在 VertexAI 上调用 Codestral，用于 FIM 任务。

注意：您也可以通过 /chat/completion 调用 Codestral。

SDK
代理

from litellm import completion
import os

# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
# OR run `!gcloud auth print-access-token` in your terminal

model = "codestral@2405"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = text_completion(
    model="vertex_ai/" + model,
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
    prompt="def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():", 
    suffix="return True",                                              # optional
    temperature=0,                                                     # optional
    top_p=1,                                                           # optional
    max_tokens=10,                                                     # optional
    min_tokens=10,                                                     # optional
    seed=10,                                                           # optional
    stop=["return"],                                                   # optional
)

print("\nModel Response", response)

1. 添加到 config

model_list:
    - model_name: vertex-codestral
      litellm_params:
        model: vertex_ai/codestral@2405
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: vertex-codestral
      litellm_params:
        model: vertex_ai/codestral@2405
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl -X POST 'http://0.0.0.0:4000/completions' \
      -H 'Authorization: Bearer sk-1234' \
      -H 'Content-Type: application/json' \
      -d '{
            "model": "vertex-codestral", # 👈 the 'model_name' in config
            "prompt": "def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():", 
            "suffix":"return True",                                              # optional
            "temperature":0,                                                     # optional
            "top_p":1,                                                           # optional
            "max_tokens":10,                                                     # optional
            "min_tokens":10,                                                     # optional
            "seed":10,                                                           # optional
            "stop":["return"],                                                   # optional
        }'

AI21 模型

模型名称	函数调用
jamba-1.5-mini@001	`completion(model='vertex_ai/jamba-1.5-mini@001', messages)`
jamba-1.5-large@001	`completion(model='vertex_ai/jamba-1.5-large@001', messages)`

用法

SDK
代理

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "meta/jamba-1.5-mini@001"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. 添加到 config

model_list:
    - model_name: jamba-1.5-mini
      litellm_params:
        model: vertex_ai/jamba-1.5-mini@001
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: jamba-1.5-large
      litellm_params:
        model: vertex_ai/jamba-1.5-large@001
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "jamba-1.5-large",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

Gemini Pro

模型名称	函数调用
gemini-pro	`completion('gemini-pro', messages)`, `completion('vertex_ai/gemini-pro', messages)`

微调模型

您可以通过 LiteLLM 调用微调的 Vertex AI Gemini 模型

属性	详情
提供商路由	`vertex_ai/gemini/{MODEL_ID}`
Vertex 文档	Vertex AI - 微调的 Gemini 模型
支持的操作	`/chat/completions`, `/completions`, `/embeddings`, `/images`

要使用遵循 /gemini 请求/响应格式的模型，只需将 model 参数设置为

调用微调 Gemini 模型的 model 参数
model="vertex_ai/gemini/<your-finetuned-model>"

LiteLLM Python SDK
LiteLLM 代理

示例
import litellm
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = litellm.completion(
  model="vertex_ai/gemini/<your-finetuned-model>",  # e.g. vertex_ai/gemini/4965075652664360960
  messages=[{ "content": "Hello, how are you?","role": "user"}],
)

将 Vertex 凭据添加到您的环境变量

验证到 Vertex AI

!gcloud auth application-default login

设置 config.yaml

添加到 litellm config
- model_name: finetuned-gemini
  litellm_params:
    model: vertex_ai/gemini/<ENDPOINT_ID>
    vertex_project: <PROJECT_ID>
    vertex_location: <LOCATION>

测试一下！

OpenAI Python SDK
curl

示例请求
from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-key",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="finetuned-gemini",
    messages=[
        {"role": "user", "content": "hi"}
    ]
)
print(response)

示例请求
curl --location 'https://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: <LITELLM_KEY>' \
--data '{"model": "finetuned-gemini" ,"messages":[{"role": "user", "content":[{"type": "text", "text": "hi"}]}]}'

Model Garden

提示

支持 Vertex Model Garden 中的所有 OpenAI 兼容模型。

使用 Model Garden

几乎所有 Vertex Model Garden 模型都与 OpenAI 兼容。

OpenAI 兼容模型
非 OpenAI 兼容模型

属性	详情
提供商路由	`vertex_ai/openai/{MODEL_ID}`
Vertex 文档	Vertex Model Garden - OpenAI 对话补全, Vertex Model Garden
支持的操作	`/chat/completions`, `/embeddings`

SDK
代理

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
  model="vertex_ai/openai/<your-endpoint-id>", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

1. 添加到 config

model_list:
    - model_name: llama3-1-8b-instruct
      litellm_params:
        model: vertex_ai/openai/5464397967697903616
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"

2. 启动代理

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. 测试一下！

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "llama3-1-8b-instruct", # 👈 the 'model_name' in config
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
  model="vertex_ai/<your-endpoint-id>", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Gemini Pro Vision

模型名称	函数调用
gemini-pro-vision	`completion('gemini-pro-vision', messages)`, `completion('vertex_ai/gemini-pro-vision', messages)`

Gemini 1.5 Pro（及 Vision）

模型名称	函数调用
gemini-1.5-pro	`completion('gemini-1.5-pro', messages)`, `completion('vertex_ai/gemini-1.5-pro', messages)`
gemini-1.5-flash-preview-0514	`completion('gemini-1.5-flash-preview-0514', messages)`, `completion('vertex_ai/gemini-1.5-flash-preview-0514', messages)`
gemini-1.5-pro-preview-0514	`completion('gemini-1.5-pro-preview-0514', messages)`, `completion('vertex_ai/gemini-1.5-pro-preview-0514', messages)`

使用 Gemini Pro Vision

以与 OpenAI gpt-4-vision 相同的输入/输出格式调用 gemini-pro-vision

LiteLLM 支持在 url 中传递以下图像类型

具有 Cloud Storage URI 的图像 - gs://cloud-samples-data/generative-ai/image/boats.jpeg
具有直接链接的图像 - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
具有 Cloud Storage URI 的视频 - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
Base64 编码的本地图像

示例请求 - 图像 URL

具有直接链接的图像
本地 Base64 图像

import litellm

response = litellm.completion(
  model = "vertex_ai/gemini-pro-vision",
  messages=[
      {
          "role": "user",
          "content": [
                          {
                              "type": "text",
                              "text": "Whats in this image?"
                          },
                          {
                              "type": "image_url",
                              "image_url": {
                              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                              }
                          }
                      ]
      }
  ],
)
print(response)

import litellm

def encode_image(image_path):
    import base64

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
response = litellm.completion(
    model="vertex_ai/gemini-pro-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Whats in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64," + base64_image
                    },
                },
            ],
        }
    ],
)
print(response)

用法 - 函数调用

LiteLLM 支持 Vertex AI Gemini 模型的函数调用。

from litellm import completion
import os
# set env
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ".."
os.environ["VERTEX_AI_PROJECT"] = ".."
os.environ["VERTEX_AI_LOCATION"] = ".."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="vertex_ai/gemini-pro-vision",
    messages=messages,
    tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)

用法 - PDF / 视频 / 音频等文件

通过 LiteLLM 传递 Vertex AI 支持的任何文件。

LiteLLM 支持在 URL 中传递以下文件类型。

从 v1.65.1+ 版本开始，VertexAI 支持使用 file 消息类型。

Files with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
Files with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
Base64 Encoded Local Files

SDK
代理

使用 `gs://` 或任何 URL

from litellm import completion

response = completion(
    model="vertex_ai/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                        "format": "application/pdf" # OPTIONAL - specify mime-type
                    }
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

使用 base64

from litellm import completion
import base64
import requests

# URL of the file
url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"

# Download the file
response = requests.get(url)
file_data = response.content

encoded_file = base64.b64encode(file_data).decode("utf-8")

response = completion(
    model="vertex_ai/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
                {
                    "type": "file",
                    "file": {
                        "file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
                    }  
                },
                {
                    "type": "audio_input",
                    "audio_input {
                        "audio_input": f"data:audio/mp3;base64,{encoded_file}", # 👈 AUDIO File ('file' message works as too)
                    }  
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

将模型添加到 config

- model_name: gemini-1.5-flash
  litellm_params:
    model: vertex_ai/gemini-1.5-flash
    vertex_credentials: "/path/to/service_account.json"

启动代理

litellm --config /path/to/config.yaml

测试一下！

使用 gs://

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "You are a very professional document summarization specialist. Please summarize the given document"
          },
          {
                "type": "file",
                "file": {
                    "file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                    "format": "application/pdf" # OPTIONAL
                }
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "You are a very professional document summarization specialist. Please summarize the given document"
          },
          {
                "type": "file",
                "file": {
                    "file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
                },
            },
            {
                "type": "audio_input",
                "audio_input {
                    "audio_input": f"data:audio/mp3;base64,{encoded_file}", # 👈 AUDIO File ('file' message works as too)
                }  
            },
    ]
      }
    ],
    "max_tokens": 300
  }'

聊天模型

模型名称	函数调用
chat-bison-32k	`completion('chat-bison-32k', messages)`
chat-bison	`completion('chat-bison', messages)`
chat-bison@001	`completion('chat-bison@001', messages)`

代码聊天模型

模型名称	函数调用
codechat-bison	`completion('codechat-bison', messages)`
codechat-bison-32k	`completion('codechat-bison-32k', messages)`
codechat-bison@001	`completion('codechat-bison@001', messages)`

文本模型

模型名称	函数调用
text-bison	`completion('text-bison', messages)`
text-bison@001	`completion('text-bison@001', messages)`

代码文本模型

模型名称	函数调用
code-bison	`completion('code-bison', messages)`
code-bison@001	`completion('code-bison@001', messages)`
code-gecko@001	`completion('code-gecko@001', messages)`
code-gecko@latest	`completion('code-gecko@latest', messages)`

Embedding 模型

用法 - Embedding

SDK
LiteLLM 代理

import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

response = embedding(
    model="vertex_ai/textembedding-gecko",
    input=["good morning from litellm"],
)
print(response)

将模型添加到 config.yaml

model_list:
  - model_name: snowflake-arctic-embed-m-long-1731622468876
    litellm_params:
      model: vertex_ai/<your-model-id>
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

启动代理

$ litellm --config /path/to/config.yaml

使用 OpenAI Python SDK、Langchain Python SDK 发送请求

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="snowflake-arctic-embed-m-long-1731622468876", 
    input = ["good morning from litellm", "this is another item"],
)

print(response)

支持的 Embedding 模型

支持此处列出的所有模型

模型名称	函数调用
text-embedding-004	`embedding(model="vertex_ai/text-embedding-004", input)`
text-multilingual-embedding-002	`embedding(model="vertex_ai/text-multilingual-embedding-002", input)`
textembedding-gecko	`embedding(model="vertex_ai/textembedding-gecko", input)`
textembedding-gecko-multilingual	`embedding(model="vertex_ai/textembedding-gecko-multilingual", input)`
textembedding-gecko-multilingual@001	`embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input)`
textembedding-gecko@001	`embedding(model="vertex_ai/textembedding-gecko@001", input)`
textembedding-gecko@003	`embedding(model="vertex_ai/textembedding-gecko@003", input)`
text-embedding-preview-0409	`embedding(model="vertex_ai/text-embedding-preview-0409", input)`
text-multilingual-embedding-preview-0409	`embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)`
微调或自定义 Embedding 模型	`embedding(model="vertex_ai/<your-model-id>", input)`

支持的 OpenAI (统一) 参数

参数	类型	Vertex 等效项
`input`	string 或 List[string]	`instances`
`dimensions`	int	`output_dimensionality`
`input_type`	Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"]	`task_type`

使用 OpenAI (统一) 参数的用法

SDK
LiteLLM 代理

response = litellm.embedding(
    model="vertex_ai/text-embedding-004",
    input=["good morning from litellm", "gm"]
    input_type = "RETRIEVAL_DOCUMENT",
    dimensions=1,
)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="text-embedding-004", 
    input = ["good morning from litellm", "gm"],
    dimensions=1,
    extra_body = {
        "input_type": "RETRIEVAL_QUERY",
    }
)

print(response)

支持的 Vertex 特有参数

参数	类型
`auto_truncate`	bool
`task_type`	Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"]
`title`	str

使用 Vertex 特有参数（使用 `task_type` 和 `title`）

您可以将任何 Vertex 特有参数传递给 embedding 模型。只需像这样将它们传递给 embedding 函数

包含所有 embedding 参数的相关 Vertex AI 文档

SDK
LiteLLM 代理

response = litellm.embedding(
    model="vertex_ai/text-embedding-004",
    input=["good morning from litellm", "gm"]
    task_type = "RETRIEVAL_DOCUMENT",
    title = "test",
    dimensions=1,
    auto_truncate=True,
)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="text-embedding-004", 
    input = ["good morning from litellm", "gm"],
    dimensions=1,
    extra_body = {
        "task_type": "RETRIEVAL_QUERY",
        "auto_truncate": True,
        "title": "test",
    }
)

print(response)

已知限制

每个请求仅支持 1 个图像 / 视频
仅支持 GCS 或 base64 编码的图像 / 视频

用法

SDK
LiteLLM 代理 (统一端点)
LiteLLM 代理 (Vertex SDK)

使用 GCS 图像

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input="gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png" # will be sent as a gcs image
)

使用 Base64 编码的图像

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input="data:image/jpeg;base64,..." # will be sent as a base64 encoded image
)

将模型添加到 config.yaml

model_list:
  - model_name: multimodalembedding@001
    litellm_params:
      model: vertex_ai/multimodalembedding@001
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

启动代理

$ litellm --config /path/to/config.yaml

使用 OpenAI Python SDK、Langchain Python SDK 发送请求

OpenAI SDK
Langchain

包含 GCS 图像 / 视频 URI 的请求

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
)

print(response)

包含 Base64 编码图像的请求

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = "data:image/jpeg;base64,...",
)

print(response)

包含 GCS 图像 / 视频 URI 的请求

from langchain_openai import OpenAIEmbeddings

embeddings_models = "multimodalembedding@001"

embeddings = OpenAIEmbeddings(
    model="multimodalembedding@001",
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # type: ignore
)


query_result = embeddings.embed_query(
    "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)
print(query_result)

包含 Base64 编码图像的请求

from langchain_openai import OpenAIEmbeddings

embeddings_models = "multimodalembedding@001"

embeddings = OpenAIEmbeddings(
    model="multimodalembedding@001",
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # type: ignore
)


query_result = embeddings.embed_query(
    "data:image/jpeg;base64,..."
)
print(query_result)

将模型添加到 config.yaml

default_vertex_config:
  vertex_project: "adroit-crow-413218"
  vertex_location: "us-central1"
  vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

启动代理

$ litellm --config /path/to/config.yaml

使用 OpenAI Python SDK 发送请求

import vertexai

from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video
from vertexai.vision_models import VideoSegmentConfig
from google.auth.credentials import Credentials


LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed
        
    def refresh(self, request):
        pass
    
    def apply(self, headers, token=None):
        headers['Authorization'] = f'Bearer {self.token}'

    @property
    def expired(self):
        return False  # Always consider the token as non-expired

    @property
    def valid(self):
        return True  # Always consider the credentials as valid

credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials = credentials,
    api_transport="rest",
   
)

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(
    "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)

embeddings = model.get_embeddings(
    image=image,
    contextual_text="Colosseum",
    dimension=1408,
)
print(f"Image Embedding: {embeddings.image_embedding}")
print(f"Text Embedding: {embeddings.text_embedding}")

文本 + 图像 + 视频 Embedding

SDK
LiteLLM 代理 (统一端点)

文本 + 图像

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"] # will be sent as a gcs image
)

文本 + 视频

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as a gcs image
)

图像 + 视频

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as a gcs image
)

将模型添加到 config.yaml

model_list:
  - model_name: multimodalembedding@001
    litellm_params:
      model: vertex_ai/multimodalembedding@001
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

启动代理

$ litellm --config /path/to/config.yaml

使用 OpenAI Python SDK、Langchain Python SDK 发送请求

文本 + 图像

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"],
)

print(response)

文本 + 视频

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)

print(response)

图像 + 视频

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)

print(response)

图像生成模型

用法

response = await litellm.aimage_generation(
    prompt="An olympic size swimming pool",
    model="vertex_ai/imagegeneration@006",
    vertex_ai_project="adroit-crow-413218",
    vertex_ai_location="us-central1",
)

生成多个图像

使用 n 参数传递要生成的图像数量

response = await litellm.aimage_generation(
    prompt="An olympic size swimming pool",
    model="vertex_ai/imagegeneration@006",
    vertex_ai_project="adroit-crow-413218",
    vertex_ai_location="us-central1",
    n=1,
)

支持的图像生成模型

模型名称	用法
`imagen-3.0-generate-001`	`litellm.image_generation('vertex_ai/imagen-3.0-generate-001', prompt)`
`imagen-3.0-fast-generate-001`	`litellm.image_generation('vertex_ai/imagen-3.0-fast-generate-001', prompt)`
`imagegeneration@006`	`litellm.image_generation('vertex_ai/imagegeneration@006', prompt)`
`imagegeneration@005`	`litellm.image_generation('vertex_ai/imagegeneration@005', prompt)`
`imagegeneration@002`	`litellm.image_generation('vertex_ai/imagegeneration@002', prompt)`

文本转语音 API

信息

LiteLLM 支持以 OpenAI 文本转语音 API 格式调用 Vertex AI 文本转语音 API

用法 - 基本

SDK
LiteLLM 代理 (统一端点)

Vertex AI 不支持传递 model 参数 - 因此传递 model=vertex_ai/ 是唯一必需的参数

同步用法

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = litellm.speech(
    model="vertex_ai/",
    input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

异步用法

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = litellm.aspeech(
    model="vertex_ai/",
    input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

将模型添加到 config.yaml

model_list:
  - model_name: vertex-tts
    litellm_params:
      model: vertex_ai/ # Vertex AI does not support passing a `model` param - so passing `model=vertex_ai/` is the only required param
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

启动代理

$ litellm --config /path/to/config.yaml

使用 OpenAI Python SDK 发送请求

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input="the quick brown fox jumped over the lazy dogs",
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'}
)
print("response from proxy", response)

用法 - 将 `ssml` 作为输入

将您的 ssml 作为输入传递给 input 参数，如果它包含 <speak>，它将被自动检测并作为 ssml 传递给 Vertex AI API

如果您需要强制将 input 作为 ssml 传递，请设置 use_ssml=True

SDK
LiteLLM 代理 (统一端点)

Vertex AI 不支持传递 model 参数 - 因此传递 model=vertex_ai/ 是唯一必需的参数

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"


ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
    input=ssml,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-UK",
        "name": "en-UK-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR22",
        "speakingRate": "10",
    },
)
response.stream_to_file(speech_file_path)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input=ssml,
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
)
print("response from proxy", response)

强制使用 SSML

您可以通过将 use_ssml 参数设置为 True 来强制使用 SSML。当您想确保输入被视为 SSML，即使它不包含 <speak> 标签时，这非常有用。

以下是强制使用 SSML 的示例

SDK
LiteLLM 代理 (统一端点)

Vertex AI 不支持传递 model 参数 - 因此传递 model=vertex_ai/ 是唯一必需的参数

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"


ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
    input=ssml,
    use_ssml=True,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-UK",
        "name": "en-UK-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR22",
        "speakingRate": "10",
    },
)
response.stream_to_file(speech_file_path)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input=ssml, # pass as None since OpenAI SDK requires this param
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
    extra_body={"use_ssml": True},
)
print("response from proxy", response)

批量 API

只需将以下 Vertex 环境变量添加到您的环境中。

# GCS Bucket settings, used to store batch prediction files in
export GCS_BUCKET_NAME = "litellm-testing-bucket" # the bucket you want to store batch prediction files in
export GCS_PATH_SERVICE_ACCOUNT="/path/to/service_account.json" # path to your service account json file

# Vertex /batch endpoint settings, used for LLM API requests
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json" # path to your service account json file
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
export VERTEXAI_PROJECT="my-test-project" 

用法

1. 为 Vertex 创建一个批量请求文件

LiteLLM 要求文件遵循 OpenAI 批量文件格式

文件中的每个 body 都应该是一个 OpenAI API 请求

在当前工作目录中创建一个名为 vertex_batch_completions.jsonl 的文件，其中的 model 应为 Vertex AI 模型名称

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}

2. 上传批量请求文件

对于 vertex_ai，liteLLM 会将文件上传到提供的 GCS_BUCKET_NAME

import os
oai_client = OpenAI(
    api_key="sk-1234",               # litellm proxy API key
    base_url="https://:4000" # litellm proxy base url
)
file_name = "vertex_batch_completions.jsonl" # 
_current_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(_current_dir, file_name)
file_obj = oai_client.files.create(
    file=open(file_path, "rb"),
    purpose="batch",
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use vertex_ai for this file upload
)

预期响应

{
    "id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "bytes": 416,
    "created_at": 1733392026,
    "filename": "litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "object": "file",
    "purpose": "batch",
    "status": "uploaded",
    "status_details": null
}

3. 创建批量

batch_input_file_id = file_obj.id # use `file_obj` from step 2
create_batch_response = oai_client.batches.create(
    completion_window="24h",
    endpoint="/v1/chat/completions",
    input_file_id=batch_input_file_id, # example input_file_id = "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/c2b1b785-252b-448c-b180-033c4c63b3ce"
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)

预期响应

{
    "id": "3814889423749775360",
    "completion_window": "24hrs",
    "created_at": 1733392026,
    "endpoint": "",
    "input_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "object": "batch",
    "status": "validating",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": null,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": null,
    "output_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
    "request_counts": null
}

4. 检索批量

retrieved_batch = oai_client.batches.retrieve(
    batch_id=create_batch_response.id,
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)

预期响应

{
    "id": "3814889423749775360",
    "completion_window": "24hrs",
    "created_at": 1736500100,
    "endpoint": "",
    "input_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/7b2e47f5-3dd4-436d-920f-f9155bbdc952",
    "object": "batch",
    "status": "completed",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": null,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": null,
    "output_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
    "request_counts": null
}

微调 API

属性	详情
描述	使用 OpenAI Python SDK 在 Vertex AI 中创建微调任务（`/tuningJobs`）
Vertex 微调文档	Vertex 微调

用法

1. 将 `finetune_settings` 添加到 config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# 👇 Key change: For /fine_tuning/jobs endpoints
finetune_settings:
  - custom_llm_provider: "vertex_ai"
    vertex_project: "adroit-crow-413218"
    vertex_location: "us-central1"
    vertex_credentials: "/Users/ishaanjaffer/Downloads/adroit-crow-413218-a956eef1a2a8.json"

2. 创建微调任务

OpenAI Python SDK
curl

ft_job = await client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",                  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",                 # file_id from create file response
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm proxy which provider to use
)

curl https://:4000/v1/fine_tuning/jobs \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
    "custom_llm_provider": "vertex_ai",
    "model": "gemini-1.0-pro-002",
    "training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
    }'

高级用例 - 将 adapter_size 传递给 Vertex AI API

设置超参数，例如 n_epochs、learning_rate_multiplier 和 adapter_size。请参阅 Vertex 高级超参数

OpenAI Python SDK
curl

ft_job = client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",                  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",                 # file_id from create file response
    hyperparameters={
        "n_epochs": 3,                      # epoch_count on Vertex
        "learning_rate_multiplier": 0.1,    # learning_rate_multiplier on Vertex
        "adapter_size": "ADAPTER_SIZE_ONE"  # type: ignore, vertex specific hyperparameter
    },
    extra_body={
        "custom_llm_provider": "vertex_ai",
    },
)

curl https://:4000/v1/fine_tuning/jobs \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
    "custom_llm_provider": "vertex_ai",
    "model": "gemini-1.0-pro-002",
    "training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
    "hyperparameters": {
        "n_epochs": 3,
        "learning_rate_multiplier": 0.1,
        "adapter_size": "ADAPTER_SIZE_ONE"
    }
    }'

补充

使用 `GOOGLE_APPLICATION_CREDENTIALS`

以下是将您的服务帐号凭据存储为 GOOGLE_APPLICATION_CREDENTIALS 环境变量的代码

import os 
import tempfile

def load_vertex_ai_credentials():
  # Define the path to the vertex_key.json file
  print("loading vertex ai credentials")
  filepath = os.path.dirname(os.path.abspath(__file__))
  vertex_key_path = filepath + "/vertex_key.json"

  # Read the existing content of the file or create an empty dictionary
  try:
      with open(vertex_key_path, "r") as file:
          # Read the file content
          print("Read vertexai file path")
          content = file.read()

          # If the file is empty or not valid JSON, create an empty dictionary
          if not content or not content.strip():
              service_account_key_data = {}
          else:
              # Attempt to load the existing JSON content
              file.seek(0)
              service_account_key_data = json.load(file)
  except FileNotFoundError:
      # If the file doesn't exist, create an empty dictionary
      service_account_key_data = {}

  # Create a temporary file
  with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
      # Write the updated content to the temporary file
      json.dump(service_account_key_data, temp_file, indent=2)

  # Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
  os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)

使用 GCP 服务帐号

信息

尝试在 Google Cloud Run 上部署 LiteLLM？教程在此

找出绑定到 Google Cloud Run 服务的服务帐号

获取对应服务帐号的完整电子邮件地址
接下来，转到 IAM 与 Admin > 管理资源，选择包含您的 Google Cloud Run 服务的顶级项目

点击 Add Principal

指定服务帐号作为主体，Vertex AI 用户作为角色

完成后，当您在 Google Cloud Run 服务中部署新的容器时，LiteLLM 将自动访问所有 Vertex AI 端点。

特别感谢 @Darien Kindlund 提供此教程

VertexAI[Anthropic, Gemini, Model Garden]

概述​

vertex_ai/ 路由​

系统消息​

函数调用​

JSON Schema​

基础（Grounding）- 网页搜索​

从 Vertex AI SDK 迁移到 LiteLLM (GROUNDING)​

思考 / reasoning_content​

将 thinking 参数传递给 Gemini 模型​

上下文缓存​

先决条件​

示例用法​

LiteLLM 代理服务器用法​

身份验证 - vertex_project, vertex_location 等。​

动态参数​

环境变量​

指定安全设置​

按模型/请求设置​

全局设置​

设置 Vertex 项目和 Vertex 位置​

Anthropic​

用法​

用法 - thinking / reasoning_content​

Meta/Llama API​

用法​

Mistral API​

用法​

用法 - Codestral FIM​

AI21 模型​

用法​

Gemini Pro​

微调模型​

Model Garden​

使用 Model Garden​

Gemini Pro Vision​

Gemini 1.5 Pro（及 Vision）​

使用 Gemini Pro Vision​

用法 - 函数调用​

用法 - PDF / 视频 / 音频等文件​

使用 gs:// 或任何 URL​

使用 base64​

聊天模型​

代码聊天模型​

文本模型​

代码文本模型​

Embedding 模型​

用法 - Embedding​

支持的 Embedding 模型​

支持的 OpenAI (统一) 参数​

使用 OpenAI (统一) 参数的用法​

支持的 Vertex 特有参数​

使用 Vertex 特有参数（使用 task_type 和 title）​

多模态 Embedding​

用法​

文本 + 图像 + 视频 Embedding​

图像生成模型​

支持的图像生成模型​

文本转语音 API​

用法 - 基本​

用法 - 将 ssml 作为输入​

强制使用 SSML​

批量 API​

用法​

1. 为 Vertex 创建一个批量请求文件​

2. 上传批量请求文件​

3. 创建批量​

4. 检索批量​

微调 API​

用法​

1. 将 finetune_settings 添加到 config.yaml​

2. 创建微调任务​

补充​

使用 GOOGLE_APPLICATION_CREDENTIALS​

使用 GCP 服务帐号​

概述

`vertex_ai/` 路由

系统消息

函数调用

JSON Schema

基础（Grounding）- 网页搜索

从 Vertex AI SDK 迁移到 LiteLLM (GROUNDING)

思考 / `reasoning_content`

将 `thinking` 参数传递给 Gemini 模型

上下文缓存

先决条件

示例用法

LiteLLM 代理服务器用法

身份验证 - vertex_project, vertex_location 等。

动态参数

环境变量

指定安全设置

按模型/请求设置

全局设置

设置 Vertex 项目和 Vertex 位置

Anthropic

用法

用法 - `thinking` / `reasoning_content`

Meta/Llama API

用法

Mistral API

用法

用法 - Codestral FIM

AI21 模型

用法

Gemini Pro

微调模型

Model Garden

使用 Model Garden

Gemini Pro Vision

Gemini 1.5 Pro（及 Vision）

使用 Gemini Pro Vision

用法 - 函数调用

用法 - PDF / 视频 / 音频等文件

使用 `gs://` 或任何 URL

使用 base64

聊天模型

代码聊天模型

文本模型

代码文本模型

Embedding 模型

用法 - Embedding

支持的 Embedding 模型

支持的 OpenAI (统一) 参数

使用 OpenAI (统一) 参数的用法

支持的 Vertex 特有参数

使用 Vertex 特有参数（使用 `task_type` 和 `title`）

多模态 Embedding

用法

文本 + 图像 + 视频 Embedding

图像生成模型

支持的图像生成模型

文本转语音 API

用法 - 基本

用法 - 将 `ssml` 作为输入

强制使用 SSML

批量 API

用法

1. 为 Vertex 创建一个批量请求文件

2. 上传批量请求文件

3. 创建批量

4. 检索批量

微调 API

用法

1. 将 `finetune_settings` 添加到 config.yaml

2. 创建微调任务

补充

使用 `GOOGLE_APPLICATION_CREDENTIALS`

使用 GCP 服务帐号