
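Using Audio Models

How to send and receive audio through the /chat/completions endpoint

Audio Output from a Model

Example of creating a human-like audio response to a prompt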

import os
import base64
import litellm

os.environ["OPENAI_API_KEY"] = "your-api-key"

# OpenAI call: request both text and audio in the response
completion = litellm.completion(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Is a golden retriever a good family dog?"}],
)

# The audio payload is Base64-encoded; decode it before writing to disk
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)
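
Before writing the decoded bytes to disk, it can be useful to confirm that they really are a readable WAV file. The following is a minimal sketch using only the Python standard library; `describe_wav` is a hypothetical helper name, not part of litellm, and it assumes the response was requested with "format": "wav" and carries a standard WAV header.

```python
import base64
import io
import wave


def describe_wav(b64_audio: str) -> dict:
    """Decode a Base64 string and report basic WAV properties.

    Hypothetical helper for illustration; assumes the payload is a
    complete WAV file with a valid header.
    """
    wav_bytes = base64.b64decode(b64_audio)
    # wave.open accepts any seekable file-like object, so no temp file is needed
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

With the example above, this could be called as describe_wav(completion.choices[0].message.audio.data); a wave.Error would suggest the payload is not WAV data.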

Audio Input to a Model

import base64
import requests
import litellm

# Download a sample recording and Base64-encode it for the request body
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode("utf-8")

completion = litellm.completion(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this recording?"},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded_string, "format": "wav"},
                },
            ],
        },
    ],
)

print(completion.choices[0].message)
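
When audio comes from several sources (a local file, a microphone buffer, a download), the message construction above can be factored into a small function. This is a sketch only; `build_audio_message` is a hypothetical helper name and not a litellm API. It simply reproduces the "input_audio" content-part shape shown in the example.

```python
import base64


def build_audio_message(audio_bytes: bytes, prompt: str, audio_format: str = "wav") -> dict:
    """Build a user message with a text part and an input_audio part.

    Hypothetical helper for illustration; mirrors the message layout
    used in the example above.
    """
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "input_audio",
                "input_audio": {"data": encoded, "format": audio_format},
            },
        ],
    }
```

The result can be passed as messages=[build_audio_message(wav_data, "What is in this recording?")] in the call above.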

Checking whether a model supports audio_input and audio_output

Use litellm.supports_audio_output(model="") -> returns True if the model can generate audio output

Use litellm.supports_audio_input(model="") -> returns True if the model can accept audio input

import litellm

assert litellm.supports_audio_output(model="gpt-4o-audio-preview") == True
assert litellm.supports_audio_input(model="gpt-4o-audio-preview") == True

assert litellm.supports_audio_output(model="gpt-3.5-turbo") == False
assert litellm.supports_audio_input(model="gpt-3.5-turbo") == False

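Response Format with Audio

Below is an example of the JSON data structure for a message you might receive from the /chat/completions endpoint when sending audio input to a model.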

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "refusal": null,
    "audio": {
      "id": "audio_abc123",
      "expires_at": 1729018505,
      "data": "<bytes omitted>",
      "transcript": "Yes, golden retrievers are known to be ..."
    }
  },
  "finish_reason": "stop"
}
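  • audio If the audio output modality is requested, this object contains data about the audio response from the model.
    • audio.id Unique identifier for the audio response.
    • audio.expires_at The Unix timestamp (in seconds) after which this audio response is no longer accessible on the server for use in multi-turn conversations.
    • audio.data Base64-encoded audio bytes generated by the model, in the format specified in the request.
    • audio.transcript Transcript of the audio generated by the model.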
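
The fields listed above can be read from a response choice as sketched below. This assumes the choice is available as a plain dict with the same keys as the JSON example; `extract_audio` is a hypothetical helper name, and the `now` parameter exists only to make the expiry check easy to test.

```python
import base64
import time
from typing import Optional


def extract_audio(choice: dict, now: Optional[float] = None) -> Optional[dict]:
    """Pull the audio fields out of a choice shaped like the JSON above.

    Hypothetical helper for illustration. Returns None when the message
    has no audio object (for example, audio output was not requested).
    """
    audio = choice["message"].get("audio")
    if audio is None:
        return None
    current = time.time() if now is None else now
    return {
        "id": audio["id"],
        "transcript": audio["transcript"],
        # audio.data is Base64 text; decode it to raw audio bytes
        "audio_bytes": base64.b64decode(audio["data"]),
        # expires_at is a Unix timestamp in seconds
        "expired": current >= audio["expires_at"],
    }
```

Checking the expired flag before referring to audio.id in a later turn avoids relying on a response the server no longer holds.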