Caching
For OpenAI/Anthropic prompt caching, go here.
Cache LLM responses. LiteLLM's caching system stores and reuses LLM responses to save costs and reduce latency. When you make the same request twice, the cached response is returned instead of calling the LLM API again.
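As a mental model, an exact-match cache derives a deterministic key from the request payload and returns the stored response when the same key appears again. The sketch below is illustrative only; the `make_cache_key` helper is hypothetical and LiteLLM's real key derivation may differ:

```python
import hashlib
import json

def make_cache_key(model, messages, namespace=None):
    """Hypothetical sketch: derive a deterministic cache key from a request."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    # With a namespace configured, keys are stored as "<namespace>:<hash>"
    return f"{namespace}:{digest}" if namespace else digest

# Identical requests hash to the same key, so the second request
# can be answered from the cache instead of calling the LLM API.
k1 = make_cache_key("gpt-3.5-turbo", [{"role": "user", "content": "Hello"}])
k2 = make_cache_key("gpt-3.5-turbo", [{"role": "user", "content": "Hello"}])
assert k1 == k2
```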
Supported Caches
- In Memory Cache
- Disk Cache
- Redis Cache
- Qdrant Semantic Cache
- Redis Semantic Cache
- S3 Bucket Cache
- GCS Bucket Cache
Quick Start
- redis cache
- Qdrant semantic cache
- s3 cache
- gcs cache
- redis semantic cache
- in memory cache
- disk cache
Caching can be enabled by adding the cache key to your config.yaml.
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True, litellm defaults to using a redis cache
[OPTIONAL] Step 1.5: Add redis namespace, default ttl
Namespace
If you want to create some folder-like grouping for your keys, you can set a namespace, like this:
litellm_settings:
  cache: true
  cache_params: # set cache params for redis
    type: redis
    namespace: "litellm.caching.caching"
The keys will be stored as:
litellm.caching.caching:<hash>
Redis Cluster
- Set on config.yaml
- Set in .env
model_list:
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings:
  cache: True
  cache_params:
    type: redis
    redis_startup_nodes: [{ "host": "127.0.0.1", "port": "7001" }]
You can configure a redis cluster by setting REDIS_CLUSTER_NODES in your .env.
Example REDIS_CLUSTER_NODES value:
REDIS_CLUSTER_NODES = "[{"host": "127.0.0.1", "port": "7001"}, {"host": "127.0.0.1", "port": "7003"}, {"host": "127.0.0.1", "port": "7004"}, {"host": "127.0.0.1", "port": "7005"}, {"host": "127.0.0.1", "port": "7006"}, {"host": "127.0.0.1", "port": "7007"}]"
Example python script for setting redis cluster nodes in .env:
import json
import os

# List of startup nodes
startup_nodes = [
    {"host": "127.0.0.1", "port": "7001"},
    {"host": "127.0.0.1", "port": "7003"},
    {"host": "127.0.0.1", "port": "7004"},
    {"host": "127.0.0.1", "port": "7005"},
    {"host": "127.0.0.1", "port": "7006"},
    {"host": "127.0.0.1", "port": "7007"},
]

# set startup nodes in environment variables
os.environ["REDIS_CLUSTER_NODES"] = json.dumps(startup_nodes)
print("REDIS_CLUSTER_NODES", os.environ["REDIS_CLUSTER_NODES"])
Redis Sentinel
- Set on config.yaml
- Set in .env
model_list:
  - model_name: "*"
    litellm_params:
      model: "*"

litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    service_name: "mymaster"
    sentinel_nodes: [["localhost", 26379]]
    sentinel_password: "password" # [OPTIONAL]
You can configure redis sentinel by setting REDIS_SENTINEL_NODES in your .env.
Example REDIS_SENTINEL_NODES value:
REDIS_SENTINEL_NODES='[["localhost", 26379]]'
REDIS_SERVICE_NAME = "mymaster"
REDIS_SENTINEL_PASSWORD = "password"
Example python script for setting redis sentinel nodes in .env:
import json
import os

# List of sentinel nodes
sentinel_nodes = [["localhost", 26379]]

# set sentinel nodes in environment variables
os.environ["REDIS_SENTINEL_NODES"] = json.dumps(sentinel_nodes)
print("REDIS_SENTINEL_NODES", os.environ["REDIS_SENTINEL_NODES"])
TTL
litellm_settings:
  cache: true
  cache_params: # set cache params for redis
    type: redis
    ttl: 600 # will be cached on redis for 600s
    # default_in_memory_ttl: Optional[float], default is None. time in seconds.
    # default_in_redis_ttl: Optional[float], default is None. time in seconds.
SSL
Just set REDIS_SSL="True" in your .env, and LiteLLM will pick it up.
REDIS_SSL="True"
For quick testing, you can also use REDIS_URL, for example:
REDIS_URL="rediss://.."
However, we don't recommend using REDIS_URL in production. We've noticed a performance difference between using it and specifying redis_host, port, etc.
GCP IAM Authentication
For GCP Memorystore Redis with IAM authentication, install the required dependency:
pip install google-cloud-iam
- Set on config.yaml
- Set in .env
For Redis Cluster with GCP IAM:
litellm_settings:
  cache: True
  cache_params:
    type: redis
    redis_startup_nodes:
      [{ "host": "10.128.0.2", "port": 6379 }, { "host": "10.128.0.2", "port": 11008 }]
    gcp_service_account: "projects/-/serviceAccounts/your-sa@project.iam.gserviceaccount.com"
    ssl: true
    ssl_cert_reqs: null
    ssl_check_hostname: false
You can configure GCP IAM Redis authentication in your .env.
For Redis Cluster:
REDIS_CLUSTER_NODES='[{"host": "10.128.0.2", "port": 6379}, {"host": "10.128.0.2", "port": 11008}]'
REDIS_GCP_SERVICE_ACCOUNT="projects/-/serviceAccounts/your-sa@project.iam.gserviceaccount.com"
REDIS_GCP_SSL_CA_CERTS="./server-ca.pem"
REDIS_SSL="True"
REDIS_SSL_CERT_REQS="None"
REDIS_SSL_CHECK_HOSTNAME="False"
GCP Authentication Setup
Make sure your GCP credentials are configured:
# Option 1: Service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
# Option 2: If running on GCP compute instance with service account attached
# No additional setup needed
Step 2: Add Redis credentials to .env
Set either REDIS_URL or REDIS_HOST in your OS environment to enable caching.
REDIS_URL = "" # REDIS_URL='redis://username:password@hostname:port/database'
## OR ##
REDIS_HOST = "" # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = "" # REDIS_PORT='18841'
REDIS_PASSWORD = "" # REDIS_PASSWORD='liteLlmIsAmazing'
REDIS_USERNAME = "" # REDIS_USERNAME='my-redis-username' [OPTIONAL] if your redis server requires a username
REDIS_SSL = "True" # [OPTIONAL] REDIS_SSL='True' to enable SSL; defaults to False
Additional kwargs
Configure any Redis client library parameter using REDIS_* environment variables. This is the recommended mechanism for toggling Redis settings, since it automatically maps environment variables to Redis client kwargs.
You can pass any additional redis.Redis arguments via your OS environment, like this:
REDIS_<redis-kwarg-name> = ""
For example:
REDIS_SSL = "True"
REDIS_SSL_CERT_REQS = "None"
REDIS_CONNECTION_POOL_KWARGS = '{"max_connections": 20}'
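The env-var-to-kwarg mapping described above can be sketched like this (illustrative only; `redis_kwargs_from_env` is a hypothetical helper, not LiteLLM's actual parser):

```python
import os

def redis_kwargs_from_env(environ=None):
    """Hypothetical sketch: map REDIS_<kwarg> env vars to redis.Redis kwargs."""
    environ = os.environ if environ is None else environ
    kwargs = {}
    for key, value in environ.items():
        if key.startswith("REDIS_"):
            # REDIS_CONNECTION_POOL_KWARGS -> connection_pool_kwargs, etc.
            kwargs[key[len("REDIS_"):].lower()] = value
    return kwargs

env = {"REDIS_HOST": "localhost", "REDIS_PORT": "6379", "REDIS_SSL": "True"}
print(redis_kwargs_from_env(env))
# Note the values stay strings; that is why non-string params belong
# in cache_kwargs rather than in REDIS_* env vars.
```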
Note: For non-string Redis parameters (e.g. integers, booleans, or complex objects), avoid REDIS_* environment variables, as they may fail during Redis client initialization. Instead, set such parameters using cache_kwargs in the router's configuration.
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
Caching can be enabled by adding the cache key to your config.yaml.
Step 1: Add cache to the config.yaml
model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/
  - model_name: openai-embedding
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True, litellm defaults to using a redis cache
  cache_params:
    type: qdrant-semantic
    qdrant_semantic_cache_embedding_model: openai-embedding # the model should be defined on the model_list
    qdrant_collection_name: test_collection
    qdrant_quantization_config: binary
    similarity_threshold: 0.8 # similarity threshold for semantic cache
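To build intuition for similarity_threshold: a semantic cache embeds the incoming prompt and reuses a stored response only when its embedding is close enough to a cached one. Below is a minimal cosine-similarity sketch; it is illustrative only, not Qdrant's or LiteLLM's implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_semantic_hit(query_vec, cached_vec, similarity_threshold=0.8):
    """A cached entry is reused only if its embedding clears the threshold."""
    return cosine_similarity(query_vec, cached_vec) >= similarity_threshold

assert is_semantic_hit([1.0, 0.0], [1.0, 0.05])      # near-identical -> hit
assert not is_semantic_hit([1.0, 0.0], [0.0, 1.0])   # unrelated -> miss
```

Raising similarity_threshold makes the cache stricter (fewer, more exact hits); lowering it makes it more permissive.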
Step 2: Add Qdrant credentials to your .env
QDRANT_API_KEY = "16rJUMBRx*************"
QDRANT_API_BASE = "https://5392d382-45*********.cloud.qdrant.io"
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
Step 4: Test it
curl -i http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "fake-openai-endpoint",
"messages": [
{"role": "user", "content": "Hello"}
]
}'
When semantic caching is enabled, expect to see x-litellm-semantic-similarity in the response headers.
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True
  cache_params: # set cache params for s3
    type: s3
    s3_bucket_name: cache-bucket-litellm # AWS Bucket Name for S3
    s3_region_name: us-west-2 # AWS Region Name for S3
    s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID # use os.environ/<variable name> to pass environment variables. This is AWS Access Key ID for S3
    s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY # AWS Secret Access Key for S3
    s3_endpoint_url: https://s3.amazonaws.com # [OPTIONAL] S3 endpoint URL, if you want to use Backblaze/cloudflare s3 buckets
Step 2: Run proxy with config
$ litellm --config /path/to/config.yaml
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True
  cache_params: # set cache params for gcs
    type: gcs
    gcs_bucket_name: cache-bucket-litellm # GCS Bucket Name for caching
    gcs_path_service_account: os.environ/GCS_PATH_SERVICE_ACCOUNT # use os.environ/<variable name> to pass environment variables. This is the path to your GCS service account JSON file
    gcs_path: cache/ # [OPTIONAL] GCS path prefix for cache objects
Step 2: Add GCS credentials to .env
Set the GCS environment variables in your .env file:
GCS_BUCKET_NAME="your-gcs-bucket-name"
GCS_PATH_SERVICE_ACCOUNT="/path/to/service-account.json"
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
Caching can be enabled by adding the cache key to your config.yaml.
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: azure-embedding-model
    litellm_params:
      model: azure/azure-embedding-model
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True
  cache_params:
    type: "redis-semantic"
    similarity_threshold: 0.8 # similarity threshold for semantic cache
    redis_semantic_cache_embedding_model: azure-embedding-model # set this to a model_name set in model_list
Step 2: Add Redis credentials to .env
Set either REDIS_URL or REDIS_HOST in your OS environment to enable caching.
REDIS_URL = "" # REDIS_URL='redis://username:password@hostname:port/database'
## OR ##
REDIS_HOST = "" # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = "" # REDIS_PORT='18841'
REDIS_PASSWORD = "" # REDIS_PASSWORD='liteLlmIsAmazing'
Additional kwargs
You can pass any additional redis.Redis arguments via your OS environment, like this:
REDIS_<redis-kwarg-name> = ""
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
Usage
Basic
- /chat/completions
- /embeddings
Send the same request twice:
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "write a poem about litellm!"}],
"temperature": 0.7
}'
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "write a poem about litellm!"}],
"temperature": 0.7
}'
Send the same request twice:
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Content-Type: application/json' \
--data ' {
"model": "text-embedding-ada-002",
"input": ["write a litellm poem"]
}'
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Content-Type: application/json' \
--data ' {
"model": "text-embedding-ada-002",
"input": ["write a litellm poem"]
}'
Dynamic Cache Controls
| Parameter | Type | Description |
|---|---|---|
| ttl | Optional(int) | Will cache the response for the user-defined amount of time (in seconds) |
| s-maxage | Optional(int) | Will only accept cached responses that are within the user-defined range (in seconds) |
| no-cache | Optional(bool) | Will not check the cache; always fetches a fresh response |
| no-store | Optional(bool) | Will not store the response in the cache |
| namespace | Optional(str) | Will cache the response under a user-defined namespace |
Each cache parameter can be controlled on a per-request basis. Here are examples for each one:
ttl
Set how long (in seconds) to cache a response.
- OpenAI Python SDK
- curl
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-3.5-turbo",
    extra_body={
        "cache": {
            "ttl": 300  # Cache response for 5 minutes
        }
    }
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"cache": {"ttl": 300},
"messages": [
{"role": "user", "content": "Hello"}
]
}'
s-maxage
Only accept cached responses that are within the specified age (in seconds).
- OpenAI Python SDK
- curl
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-3.5-turbo",
    extra_body={
        "cache": {
            "s-maxage": 600  # Only use cache if less than 10 minutes old
        }
    }
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"cache": {"s-maxage": 600},
"messages": [
{"role": "user", "content": "Hello"}
]
}'
no-cache
Force a fresh response, bypassing the cache.
- OpenAI Python SDK
- curl
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-3.5-turbo",
    extra_body={
        "cache": {
            "no-cache": True  # Skip cache check, get fresh response
        }
    }
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"cache": {"no-cache": true},
"messages": [
{"role": "user", "content": "Hello"}
]
}'
no-store
Will not store the response in the cache.
- OpenAI Python SDK
- curl
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-3.5-turbo",
    extra_body={
        "cache": {
            "no-store": True  # Don't cache this response
        }
    }
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"cache": {"no-store": true},
"messages": [
{"role": "user", "content": "Hello"}
]
}'
namespace
Store the response under a specific cache namespace.
- OpenAI Python SDK
- curl
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-3.5-turbo",
    extra_body={
        "cache": {
            "namespace": "my-custom-namespace"  # Store in custom namespace
        }
    }
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"cache": {"namespace": "my-custom-namespace"},
"messages": [
{"role": "user", "content": "Hello"}
]
}'
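Putting the controls together, a toy cache honoring ttl, s-maxage, no-cache, and no-store might look like this. This is an illustrative sketch of the semantics above, not the proxy's actual implementation:

```python
import time

class TtlCache:
    """Toy response cache honoring the per-request controls described above."""

    def __init__(self):
        self._store = {}  # key -> (value, stored_at, ttl)

    def get(self, key, s_maxage=None, no_cache=False):
        if no_cache or key not in self._store:
            return None                  # no-cache bypasses the lookup entirely
        value, stored_at, ttl = self._store[key]
        age = time.time() - stored_at
        if ttl is not None and age > ttl:
            return None                  # entry expired by its ttl
        if s_maxage is not None and age > s_maxage:
            return None                  # caller refuses entries older than s-maxage
        return value

    def set(self, key, value, ttl=None, no_store=False):
        if not no_store:                 # no-store means never write the entry
            self._store[key] = (value, time.time(), ttl)

cache = TtlCache()
cache.set("k", "hello", ttl=300)
assert cache.get("k") == "hello"
assert cache.get("k", no_cache=True) is None   # bypass cache
cache.set("skip", "x", no_store=True)
assert cache.get("skip") is None               # never stored
```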
Set caching for the proxy, but not on the actual LLM API call
Use this if you just want to enable features like rate limiting and load balancing across multiple instances.
Disable caching on the actual API call by setting supported_call_types: [].
litellm_settings:
  cache: True
  cache_params:
    type: redis
    supported_call_types: []
Debugging Caching - /cache/ping
LiteLLM Proxy exposes a /cache/ping endpoint to test if the cache is working as expected.
Usage
curl --location 'http://0.0.0.0:4000/cache/ping' -H "Authorization: Bearer sk-1234"
Expected response - when the cache is healthy:
{
  "status": "healthy",
  "cache_type": "redis",
  "ping_response": true,
  "set_cache_response": "success",
  "litellm_cache_params": {
    "supported_call_types": "['completion', 'acompletion', 'embedding', 'aembedding', 'atranscription', 'transcription']",
    "type": "redis",
    "namespace": "None"
  },
  "redis_cache_params": {
    "redis_client": "Redis<ConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>>",
    "redis_kwargs": "{'url': 'redis://:******@redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com:16337'}",
    "async_redis_conn_pool": "BlockingConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>",
    "redis_version": "7.2.0"
  }
}
Advanced
Control Call Types Caching is on for - (/chat/completion, /embeddings, etc.)
By default, caching is on for all call types. You can control which call types caching is on for by setting supported_call_types in cache_params.
Caching will only be on for the call types specified in supported_call_types.
litellm_settings:
  cache: True
  cache_params:
    type: redis
    supported_call_types:
      ["acompletion", "atext_completion", "aembedding", "atranscription"]
      # /chat/completions, /completions, /embeddings, /audio/transcriptions
Set Cache Params on config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True, litellm defaults to using a redis cache
  cache_params: # cache_params are optional
    type: "redis" # The type of cache to initialize. Can be "local", "redis", "s3", or "gcs". Defaults to "local".
    host: "localhost" # The host address for the Redis cache. Required if type is "redis".
    port: 6379 # The port number for the Redis cache. Required if type is "redis".
    password: "your_password" # The password for the Redis cache. Required if type is "redis".

    # Optional configurations
    supported_call_types:
      ["acompletion", "atext_completion", "aembedding", "atranscription"]
      # /chat/completions, /completions, /embeddings, /audio/transcriptions
Deleting Cache Keys - /cache/delete
In order to delete a cache key, send a request to /cache/delete with the keys you want to delete.
Example:
curl -X POST "http://0.0.0.0:4000/cache/delete" \
-H "Authorization: Bearer sk-1234" \
-d '{"keys": ["586bf3f3c1bf5aecb55bd9996494d3bbc69eb58397163add6d49537762a7548d", "key2"]}'
# {"status":"success"}
Viewing Cache Keys from responses
You can view the cache_key in the response headers; when a cache hit occurs, the cache key is sent as the x-litellm-cache-key response header.
curl -i --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "user": "ishan",
  "messages": [
    {
      "role": "user",
      "content": "what is litellm"
    }
  ]
}'
Response from the litellm proxy:
date: Thu, 04 Apr 2024 17:37:21 GMT
content-type: application/json
x-litellm-cache-key: 586bf3f3c1bf5aecb55bd9996494d3bbc69eb58397163add6d49537762a7548d
{
  "id": "chatcmpl-9ALJTzsBlXR9zTxPvzfFFtFbFtG6T",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorr..",
        "role": "assistant"
      }
    }
  ],
  "created": 1712252235
}
**Turn cache off by default - only opt in**
- Set mode: default_off for caching
model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# default off mode
litellm_settings:
  set_verbose: True
  cache: True
  cache_params:
    mode: default_off # 👈 Key change cache is default_off
- Opt in to caching when the cache is default off
- OpenAI Python SDK
- curl
import os
from openai import OpenAI

client = OpenAI(api_key=<litellm-api-key>, base_url="http://0.0.0.0:4000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
    extra_body={  # OpenAI python accepts extra args in extra_body
        "cache": {"use-cache": True}
    },
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
  "model": "gpt-3.5-turbo",
  "cache": {"use-cache": true},
  "messages": [
    {"role": "user", "content": "Say this is a test"}
  ]
}'
Redis max_connections
You can set the max_connections parameter for Redis in cache_params. It is passed directly to the Redis client and controls the maximum number of simultaneous connections in the pool. If you see errors like No connection available, try increasing this value.
litellm_settings:
  cache: true
  cache_params:
    type: redis
    max_connections: 100
Supported cache_params on proxy config.yaml
cache_params:
  # ttl
  ttl: Optional[float]
  default_in_memory_ttl: Optional[float]
  default_in_redis_ttl: Optional[float]
  max_connections: Optional[int]

  # Type of cache (options: "local", "redis", "s3", "gcs")
  type: s3

  # List of litellm call types to cache for
  # Options: "completion", "acompletion", "embedding", "aembedding"
  supported_call_types:
    ["acompletion", "atext_completion", "aembedding", "atranscription"]
    # /chat/completions, /completions, /embeddings, /audio/transcriptions

  # Redis cache parameters
  host: localhost # Redis server hostname or IP address
  port: "6379" # Redis server port (as a string)
  password: secret_password # Redis server password
  namespace: Optional[str] = None,

  # GCP IAM Authentication for Redis
  gcp_service_account: "projects/-/serviceAccounts/your-sa@project.iam.gserviceaccount.com" # GCP service account for IAM authentication
  gcp_ssl_ca_certs: "./server-ca.pem" # Path to SSL CA certificate file for GCP Memorystore Redis
  ssl: true # Enable SSL for secure connections
  ssl_cert_reqs: null # Set to null for self-signed certificates
  ssl_check_hostname: false # Set to false for self-signed certificates

  # S3 cache parameters
  s3_bucket_name: your_s3_bucket_name # Name of the S3 bucket
  s3_region_name: us-west-2 # AWS region of the S3 bucket
  s3_api_version: 2006-03-01 # AWS S3 API version
  s3_use_ssl: true # Use SSL for S3 connections (options: true, false)
  s3_verify: true # SSL certificate verification for S3 connections (options: true, false)
  s3_endpoint_url: https://s3.amazonaws.com # S3 endpoint URL
  s3_aws_access_key_id: your_access_key # AWS Access Key ID for S3
  s3_aws_secret_access_key: your_secret_key # AWS Secret Access Key for S3
  s3_aws_session_token: your_session_token # AWS Session Token for temporary credentials

  # GCS cache parameters
  gcs_bucket_name: your_gcs_bucket_name # Name of the GCS bucket
  gcs_path_service_account: /path/to/service-account.json # Path to GCS service account JSON file
  gcs_path: cache/ # [OPTIONAL] GCS path prefix for cache objects
Provider-Specific Optional Params Caching
By default, LiteLLM only includes standard OpenAI parameters in the cache key. However, some providers (e.g. Vertex AI) use additional parameters that affect the output but are not included in standard cache key generation.
Enable Provider-Specific Params Caching
Add this setting to your config.yaml to include provider-specific optional params in cache keys:
litellm_settings:
  cache: True
  cache_params:
    type: "redis"
    enable_caching_on_provider_specific_optional_params: True # Include provider-specific params in cache keys
Advanced - user api key cache ttl
Configure how long the in-memory cache stores the key object (prevents db requests):
general_settings:
  user_api_key_cache_ttl: <your-number> # time in seconds
By default, this value is set to 60s.
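Why this helps: requests arriving within the TTL window reuse the in-memory copy of the key object instead of querying the database. A minimal sketch of the idea (the `make_cached_loader` helper is hypothetical, not LiteLLM's implementation):

```python
import time

def make_cached_loader(load_fn, ttl=60.0, clock=time.monotonic):
    """Wrap a DB lookup so repeated calls within `ttl` seconds hit memory."""
    store = {}  # key -> (value, fetched_at)

    def get(key):
        hit = store.get(key)
        now = clock()
        if hit is not None and now - hit[1] < ttl:
            return hit[0]              # fresh enough: no DB round trip
        value = load_fn(key)           # cache miss or expired: DB lookup
        store[key] = (value, now)
        return value

    return get

calls = []
def fake_db_lookup(key):
    calls.append(key)
    return {"key": key, "spend": 0}

get_key = make_cached_loader(fake_db_lookup, ttl=60.0)
get_key("sk-1234")
get_key("sk-1234")   # served from memory, no second DB call
assert len(calls) == 1
```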