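PII, PHI Masking - Presidio

Overview

Property	Details
Description	Use this guardrail to mask PII (Personally Identifiable Information), PHI (Protected Health Information), and other sensitive data.
Provider	Microsoft Presidio
Supported Entity Types	All Presidio entity types
Supported Actions	`MASK`, `BLOCK`
Supported Modes	`pre_call`, `during_call`, `post_call`, `logging_only`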

Deployment Options

This guardrail requires both the Presidio Analyzer and the Presidio Anonymizer containers to be deployed.

Deployment Option	Details
Deploy Presidio Docker containers	- Presidio Analyzer Docker container - Presidio Anonymizer Docker container

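Quick Start

1. Create a PII, PHI Masking Guardrail

The guardrail can be created either in the LiteLLM UI or in config.yaml.

LiteLLM UI

On the LiteLLM UI, navigate to Guardrails and click "Add Guardrail". In the dropdown, select "Presidio PII" and enter your Presidio Analyzer and Anonymizer endpoints.

1.2 Configure Entity Types

Select the entity types you want to mask. The available actions are listed under "Supported Actions" below.

Config.yaml

Define your guardrail under the guardrails section.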

config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"

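Set the following environment variables so that they point at your Presidio Analyzer and Anonymizer services.

Set Environment Variables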
export PRESIDIO_ANALYZER_API_BASE="https://:5002"
export PRESIDIO_ANONYMIZER_API_BASE="https://:5001"
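
Before starting the gateway, it can help to confirm that both variables are set and that the services respond. The following is a minimal sketch and not part of LiteLLM: the helper names are hypothetical, and it assumes each Presidio service answers `GET /health` under the configured base URL.

```python
import os
import urllib.request

REQUIRED_VARS = ("PRESIDIO_ANALYZER_API_BASE", "PRESIDIO_ANONYMIZER_API_BASE")


def presidio_health_urls(env=None):
    """Return {variable name: health-check URL} for both Presidio services.

    Raises ValueError if a variable is missing or empty.
    """
    env = os.environ if env is None else env
    urls = {}
    for name in REQUIRED_VARS:
        base = env.get(name, "").strip()
        if not base:
            raise ValueError(f"{name} is not set")
        # Assumption: each Presidio service answers GET <base>/health.
        urls[name] = base.rstrip("/") + "/health"
    return urls


def check_presidio(env=None, timeout=5):
    """Send the health requests; returns {variable name: HTTP status}."""
    statuses = {}
    for name, url in presidio_health_urls(env).items():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            statuses[name] = resp.status
    return statuses
```

Calling `check_presidio()` after exporting the variables should report a successful status for both services if the containers are running; a `ValueError` means a variable was not exported in the current shell.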

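Supported values for `mode`

pre_call: runs **before** the LLM call, on the **input**
post_call: runs **after** the LLM call, on the **input and output**
logging_only: runs **after** the LLM call and applies PII masking only before logging to Langfuse and similar systems. It is not applied to the actual LLM API request/response.

2. Start LiteLLM Gateway

Start Gateway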

litellm --config config.yaml --detailed_debug

3. Test It!

3.1 LiteLLM UI

On the LiteLLM UI, navigate to the 'Test Keys' page, select the guardrail you created, and send the following message containing PII.

PII Request

My credit card is 4111-1111-1111-1111 and my email is test@example.com.

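3.2 Test in Code

To apply the guardrail to a request, send guardrails=["presidio-pii"] in the request body.

Langchain, OpenAI SDK usage examples

Two requests follow: one containing PII, which is masked, and one without PII.

Expect this request to mask Jane Doe, since a name is PII.

Masked PII Request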
curl https://:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello my name is Jane Doe"}
    ],
    "guardrails": ["presidio-pii"]
  }'

Expected response

Response with Masked PII
{
 "id": "chatcmpl-A3qSC39K7imjGbZ8xCDacGJZBoTJQ",
 "choices": [
   {
     "finish_reason": "stop",
     "index": 0,
     "message": {
       "content": "Hello, <PERSON>! How can I assist you today?",
       "role": "assistant",
       "tool_calls": null,
       "function_call": null
     }
   }
 ],
 "created": 1725479980,
 "model": "gpt-3.5-turbo-2024-07-18",
 "object": "chat.completion",
 "system_fingerprint": "fp_5bd87c427a",
 "usage": {
   "completion_tokens": 13,
   "prompt_tokens": 14,
   "total_tokens": 27
 },
 "service_tier": null
}

No PII Request
curl https://:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello good morning"}
    ],
    "guardrails": ["presidio-pii"]
  }'
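
The two curl requests above can also be built from Python. This is a sketch using only the standard library; it assumes the proxy listens on `http://0.0.0.0:4000` and accepts the example key `sk-1234`, and the helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumptions: proxy address and key are placeholders; adjust both for
# your deployment.
PROXY_BASE = "http://0.0.0.0:4000"
API_KEY = "sk-1234"


def build_chat_request(content, guardrails, model="gpt-3.5-turbo"):
    """Build (but do not send) a /chat/completions request with guardrails."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "guardrails": guardrails,
    }
    return urllib.request.Request(
        PROXY_BASE + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# One request with PII (expected to be masked) and one without.
masked = build_chat_request("Hello my name is Jane Doe", ["presidio-pii"])
plain = build_chat_request("Hello good morning", ["presidio-pii"])
```

Passing `masked` or `plain` to `send` issues the call against a running gateway. Building the body with `json.dumps` also avoids hand-written JSON mistakes such as a trailing comma.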

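Tracing Guardrail Requests

Once your guardrail is live in production, you can trace guardrail requests across all LiteLLM logging integrations, including LiteLLM Logs, Langfuse, and Arize Phoenix.

LiteLLM UI

On the LiteLLM Logs page, you can see that the PII content of a given request was masked, along with a detailed trace of the guardrail. The trace lets you monitor which entity types were masked, their confidence scores, and how long the guardrail took to run.

Langfuse

After connecting LiteLLM to Langfuse, the guardrail information appears in the Langfuse trace.

Entity Type Configuration

You can configure which entity types are detected and how each one is handled (mask or block).

Configure Entity Types in config.yaml

Define your guardrail with a specific entity type configuration.

config.yaml with Entity Types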
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-mask-guard"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        CREDIT_CARD: "MASK"  # Will mask credit card numbers
        EMAIL_ADDRESS: "MASK"  # Will mask email addresses
        
  - guardrail_name: "presidio-block-guard"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        CREDIT_CARD: "BLOCK"  # Will block requests containing credit card numbers

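Supported Entity Types

LiteLLM supports all Presidio entity types. The full list is in Presidio's documentation of supported entities.

Supported Actions

For each entity type, you can specify one of the following actions:

MASK: replace the entity with a placeholder (e.g., <PERSON>)
BLOCK: block the request entirely if this entity type is detected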
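
The difference between the two actions can be illustrated with a small sketch. It is not LiteLLM's implementation: `apply_actions` and `BlockedPiiEntity` are hypothetical names, and the findings are hand-written stand-ins for analyzer results (entity type plus character offsets).

```python
class BlockedPiiEntity(Exception):
    """Hypothetical stand-in for the error raised on a BLOCK action."""


def apply_actions(text, findings, entity_actions):
    """Apply MASK/BLOCK actions to analyzer-style findings.

    findings: dicts with "entity_type", "start", "end" (character offsets).
    entity_actions: e.g. {"CREDIT_CARD": "MASK"}. In this sketch, entity
    types that are not listed are left untouched.
    """
    relevant = [f for f in findings if f["entity_type"] in entity_actions]
    # Any BLOCK entity rejects the whole request before masking happens.
    for finding in relevant:
        if entity_actions[finding["entity_type"]] == "BLOCK":
            raise BlockedPiiEntity(finding["entity_type"])
    # Replace from the end of the text so earlier offsets stay valid.
    for finding in sorted(relevant, key=lambda item: item["start"], reverse=True):
        placeholder = f"<{finding['entity_type']}>"
        text = text[: finding["start"]] + placeholder + text[finding["end"]:]
    return text


def span(text, value, entity_type):
    """Build a finding for the first occurrence of value in text."""
    start = text.index(value)
    return {"entity_type": entity_type, "start": start, "end": start + len(value)}


text = "My credit card is 4111-1111-1111-1111 and my email is test@example.com"
findings = [
    span(text, "4111-1111-1111-1111", "CREDIT_CARD"),
    span(text, "test@example.com", "EMAIL_ADDRESS"),
]
masked_text = apply_actions(
    text, findings, {"CREDIT_CARD": "MASK", "EMAIL_ADDRESS": "MASK"}
)
```

With both types set to `MASK`, `masked_text` contains the placeholders in place of the values. Setting `CREDIT_CARD` to `BLOCK` makes the same call raise instead of returning text.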

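Test Requests with Entity Type Configuration

Two cases follow: masking PII entities and blocking PII entities.

With the masking configuration, entities are replaced with placeholders.

Mask PII Request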
curl https://:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "My credit card is 4111-1111-1111-1111 and my email is test@example.com"}
    ],
    "guardrails": ["presidio-mask-guard"]
  }'

Example response with masked entities

{
  "id": "chatcmpl-123abc",
  "choices": [
    {
      "message": {
        "content": "I can see you provided a <CREDIT_CARD> and an <EMAIL_ADDRESS>. For security reasons, I recommend not sharing this sensitive information.",
        "role": "assistant"
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  // ... other response fields
}

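With the blocking configuration, requests that contain a configured entity type are blocked entirely and an exception is raised.

Block PII Request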
curl https://:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "My credit card is 4111-1111-1111-1111"}
    ],
    "guardrails": ["presidio-block-guard"]
  }'

When you run this request, the proxy raises a BlockedPiiEntityError exception.

{
  "error": {
    "message": "Blocked PII entity detected: CREDIT_CARD by Guardrail: presidio-block-guard."
  }
}

The exception message includes the entity type that was blocked (CREDIT_CARD in this case) and the name of the guardrail that caused the block.
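
A client may want to tell a PII block apart from other errors. The sketch below parses the error body shown above; it assumes the message keeps exactly that wording, which may differ between LiteLLM versions, and `parse_blocked_error` is a hypothetical helper.

```python
import json
import re

# Assumption: the message follows the format shown above. A failed match
# is treated as "not a PII block" rather than as an error.
BLOCKED_RE = re.compile(
    r"Blocked PII entity detected: (?P<entity>\w+) "
    r"by Guardrail: (?P<guardrail>[\w-]+)"
)


def parse_blocked_error(error_body):
    """Return (entity_type, guardrail_name), or None if not a PII block."""
    message = json.loads(error_body).get("error", {}).get("message", "")
    match = BLOCKED_RE.search(message)
    if match is None:
        return None
    return match.group("entity"), match.group("guardrail")
```

The returned pair can drive a user-facing message such as asking the user to remove the credit card number, without echoing the sensitive value itself.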

Advanced

Set Language per Request

The Presidio API accepts a language parameter. It can be set per request as follows, using either curl or the OpenAI Python SDK.

Language Parameter - curl
curl https://:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "is this credit card number 9283833 correct?"}
    ],
    "guardrails": ["presidio-pre-guard"],
    "guardrail_config": {"language": "es"}
  }'

Language Parameter - Python
import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    extra_body={ 
        "metadata": {
            "guardrails": ["presidio-pre-guard"],
            "guardrail_config": {"language": "es"}
        }
    }
)
print(response)

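Output Parsing

LLM responses can sometimes contain the masked tokens.

For Presidio 'replace' operations, LiteLLM can check the LLM response and replace the masked tokens with the values the user originally submitted.

Define your guardrail under the guardrails section.

Output Parsing Config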
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"
      output_parse_pii: True

Expected Flow

User Input: "hello world, my name is Jane Doe. My number is: 034453334"
LLM Input: "hello world, my name is [PERSON]. My number is: [PHONE_NUMBER]"
LLM Response: "Hey [PERSON], nice to meet you!"
User Response: "Hey Jane Doe, nice to meet you!"
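
The flow above amounts to a mask and unmask round trip. The following sketch illustrates the idea only and is not how LiteLLM implements it: `mask` and `unmask` are hypothetical helpers, and the entity values are supplied by hand rather than detected by Presidio.

```python
def mask(text, entities):
    """Replace each known value with its placeholder and remember the mapping.

    entities: {"Jane Doe": "[PERSON]", ...}. Hand-written here; in practice
    the values would come from Presidio's analysis.
    """
    mapping = {}
    for value, placeholder in entities.items():
        if value in text:
            text = text.replace(value, placeholder)
            mapping[placeholder] = value
    return text, mapping


def unmask(text, mapping):
    """Put the user's original values back into the LLM response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


user_input = "hello world, my name is Jane Doe. My number is: 034453334"
llm_input, mapping = mask(
    user_input, {"Jane Doe": "[PERSON]", "034453334": "[PHONE_NUMBER]"}
)
llm_response = "Hey [PERSON], nice to meet you!"  # stand-in for the model reply
user_response = unmask(llm_response, mapping)
```

This sketch keeps one value per placeholder, so two different names in the same message would collide on `[PERSON]`; a real implementation needs a way to keep such values apart.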

Ad Hoc Recognizers

Send ad hoc recognizers to the Presidio /analyze endpoint by passing a JSON file to the proxy.

Example ad hoc recognizer

Define the Ad Hoc Recognizer in Your LiteLLM config.yaml

Define your guardrail under the guardrails section.

Ad Hoc Recognizers Config
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"
      presidio_ad_hoc_recognizers: "./hooks/example_presidio_ad_hoc_recognizer.json"
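
The contents of the recognizer file are not shown on this page. As a rough sketch, the file could be generated as below, assuming it holds a JSON list of recognizer objects in Presidio's ad hoc pattern-recognizer format. The recognizer name, regex, score, and context word are illustrative guesses for the AHV number used in the request example later in this section, and `write_recognizers` is a hypothetical helper.

```python
import json

# Assumptions: field names follow Presidio's ad hoc pattern-recognizer
# schema; the name, regex, score, and context values are illustrative only.
ahv_recognizer = {
    "name": "AHV number recognizer",
    "supported_language": "en",
    "supported_entity": "AHV_NUMBER",
    "patterns": [
        {
            "name": "ahv number (illustrative)",
            "regex": r"\b756\.\d{4}\.\d{4}\.\d{2}\b",
            "score": 0.5,
        }
    ],
    "context": ["AHV"],
}


def write_recognizers(path, recognizers):
    """Write the recognizer list as JSON to the path named in config.yaml."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(recognizers, handle, indent=2)
```

For example, `write_recognizers("./hooks/example_presidio_ad_hoc_recognizer.json", [ahv_recognizer])` would produce a file at the path referenced in the config above, provided the `hooks` directory exists.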

Set the following environment variables.

Ad Hoc Recognizers Environment Variables
export PRESIDIO_ANALYZER_API_BASE="https://:5002"
export PRESIDIO_ANONYMIZER_API_BASE="https://:5001"

You can see this working when you run the proxy.

Run Proxy in Debug Mode

litellm --config /path/to/config.yaml --debug

Make a chat completions request, for example:

Custom PII Request
{
  "model": "azure-gpt-3.5",
  "messages": [{"role": "user", "content": "John Smith AHV number is 756.3026.0705.92. Zip code: 1334023"}]
}

Then search the logs for any line starting with Presidio PII Masking, for example:

PII Masking Log

Presidio PII Masking: Redacted pii message: <PERSON> AHV number is <AHV_NUMBER>. Zip code: <US_DRIVER_LICENSE>

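Logging Only

Apply PII masking only before logging to Langfuse and similar systems.

It is not applied to the actual LLM API request/response.

Note

This currently applies only to:

/chat/completions requests
'success' logging

Define mode: logging_only in Your LiteLLM config.yaml

Define your guardrail under the guardrails section.

Logging Only Config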
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "logging_only"

Set the following environment variables.

Logging Only Environment Variables
export PRESIDIO_ANALYZER_API_BASE="https://:5002"
export PRESIDIO_ANONYMIZER_API_BASE="https://:5001"

Start the proxy

Start Proxy

litellm --config /path/to/config.yaml

Test It!

Test Logging Only
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Hi, my name is Jane!"
    }
  ]
  }'

Expected Logged Response

Logged Response with Masked PII

Hi, my name is <PERSON>!
