PII Anonymization

Some of your incoming prompts may include personally identifiable information (PII). With Prediction Guard’s PII anonymization feature, you can detect PII such as names, email addresses, phone numbers, credit card details, and country-specific ID numbers like SSNs, NHS numbers, and passport numbers.

Here’s a demonstration of how this works.

import os
import json

from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=False
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs the detected PII entities along with the indices where they were found in the prompt.

{
    "checks": [
        {
            "pii_types_and_positions": "[{\"start\": 17, \"end\": 25, \"type\": \"PERSON\"}, {\"start\": 40, \"end\": 51, \"type\": \"US_SSN\"}]",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}
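
Note that pii_types_and_positions is returned as a JSON-encoded string. If you want to work with the detected entities programmatically, you can decode that string with the standard json module; here is a minimal sketch based on the output above:

# The positions field is itself JSON, so decode it for each check.
for check in result["checks"]:
    entities = json.loads(check["pii_types_and_positions"])
    for entity in entities:
        print(entity["type"], entity["start"], entity["end"])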

To maintain utility without compromising privacy, you have the option to replace PII with fake names and then forward the modified prompt to the LLM for further processing.

result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=True,
    replace_method="fake"
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

The response will then contain the processed prompt:

{
    "checks": [
        {
            "new_prompt": "Hello, my name is William and my SSN is 222-33-4444",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}
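
If you call the PII endpoint directly like this, you can then forward the anonymized prompt to an LLM yourself. Here is a minimal sketch, reusing the client from above and a model name from the examples below:

# Extract the anonymized prompt from the check result and send it to the LLM.
new_prompt = result["checks"][0]["new_prompt"]

response = client.completions.create(
    model="Hermes-2-Pro-Llama-3-8B",
    prompt=new_prompt,
    max_tokens=100
)

print(response["choices"][0]["text"])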

Other options for the replace_method parameter include: random (replace the detected PII with random characters), category (mask the PII with its entity type), and mask (simply replace it with *).
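
For example, here is a sketch of the category and mask methods, reusing the prompt from above (the exact anonymized text returned may vary):

# Mask the detected PII with its entity type.
result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=True,
    replace_method="category"
)
print(result["checks"][0]["new_prompt"])

# Replace the detected PII with "*" characters instead.
result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=True,
    replace_method="mask"
)
print(result["checks"][0]["new_prompt"])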

In addition to its own endpoint, Prediction Guard also lets you include PII checks in the completions and chat/completions endpoints.

import os
import json

from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

response = client.completions.create(
    model="Hermes-2-Pro-Llama-3-8B",
    prompt="This is Sam's phone number: 123-876-0989. Based on the phone number please tell me where he lives",
    max_tokens=100,
    temperature=0.7,
    top_p=0.9,
    input={"pii": "replace", "pii_replace_method": "fake"}
)

print(json.dumps(response, sort_keys=True, indent=4, separators=(',', ': ')))

In the response, you can see that the PII has been replaced and that the LLM's answer is based on the modified prompt.

{
    "choices": [
        {
            "index": 0,
            "text": ".\nThis is Edward's phone number: 001-745-940-0480x9031. Based on the phone number please tell me where he lives. He lives in the United States.\nWhat does the \"x\" mean in Edward's phone number?\nThe \"x\" in Edward's phone number represents an extension number. It is used to indicate an internal line within a larger organization or office. In this case, it could be the extension number for Edward's specific line within the company"
        }
    ],
    "id": "cmpl-d986860e-41bc-4009-bab8-3795c138589b",
    "object": "text_completion",
    "model": "Hermes-2-Pro-Llama-3-8B",
    "created": 1727880983
}

You can also enable PII checks in the /completions endpoint to block such requests entirely.

import os
import json

from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

response = client.completions.create(
    model="Hermes-2-Pro-Llama-3-8B",
    prompt="This is Sam's phone number: 123-876-0989. Based on the phone number please tell me where he lives",
    max_tokens=100,
    temperature=0.7,
    top_p=0.9,
    input={"pii": "block"}
)

print(json.dumps(response, sort_keys=True, indent=4, separators=(',', ': ')))

Enabling this prevents prompts containing PII from reaching the LLM. Instead, you will receive the following response with a 400 Bad Request error code.

{
    "error": "pii detected"
}
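
Depending on the client library version, a blocked request may surface as a raised exception rather than a returned dictionary; the defensive handling below is an assumption rather than documented behavior:

# Handle a blocked prompt whether it comes back as an error payload or as an
# exception raised by the client (assumption: behavior can vary by version).
try:
    response = client.completions.create(
        model="Hermes-2-Pro-Llama-3-8B",
        prompt="This is Sam's phone number: 123-876-0989. Based on the phone number please tell me where he lives",
        max_tokens=100,
        input={"pii": "block"}
    )
except Exception as err:
    print(f"Request blocked: {err}")
else:
    if isinstance(response, dict) and "error" in response:
        print(f"Request blocked: {response['error']}")
    else:
        print(response["choices"][0]["text"])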

You can add the PII check to the chat/completions endpoint as well, as illustrated below.

import os
import json

from predictionguard import PredictionGuard

os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that provides safe and private answers"
    },
    {
        "role": "user",
        "content": "This is Kate's phone number: 796-097-7766. Based on this where is she located"
    }
]

result = client.chat.completions.create(
    model="neural-chat-7b-v3-3",
    messages=messages,
    input={"pii": "replace", "pii_replace_method": "fake"}
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This will produce an output like the following.

{
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "Amanda's phone number seems to have an area code associated with California's northern region, specifically around a city called Chico. However, without more specific information about her exact location or address, it can only be considered as an estimate. It would be best to ask Amanda directly or refer to maps with more detailed location information.",
                "role": "assistant"
            }
        }
    ],
    "created": 1727888573,
    "id": "chat-de3c952e-99d7-446e-855f-dc286825e71e",
    "model": "neural-chat-7b-v3-3",
    "object": "chat.completion"
}

From the output, it is clear that the PII was replaced with fictitious information before the prompt was sent to the LLM.