Process LLM Input

PII Anonymization

Some of your incoming prompts may include personally identifiable information (PII). With Prediction Guard’s PII anonymization feature, you can detect PII such as names, email addresses, phone numbers, credit card details, and country-specific ID numbers like SSNs, NHS numbers, and passport numbers.

Here’s a demonstration of how this works. It checks a prompt for PII without modifying it (`replace=False`).

```python
import os
import json

from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

# Detect PII without modifying the prompt.
result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=False
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
```

This outputs the type of each detected PII entity, along with the start and end positions where it was found in the prompt.

```json
{
    "checks": [
        {
            "pii_types_and_positions": "[{\"start\": 17, \"end\": 25, \"type\": \"PERSON\"}, {\"start\": 40, \"end\": 51, \"type\": \"US_SSN\"}]",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}
```
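In this response, `pii_types_and_positions` is a JSON-encoded string, not a list. You therefore need to decode it a second time before you can work with the individual entities. The following is a minimal sketch. It uses the example response above as a literal so that it runs offline, and the helper name `pii_spans` is hypothetical (it is not part of the SDK).

```python
import json
from typing import Any, Dict, List

# Example response copied from the output above.
result = {
    "checks": [
        {
            "pii_types_and_positions": "[{\"start\": 17, \"end\": 25, \"type\": \"PERSON\"}, {\"start\": 40, \"end\": 51, \"type\": \"US_SSN\"}]",
            "index": 0,
            "status": "success",
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check",
}


def pii_spans(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the detected PII entities from every check in a pii_check response."""
    spans: List[Dict[str, Any]] = []
    for check in result["checks"]:
        # The positions arrive as a JSON string, so decode them a second time.
        spans.extend(json.loads(check["pii_types_and_positions"]))
    return spans


for span in pii_spans(result):
    print(span["type"], span["start"], span["end"])
```

With the example response, this prints `PERSON 17 25` followed by `US_SSN 40 51`.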

To maintain utility without compromising privacy, you can instead replace the detected PII with fake values (for example, a fictitious name and SSN) and then forward the modified prompt to the LLM for further processing.

```python
result = client.pii.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=True,
    replace_method="fake"
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
```

The response then contains the processed prompt in the `new_prompt` field:

```json
{
    "checks": [
        {
            "new_prompt": "Hello, my name is William and my SSN is 222-33-4444",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}
```
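The `new_prompt` value is the text you would forward to the LLM. The sketch below wraps that step in a small helper. `anonymize_prompt` is a hypothetical name. The check function is passed in as a parameter (in practice, `client.pii.check`) so the sketch can run offline with a stand-in that returns the example response above.

```python
from typing import Any, Callable, Dict


def anonymize_prompt(pii_check: Callable[..., Dict[str, Any]], prompt: str) -> str:
    """Replace PII in `prompt` with fake values and return the rewritten prompt.

    `pii_check` is expected to behave like `client.pii.check`.
    """
    result = pii_check(prompt=prompt, replace=True, replace_method="fake")
    check = result["checks"][0]
    if check.get("status") != "success":
        raise RuntimeError(f"PII check failed: {check.get('status')}")
    return check["new_prompt"]


# Offline stand-in that ignores its arguments and returns the example response above.
def fake_check(**kwargs: Any) -> Dict[str, Any]:
    return {
        "checks": [
            {
                "new_prompt": "Hello, my name is William and my SSN is 222-33-4444",
                "index": 0,
                "status": "success",
            }
        ],
        "created": 1701721456,
        "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
        "object": "pii_check",
    }


safe_prompt = anonymize_prompt(fake_check, "Hello, my name is John Doe and my SSN is 111-22-3333")
print(safe_prompt)
```

With a real client, you would call `anonymize_prompt(client.pii.check, prompt)` and pass the returned string to your completion request.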

The other options for the `replace_method` parameter are:

- `random`: replaces the detected PII with random characters.
- `category`: masks the PII with its entity type.
- `mask`: replaces the PII with `*`.
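To make the `mask` and `category` options concrete, here is a purely local sketch of what those two strategies do to a prompt once the PII spans are known. It does not call Prediction Guard. Several details are assumptions: the helper `redact`, the one-`*`-per-character masking, the `<TYPE>` placeholder format, and the end-exclusive offsets. Compare them against real responses from the service before relying on any of them.

```python
from typing import Any, Dict, List


def redact(prompt: str, spans: List[Dict[str, Any]], method: str = "mask") -> str:
    """Locally mimic the "mask" and "category" replacement strategies.

    Assumes each span has end-exclusive "start"/"end" offsets and a "type".
    The exact output format of the real service may differ.
    """
    if method not in ("mask", "category"):
        raise ValueError(f"this sketch only handles 'mask' and 'category', got {method!r}")
    pieces: List[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        pieces.append(prompt[cursor:span["start"]])  # text before this entity
        if method == "mask":
            pieces.append("*" * (span["end"] - span["start"]))  # hide every character
        else:
            pieces.append(f"<{span['type']}>")  # show only the entity type
        cursor = span["end"]
    pieces.append(prompt[cursor:])  # text after the last entity
    return "".join(pieces)


prompt = "my SSN is 111-22-3333"
spans = [{"start": 10, "end": 21, "type": "US_SSN"}]
print(redact(prompt, spans, "mask"))
print(redact(prompt, spans, "category"))  # my SSN is <US_SSN>
```

The `mask` branch keeps the original length, so the layout of the text is preserved. The `category` branch trades that for telling the LLM what kind of value was there.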

In addition to its dedicated PII endpoint, Prediction Guard lets you include PII checks in the `/completions` and `/chat/completions` endpoints.

```python
import os
import json
import predictionguard as pg

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_TOKEN"] = "<PG token>"

# Replace any PII in the prompt with fake values before it reaches the model.
response = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt="This is Sam's phone number: 123-876-0989. Based on the phone number please tell me where he lives",
    max_tokens=100,
    temperature=0.7,
    top_p=0.9,
    input={"pii": "replace", "pii_replace_method": "fake"}
)

print(json.dumps(response, sort_keys=True, indent=4, separators=(',', ': ')))
```

In the response, you can see that the PII has been replaced. The model answered the modified prompt, which is why it refers to "James" rather than Sam.

```json
{
    "choices": [
        {
            "index": 0,
            "model": "Nous-Hermes-Llama2-13B",
            "status": "success",
            "text": "?\nI don't have any information about his location. Can you provide more details about James or the phone number?"
        }
    ],
    "created": 1715088867,
    "id": "cmpl-eBOPBS5k2ziC7J45NBnOdrvbmNZg7",
    "object": "text_completion"
}
```

You can also configure the PII check in the `/completions` endpoint to block requests that contain PII.

```python
import os
import json
import predictionguard as pg

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_TOKEN"] = "<PG token>"

# Block the request entirely if the prompt contains PII.
response = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt="What is Sam",
    max_tokens=100,
    temperature=0.7,
    top_p=0.9,
    input={"pii": "block"}
)

print(json.dumps(response, sort_keys=True, indent=4, separators=(',', ': ')))
```

With blocking enabled, prompts that contain PII are prevented from reaching the LLM, and you will see a response like this:

```json
{
    "choices": [
        {
            "index": 0,
            "model": "Nous-Hermes-Llama2-13B",
            "status": "error: personal identifiable information detected",
            "text": ""
        }
    ],
    "created": 1715089688,
    "id": "cmpl-UGgwaUVYHm7jXNmFXrPGuh7OkH2EK",
    "object": "text_completion"
}
```
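A blocked request comes back with a `status` other than `success` and with empty `text`. For that reason, it is worth checking the status before using the completion. The following is a minimal sketch that uses the example response above as a literal. `completion_text` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict

# Example response copied from the output above.
response = {
    "choices": [
        {
            "index": 0,
            "model": "Nous-Hermes-Llama2-13B",
            "status": "error: personal identifiable information detected",
            "text": "",
        }
    ],
    "created": 1715089688,
    "id": "cmpl-UGgwaUVYHm7jXNmFXrPGuh7OkH2EK",
    "object": "text_completion",
}


def completion_text(response: Dict[str, Any]) -> str:
    """Return the first choice's text, or raise if the request did not succeed."""
    choice = response["choices"][0]
    status = choice.get("status", "")
    if status != "success":
        # For example, a prompt blocked by the PII check.
        raise RuntimeError(f"completion not returned: {status}")
    return choice["text"]


try:
    completion_text(response)
except RuntimeError as err:
    print(err)  # completion not returned: error: personal identifiable information detected
```

Raising an exception, rather than returning the empty string, ensures that a blocked prompt cannot silently pass through as an empty answer.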

You can add the PII check to the `/chat/completions` endpoint as well, as illustrated below.

```python
import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"
client = PredictionGuard()

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that provides safe and private answers"
    },
    {
        "role": "user",
        "content": "This is Kate's phone number: 796-097-7766. Based on this where is she located"
    }
]

# Replace any PII in the messages with fake values before they reach the model.
result = client.chat.completions.create(
    model="Neural-Chat-7B",
    messages=messages,
    input={"pii": "replace", "pii_replace_method": "fake"}
)
print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
```

This will produce an output like the following.

```json
{
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "Without more information about Kyle or the area code, it's difficult to determine an exact location. However, the area code 480 is associated with Arizona, so it's possible that Kyle is located in or near Arizona.",
                "output": null,
                "role": "assistant"
            },
            "status": "success"
        }
    ],
    "created": 1716234761,
    "id": "chat-F34QJfOM771wYxT1YYWkkrOFyTvAg",
    "model": "Neural-Chat-7B",
    "object": "chat_completion"
}
```

The output shows that the PII was replaced with fictitious information before the prompt was sent to the LLM. The model answers about "Kyle" and area code 480 rather than Kate and her real number.
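You may also want a quick sanity check in your own code that the original values never appeared in the model's answer. One simple but coarse approach is to extract the assistant message and search it for the original PII strings verbatim. The helpers `assistant_reply` and `verbatim_leaks` below are hypothetical. A substring search only catches exact copies, not paraphrases or partial numbers.

```python
from typing import Any, Dict, Iterable, List

# Example response copied from the chat output above (JSON null becomes None).
result = {
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "Without more information about Kyle or the area code, it's difficult to determine an exact location. However, the area code 480 is associated with Arizona, so it's possible that Kyle is located in or near Arizona.",
                "output": None,
                "role": "assistant",
            },
            "status": "success",
        }
    ],
    "created": 1716234761,
    "id": "chat-F34QJfOM771wYxT1YYWkkrOFyTvAg",
    "model": "Neural-Chat-7B",
    "object": "chat_completion",
}


def assistant_reply(result: Dict[str, Any]) -> str:
    """Return the assistant's message from the first choice of a chat completion."""
    choice = result["choices"][0]
    if choice.get("status") != "success":
        raise RuntimeError(f"chat completion failed: {choice.get('status')}")
    return choice["message"]["content"]


def verbatim_leaks(text: str, original_values: Iterable[str]) -> List[str]:
    """Return the original PII values that appear verbatim in `text`."""
    return [value for value in original_values if value in text]


reply = assistant_reply(result)
print(verbatim_leaks(reply, ["Kate", "796-097-7766"]))  # []
```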