Process LLM Input

PII anonymization

Some of your incoming prompts may include personally identifiable information (PII). With Prediction Guard’s PII anonymization feature, you can detect PII such as names, email addresses, phone numbers, credit card details, and country-specific ID numbers like SSNs, NHS numbers, and passport numbers. Here’s a demonstration of how this works:

import os
import json

import predictionguard as pg

os.environ['PREDICTIONGUARD_TOKEN'] = "<your access token>"

result = pg.PII.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=False
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs the detected PII entities along with the indices where they were found:

{
    "checks": [
        {
            "pii_types_and_positions": "[{\"start\": 17, \"end\": 25, \"type\": \"PERSON\"}, {\"start\": 40, \"end\": 51, \"type\": \"US_SSN\"}]",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}
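Note that the pii_types_and_positions field is returned as a JSON-encoded string rather than a nested object, so it takes a second round of JSON decoding to work with. A small sketch of decoding the values from the example output above:

```python
import json

# The "pii_types_and_positions" field in each check is itself a
# JSON-encoded string, so json.loads is needed a second time to turn
# it into a usable list of entity dicts.
pii_field = ('[{"start": 17, "end": 25, "type": "PERSON"}, '
             '{"start": 40, "end": 51, "type": "US_SSN"}]')

entities = json.loads(pii_field)
for entity in entities:
    print(entity["type"], entity["start"], entity["end"])
```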

To maintain utility without compromising privacy, you have the option to replace detected PII with fake values and then forward the modified prompt to the LLM for further processing:

result = pg.PII.check(
    prompt="Hello, my name is John Doe and my SSN is 111-22-3333",
    replace=True,
    replace_method="fake"
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

The response then includes the processed prompt:

{
    "checks": [
        {
            "new_prompt": "Hello, my name is William and my SSN is 222-33-4444",
            "index": 0,
            "status": "success"
        }
    ],
    "created": 1701721456,
    "id": "pii-O0CdxbefFwSRo7uypla7hdUka3pPf",
    "object": "pii_check"
}

Other options for the replace_method parameter include: random (replace the detected PII with random characters), category (mask the PII with its entity type), and mask (simply replace it with *).
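To make the mask behavior concrete, here is a hypothetical local sketch (not the API itself) that overwrites detected spans with * characters, using start/end positions in the style of the example output. The span indexing here is assumed to be half-open [start, end), and the mask_spans helper and the span values are illustrative, not part of the Prediction Guard SDK:

```python
# Illustration of mask-style replacement: overwrite each detected
# span of the prompt with '*' characters. Spans are assumed to be
# half-open [start, end); the API's exact convention may differ.
def mask_spans(text, spans):
    chars = list(text)
    for span in spans:
        for i in range(span["start"], span["end"]):
            chars[i] = "*"
    return "".join(chars)

prompt = "Hello, my name is John Doe and my SSN is 111-22-3333"
# Hypothetical spans covering "John Doe" and the SSN in this prompt.
spans = [{"start": 18, "end": 26}, {"start": 41, "end": 52}]

print(mask_spans(prompt, spans))
# -> Hello, my name is ******** and my SSN is ***********
```

Masking keeps the prompt's shape (the masked string has the same length as the original), which can be useful when downstream code depends on character offsets.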