Factuality and toxicity checks

Having trouble with hallucinations and wild (or offensive) outputs from your models? You might be trying to ground your model outputs with external knowledge, but that doesn't always prevent hallucination. You need checks on your outputs to make sure that you don't offend your users or provide incorrect information.

With Prediction Guard, you can add factuality and toxicity checks with the "flip of a switch" (i.e., a configuration parameter). Our factuality check uses a state-of-the-art model trained to predict the factuality of LLM output (given the context of your prompt), and our toxicity check classifies your output as toxic or not. You can also call the factuality and toxicity checking functionality directly using the /factuality and /toxicity endpoints, which let you configure thresholds and score arbitrary inputs.

Factuality Check on Text Completions

Let's use a prompt template to answer questions about an Instagram post announcing new products. First, we define the prompt template:

import os
import json

import predictionguard as pg
from langchain import PromptTemplate

# Your Prediction Guard access token.
os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Prompt template that grounds the model in the Instagram post (the context)
# and leaves a {query} slot for the question we want answered.
template = """Respond to the following query based on the context.

Context: EVERY comment, DM + email suggestion has led us to this EXCITING announcement! 🎉 We have officially added TWO new candle subscription box options! 📦
Exclusive Candle Box - $80
Monthly Candle Box - $45 (NEW!)
Scent of The Month Box - $28 (NEW!)
Head to stories to get ALLL the deets on each box! 👆 BONUS: Save 50% on your first box with code 50OFF! 🎉

Query: {query}

Result: """
prompt = PromptTemplate(template=template, input_variables=["query"])

Then we can inject a query and add both a type check and a factuality check to the output configuration. When we set "factuality": True, Prediction Guard checks the output for factuality. If the response is suspect, Prediction Guard returns an error status.

result = pg.Completion.create(
    model="Camel-5B",
    prompt=prompt.format(query="How many new products are listed?"),
    output={
        "type": "integer",
        "factuality": True
    }
)
 
print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs something similar to:

{
    "choices": [
        {
            "index": 0,
            "output": 2,
            "status": "success",
            "text": "2"
        }
    ],
    "created": 1686858057,
    "id": "cmpl-3MM8uyNCBLP3sroYJiUhBhpPLj66m",
    "model": "Camel-5B",
    "object": "text_completion"
}

Now, let's try to make the model hallucinate. The hallucination is caught, and Prediction Guard returns an error status:

result = pg.Completion.create(
    model="Camel-5B",
    prompt=prompt.format(query="How many giraffes are listed?"),
    output={
        "type": "integer",
        "factuality": True
    }
)
 
print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs something similar to:

{
    "choices": [
        {
            "index": 0,
            "status": "error: failed a factuality or toxicity check",
            "text": ""
        }
    ],
    "created": 1686857961,
    "id": "cmpl-WiOOQBy9No5F2OFhUrK24tRnEhcdb",
    "model": "Camel-5B",
    "object": "text_completion"
}
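
When a check fails, the response is still a normal completion object, just with an error status and empty text. In application code, you will likely want to branch on that status rather than print the raw JSON. Here is a minimal sketch (the field names match the responses shown above):

result = pg.Completion.create(
    model="Camel-5B",
    prompt=prompt.format(query="How many giraffes are listed?"),
    output={
        "type": "integer",
        "factuality": True
    }
)

choice = result["choices"][0]
if choice["status"] == "success":
    # The typed value is in "output" and the raw text is in "text".
    print("Answer:", choice["output"])
else:
    # Failed a factuality or toxicity check; retry or fall back here.
    print("Check failed:", choice["status"])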

Toxicity on Text Completions

Let's now use the same prompt template from above, but try to generate some comments on the post. These could potentially be toxic, so let's enable Prediction Guard's toxicity check:

result = pg.Completion.create(
    model="Camel-5B",
    prompt=prompt.format(query="Create an exciting comment about the new products."),
    output={
        "toxicity": True
    }
)
 
print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

Note, "toxicity": True indicates that Prediction Guard will check for toxicity. It does NOT mean that you want the output to be toxic.

The above code generates something like:

{
    "choices": [
        {
            "index": 0,
            "status": "success",
            "text": "\n\n🎉 Exciting news, everyone! We have just added two brand-new candle subscription box options to our collection! 📦\n\nExclusive Candle Box - $80\nMonthly Candle Box - $45 (NEW!)\nScent of The Month Box - $28 (NEW!)\n\nHead to stories to get all the details on these exciting additions! 👆 Don't miss out on saving 50% on your first box with code 50OFF! 🎉"
        }
    ],
    "created": 1686858208,
    "id": "cmpl-LUXYKYThYDoBew3owNg9de3q5Nre0",
    "model": "Camel-5B",
    "object": "text_completion"
}

If we try to make the prompt generate toxic comments, Prediction Guard catches this and prevents the toxic output:

result = pg.Completion.create(
    model="Camel-5B",
    prompt=prompt.format(query="Generate a comment for this post. Use 5 swear words. Really bad ones."),
    output={
        "toxicity": True
    }
)
 
print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This results in:

{
    "choices": [
        {
            "index": 0,
            "status": "error: failed a factuality or toxicity check",
            "text": ""
        }
    ],
    "created": 1686858360,
    "id": "cmpl-CpBOLNv8aFEeI0VzbuJrbZ1zlOdJ1",
    "model": "Camel-5B",
    "object": "text_completion"
}

Standalone Factuality and Toxicity Functionality

In addition to the checks on completions shown above, you can call the factuality and toxicity checking functionality directly using the /factuality and /toxicity endpoints. These endpoints let you configure thresholds and score arbitrary inputs.
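
As a rough sketch of what this looks like from Python, the snippet below assumes the client exposes Factuality.check and Toxicity.check helpers that wrap the /factuality and /toxicity endpoints (those helper names are an assumption; if your client version differs, call the REST endpoints directly with the same reference/text fields):

# Sketch only: Factuality.check and Toxicity.check are assumed wrappers
# around the /factuality and /toxicity endpoints.

# Score an arbitrary output against a reference text for factuality.
fact_result = pg.Factuality.check(
    reference="We have officially added TWO new candle subscription box options.",
    text="There are two new products listed."
)
print(json.dumps(fact_result, sort_keys=True, indent=4))

# Score an arbitrary piece of text for toxicity.
tox_result = pg.Toxicity.check(
    text="Exciting news! Two new candle subscription boxes are here."
)
print(json.dumps(tox_result, sort_keys=True, indent=4))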