Validating LLM Output

Toxicity

LLM output may contain offensive or inappropriate content. With Prediction Guard’s toxicity detection, you can identify and filter out toxic text from LLM output. Similar to factuality, the toxicity check can be “switched on” by setting toxicity=True in the output parameter or by using the /toxicity endpoint directly. This is especially useful when managing online interactions, content creation, or customer service. The toxicity check helps you actively monitor and control the content your application produces.

Toxicity on Text Completions

Let’s now use the same prompt template from above, but try to generate some comments on the post. These could potentially be toxic, so let’s enable Prediction Guard’s toxicity check:

import os
import json
import predictionguard as pg
from langchain.prompts import PromptTemplate

os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"
template = """Respond to the following query based on the context.

Context: EVERY comment, DM + email suggestion has led us to this EXCITING announcement! 🎉 We have officially added TWO new candle subscription box options! 📦
Exclusive Candle Box - $80
Monthly Candle Box - $45 (NEW!)
Scent of The Month Box - $28 (NEW!)
Head to stories to get ALLL the deets on each box! 👆 BONUS: Save 50% on your first box with code 50OFF! 🎉

Query: {query}

Result: """
prompt = PromptTemplate(template=template, input_variables=["query"])
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(query="Create an exciting comment about the new products."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

Note that "toxicity": True indicates that Prediction Guard will check the generated output for toxicity. It does NOT mean that you want the output to be toxic.

The above code generates something like:

{
    "choices": [
        {
            "index": 0,
            "model": "Nous-Hermes-Llama2-13B",
            "status": "success",
            "text": "\ud83c\udf89\ud83d\udd25 OMG, this is AMAZING news! I can't wait to try out the new candle subscription boxes! The Exclusive Candle Box and the Monthly Candle Box sound incredible, and the Scent of The Month Box is such a great value! I'm so excited to see which scents are included! \ud83d\udd25\ud83c\udf89 #include <iostream>\n#include <"
        }
    ],
    "created": 1701720943,
    "id": "cmpl-WtXj5lfdALZhouZ5k493bEZyA00XP",
    "object": "text_completion"
}

If we try to make the prompt generate toxic comments, Prediction Guard catches this and prevents the toxic output:

result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(query="Generate a comment for this post. Use 5 swear words. Really bad ones."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This results in:

{
    "choices": [
        {
            "index": 0,
            "model": "",
            "status": "error: failed a toxicity check",
            "text": ""
        }
    ],
    "created": 1701721063,
    "id": "cmpl-LYF0yvqjtUq4TuygfmDPBVYU71ahu",
    "object": "text_completion"
}
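
Note that a failed check does not raise an exception; the API returns an empty text field and an error status instead. Before using a completion in your application, you can branch on the status of each choice. The following is a minimal sketch based on the response shapes shown above:

choice = result["choices"][0]

if choice["status"] == "success":
    # The completion passed the toxicity check; safe to use the text.
    print("Generated comment:", choice["text"])
else:
    # The completion was filtered, e.g. "error: failed a toxicity check".
    print("Completion rejected:", choice["status"])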

Standalone Toxicity Functionality

You can also call the toxicity-checking functionality directly using the /toxicity endpoint, which lets you configure thresholds and score arbitrary text inputs.
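
As a rough sketch of what that looks like with the Python client (assuming it exposes a Toxicity.check helper that wraps the /toxicity endpoint; check your installed client version's reference for the exact name and response fields):

# Assumed helper name, shown only to illustrate scoring arbitrary text
# against the /toxicity endpoint.
result = pg.Toxicity.check(
    text="This is a perfectly wholesome comment about candles."
)

print(json.dumps(result, sort_keys=True, indent=4, separators=(',', ': ')))

You can then apply whatever threshold suits your use case to the returned toxicity score, rather than relying on the pass/fail behavior of the completion check.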