Validating LLM Output

Toxicity

LLM output can contain offensive or inappropriate content. With Prediction Guard’s advanced toxicity detection, you can identify and filter out toxic text from LLM output. Similar to factuality, the toxicity check can be “switched on” by setting "toxicity": True in the output field or by using the /toxicity endpoint. This is especially useful when managing online interactions, content creation, or customer service, where the toxicity check helps you actively monitor and control the content.

Toxicity On Text Completions

Let’s now use the same prompt template from above, but try to generate some comments on the post. These could potentially be toxic, so let’s enable Prediction Guard’s toxicity check.

import os
import json
from predictionguard import PredictionGuard
from langchain.prompts import PromptTemplate

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

template = """Respond to the following query based on the context.

Context: EVERY comment, DM + email suggestion has led us to this EXCITING announcement! 🎉 We have officially added TWO new candle subscription box options! 📦
Exclusive Candle Box - $80
Monthly Candle Box - $45 (NEW!)
Scent of The Month Box - $28 (NEW!)
Head to stories to get ALLL the deets on each box! 👆 BONUS: Save 50% on your first box with code 50OFF! 🎉

Query: {query}

Result: """
prompt = PromptTemplate(template=template, input_variables=["query"])

result = client.completions.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(query="Create an exciting comment about the new products."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

Note, "toxicity": True indicates that Prediction Guard will check for toxicity. It does NOT mean that you want the output to be toxic.

The above code generates something like this:

{
    "choices": [
        {
            "index": 0,
            "model": "Nous-Hermes-Llama2-13B",
            "status": "success",
            "text": "\ud83c\udf89\ud83d\udd25 OMG, this is AMAZING news! I can't wait to try out the new candle subscription boxes! The Exclusive Candle Box and the Monthly Candle Box sound incredible, and the Scent of The Month Box is such a great value! I'm so excited to see which scents are included! \ud83d\udd25\ud83c\udf89 #include <iostream>\n#include <"
        }
    ],
    "created": 1701720943,
    "id": "cmpl-WtXj5lfdALZhouZ5k493bEZyA00XP",
    "object": "text_completion"
}

If we try to make the prompt generate toxic comments, Prediction Guard catches this and prevents the toxic output.

result = client.completions.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(query="Generate a comment for this post. Use 5 swear words. Really bad ones."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs something similar to:

{
    "choices": [
        {
            "index": 0,
            "model": "",
            "status": "error: failed a toxicity check",
            "text": ""
        }
    ],
    "created": 1701721063,
    "id": "cmpl-LYF0yvqjtUq4TuygfmDPBVYU71ahu",
    "object": "text_completion"
}
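
Because a failed check is surfaced in the status field of each choice rather than as a raised exception, you can branch on that value in application code. The following is a minimal sketch, assuming result is the response dictionary returned by client.completions.create(...) above; the fallback message is just a placeholder.

# Minimal sketch: inspect the status of the first choice before using the text.
choice = result["choices"][0]

if choice["status"] == "success":
    # The output passed the toxicity check, so it is safe to display.
    print("Generated comment:", choice["text"])
else:
    # The output was blocked (e.g. "error: failed a toxicity check"),
    # so fall back to a neutral message instead of the generated text.
    print("Comment not published:", choice["status"])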

Standalone Toxicity Functionality

You can also call the toxicity-checking functionality directly via the /toxicity endpoint, which lets you score arbitrary inputs and apply your own thresholds.
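
For example, with the Python client you can score a piece of text directly. The sketch below assumes the client exposes the /toxicity endpoint as client.toxicity.check(text=...); check your client version for the exact method name, and note that the example text is arbitrary.

import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

# Score an arbitrary piece of text for toxicity via the /toxicity endpoint.
result = client.toxicity.check(
    text="This is a perfectly fine statement."
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

The response includes a toxicity score for the input text, which you can compare against whatever threshold fits your application.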