Toxicity
LLM output may contain offensive or inappropriate content. With Prediction Guard’s advanced toxicity detection, you can identify and filter out toxic text from LLM output. Similar to factuality, the toxicity check can be “switched on” by setting toxicity=True or by using the /toxicity endpoint. This is especially useful when managing online interactions, content creation, or customer service, as it lets you actively monitor and control the content.
Toxicity On Text Completions
Let’s now use the same prompt template from above, but try to generate some comments on the post. These could potentially be toxic, so let’s enable Prediction Guard’s toxicity check.
Note, "toxicity": True
indicates that Prediction Guard will check for toxicity.
It does NOT mean that you want the output to be toxic.
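A minimal sketch of what the call might look like, assuming the predictionguard Python client’s Completion.create interface; the model name, prompt template, post text, and response handling are illustrative assumptions rather than part of the official example:

```python
import os

import predictionguard as pg

# The client reads your access token from the environment.
os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Illustrative prompt template for generating a comment on a post.
template = """Write a short comment responding to the following post.

Post: {post}

Comment: """

result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",  # placeholder model name
    prompt=template.format(post="Just finished my first marathon this weekend!"),
    output={
        "toxicity": True  # switch on Prediction Guard's toxicity check
    },
)

# Response shape assumed: a dict with a "choices" list.
print(result["choices"][0]["text"])
```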
The above code generates something like the following.
If we try to make the prompt generate toxic comments, Prediction Guard catches this and prevents the toxic output.
This outputs the following ValueError:
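Rather than letting the error propagate, application code can catch it and handle the rejected completion; here is a minimal sketch, reusing the assumed call from above:

```python
try:
    result = pg.Completion.create(
        model="Nous-Hermes-Llama2-13B",  # placeholder model name
        prompt=template.format(post="<a post designed to provoke toxic replies>"),
        output={"toxicity": True},
    )
    print(result["choices"][0]["text"])
except ValueError as err:
    # The completion was rejected by the toxicity check; no output is returned.
    print("Completion failed the toxicity check:", err)
```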
Standalone Toxicity Functionality
You can also call the toxicity checking functionality directly using the /toxicity endpoint, which enables you to configure thresholds and score arbitrary inputs.
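For illustration, a sketch of a standalone check via the Python client; the Toxicity.check method name and the response field names used below are assumptions, so consult the API reference for the exact schema:

```python
import os

import predictionguard as pg

os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Score an arbitrary piece of text for toxicity.
result = pg.Toxicity.check(
    text="That movie was absolutely terrible and so was your review!"
)

# Apply your own threshold to the returned score
# (the "checks"/"score" response fields are assumed here).
score = result["checks"][0]["score"]
threshold = 0.5

if score > threshold:
    print(f"Flagged as toxic (score {score:.2f})")
else:
    print(f"Passed the toxicity check (score {score:.2f})")
```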