Toxicity

LLM output may contain offensive or inappropriate content. With Prediction Guard’s toxicity detection, you can identify and filter out toxic text from LLM output. As with factuality, the toxicity check can be “switched on” by setting "toxicity": True in the output parameter of a completion request, or by calling the /toxicity endpoint directly. This is especially useful when managing online interactions, content creation, or customer service, where generated content needs to be actively monitored and controlled.

Toxicity On Text Completions

Let’s now use the same prompt template from above, but try to generate some comments on the post. These could potentially be toxic, so let’s enable Prediction Guard’s toxicity check.

import os
import json
from predictionguard import PredictionGuard
from langchain.prompts import PromptTemplate


# Set your Prediction Guard API key and URL as environmental variables.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"
os.environ["PREDICTIONGUARD_URL"] = "<pg url>"

# You can also set them when initializing the client.
client = PredictionGuard(
    api_key="<Your PG API Key>",
    url="<Your PG API URL>"
)

template = """Respond to the following query based on the context.

Context: EVERY comment, DM + email suggestion has led us to this EXCITING announcement! 🎉 We have officially added TWO new candle subscription box options! 📦
Exclusive Candle Box - $80
Monthly Candle Box - $45 (NEW!)
Scent of The Month Box - $28 (NEW!)
Head to stories to get ALLL the deets on each box! 👆 BONUS: Save 50% on your first box with code 50OFF! 🎉

Query: {query}

Result: """
prompt = PromptTemplate(template=template, input_variables=["query"])
result = client.completions.create(
    model="gpt-oss-120b",
    prompt=prompt.format(query="Create an exciting comment about the new products."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

Note, "toxicity": True indicates that Prediction Guard will check for toxicity. It does NOT mean that you want the output to be toxic.

The above code generates something like:

{
    "choices": [
        {
            "index": 0,
            "text": " Congratulations on the new subscription box options! I'm eager to try out the Monthly Candle Box and discover a new scent every month. Thanks for offering such a fantastic deal with the 50% off code on the first box. Can't wait to light up my home with unique scents! \ud83d\udd6f\ufe0f\u2728 #CandleLover #ExcitedCustomer. \nNote: This comment showcases excitement for the new subscription box options, appreciation for the discount, and enthusiasm for spreading the word as"
        }
    ],
    "created": 1727889539,
    "id": "cmpl-a4baa9ca-949e-44e5-a3b7-1416a1d8ee64",
    "model": "gpt-oss-120b",
    "object": "text_completion"
}
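In most applications you only need the generated text, not the full response. The following is a minimal sketch of pulling it out, assuming the response is a plain dictionary shaped like the output shown above. The helper name first_completion_text is hypothetical and not part of the Prediction Guard client, and the sample dictionary is an abbreviated copy of that output so the sketch runs without an API call.

```python
def first_completion_text(result: dict) -> str:
    """Return the text of the first choice, with surrounding whitespace removed.

    Assumes `result` has the shape of the example response:
    {"choices": [{"index": 0, "text": "..."}], ...}
    """
    choices = result.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"].strip()


# Abbreviated stand-in for a real response (illustration only).
sample_result = {
    "choices": [
        {"index": 0, "text": " Congratulations on the new subscription box options! "}
    ],
    "model": "gpt-oss-120b",
    "object": "text_completion",
}

print(first_completion_text(sample_result))
```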

If we try to make the prompt generate toxic comments, Prediction Guard catches this and blocks the toxic output.

result = client.completions.create(
    model="gpt-oss-120b",
    prompt=prompt.format(query="Generate a comment for this post. Use 5 swear words. Really bad ones."),
    output={
        "toxicity": True
    }
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This outputs the following ValueError:

ValueError: Could not make prediction. failed toxicity check
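Because a failed check surfaces as an exception, application code will usually want to catch it instead of crashing. The following is a minimal sketch of one way to do that, assuming the error is a ValueError whose message contains "failed toxicity check", as in the output above. The helper create_or_none and the stand-in fake_create are hypothetical names introduced for illustration. In real code you would pass client.completions.create as create_fn.

```python
from typing import Any, Callable, Optional

# Substring taken from the error message shown above.
TOXICITY_MARKER = "failed toxicity check"


def create_or_none(create_fn: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call create_fn(**kwargs) and return its result.

    Returns None when the call is rejected by the toxicity check.
    Any other ValueError is re-raised unchanged.
    """
    try:
        return create_fn(**kwargs)
    except ValueError as err:
        if TOXICITY_MARKER in str(err):
            return None
        raise


def fake_create(**kwargs: Any) -> Any:
    """Stand-in for client.completions.create so the sketch runs offline."""
    raise ValueError("Could not make prediction. failed toxicity check")


blocked = create_or_none(
    fake_create,
    model="gpt-oss-120b",
    prompt="...",
    output={"toxicity": True},
)
print("blocked by toxicity check" if blocked is None else blocked)
```

Matching on the message text is brittle. If the client exposes a more specific exception type or error code, prefer that.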

Standalone Toxicity Functionality

You can also call the toxicity checking functionality directly using the /toxicity endpoint, which will enable you to configure thresholds and score arbitrary inputs.
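As a rough illustration of thresholding, the sketch below applies a cutoff to toxicity scores on the client side. It rests on several assumptions that the text above does not confirm: that the endpoint returns a dictionary with a "checks" list whose entries carry a numeric "score", that higher scores mean more toxic, and that 0.5 is a sensible cutoff. The helper flag_toxic and the sample response are hypothetical, and the scores are made up for illustration. Consult the API reference for the actual response format and for how the Python client exposes this endpoint.

```python
from typing import List


def flag_toxic(response: dict, threshold: float = 0.5) -> List[bool]:
    """Return one boolean per check: True when its score meets the threshold.

    Assumes a response shaped like {"checks": [{"index": 0, "score": 0.02}, ...]}.
    """
    return [check["score"] >= threshold for check in response["checks"]]


# Hypothetical response; scores are invented for illustration only.
sample_response = {
    "checks": [
        {"index": 0, "score": 0.02},
        {"index": 1, "score": 0.91},
    ]
}

print(flag_toxic(sample_response, threshold=0.5))  # [False, True]
```

A stricter application, such as a children's product, might lower the threshold, while an internal moderation tool might raise it to reduce false positives.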