Validating LLM Output

Factuality

Navigating the LLM landscape can be tricky, especially when models hallucinate or return inaccurate answers. Whether you’re integrating LLMs into customer-facing products or using them for internal data processing, ensuring the accuracy of the information they provide is essential. Prediction Guard uses state-of-the-art (SOTA) models for factuality checking to evaluate LLM outputs against the context of the prompts. You can either add factuality=True or use the /factuality endpoint to access this functionality directly.
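For example, the check can be enabled alongside a completion request. The snippet below is a minimal sketch, assuming the factuality flag is passed via an output configuration on the completion call; the exact parameter name may differ in your SDK version:

import os

import predictionguard as pg

os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Assumption: the factuality check is enabled via an output configuration
# on the completion call; consult your SDK version for the exact parameter.
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt="Context: The sky appears blue due to Rayleigh scattering.\n\nQuestion: Why is the sky blue?\n\nAnswer:",
    output={"factuality": True}
)

print(result['choices'][0]['text'])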

Now let’s walk through a fuller example. We’ll use the following prompt template to answer questions about a provided context. First, we define the prompt template:

import os

import predictionguard as pg
from langchain.prompts import PromptTemplate

os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Prompt template that injects a context passage and a question.
template = """### Instruction:
Read the context below and respond with an answer to the question.

### Input:
Context: {context}

Question: {question}

### Response:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

context = "California is a state in the Western United States. With over 38.9 million residents across a total area of approximately 163,696 square miles (423,970 km2), it is the most populous U.S. state, the third-largest U.S. state by area, and the most populated subnational entity in North America. California borders Oregon to the north, Nevada and Arizona to the east, and the Mexican state of Baja California to the south; it has a coastline along the Pacific Ocean to the west."

# Generate a completion grounded in the context.
result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(
        context=context,
        question="What is California?"
    )
)

We can then check the factuality score of the answer generated by the LLM:

fact_score = pg.Factuality.check(
    reference=context,
    text=result['choices'][0]['text']
)

print("COMPLETION:", result['choices'][0]['text'])
print("FACT SCORE:", fact_score['checks'][0]['score'])

This outputs something similar to:

COMPLETION: California is a state located in the western region of the United States. It is the most populous state in the country, with over 38.9 million residents, and the third-largest state by area, covering approximately 163,696 square miles (423,970 km2). California shares its borders with Oregon to the north, Nevada and Arizona to the east, and the Mexican state of Baja California to the south. It also
FACT SCORE: 0.8541514873504639

Now, let’s try to make the model hallucinate. The hallucination is caught by the factuality check, which returns a much lower score:

result = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt=prompt.format(
        context=context,
        question="Make up something completely fictitious about California. Contradict a fact in the given context."
    )
)

fact_score = pg.Factuality.check(
    reference=context,
    text=result['choices'][0]['text']
)

print("COMPLETION:", result['choices'][0]['text'])
print("FACT SCORE:", fact_score['checks'][0]['score'])

This outputs something similar to:

COMPLETION: California is the smallest state in the United States.
FACT SCORE: 0.12891793251037598
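
In practice, you can gate on the factuality score before passing a completion downstream. The snippet below reuses result and context from above; the 0.5 cutoff is purely illustrative and should be tuned for your application:

# Illustrative acceptance threshold (an assumption; tune for your use case).
FACT_THRESHOLD = 0.5

completion = result['choices'][0]['text']
score = pg.Factuality.check(
    reference=context,
    text=completion
)['checks'][0]['score']

if score >= FACT_THRESHOLD:
    print("Factuality check passed:", completion)
else:
    print("Potential hallucination (score {:.2f}); withholding output.".format(score))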

Standalone Factuality functionality

You can also call the factuality-checking functionality directly using the /factuality endpoint, which enables you to configure thresholds and score arbitrary inputs.
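
For example, the same Factuality.check call used above can score any reference/text pair on its own (the reference and text values below are purely illustrative):

import os

import predictionguard as pg

os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

# Score an arbitrary claim against an arbitrary reference (illustrative values).
fact_score = pg.Factuality.check(
    reference="The Eiffel Tower is located in Paris, France.",
    text="The Eiffel Tower is located in Berlin."
)

print("FACT SCORE:", fact_score['checks'][0]['score'])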