Data Extraction + Factuality Checks | Prediction Guard

This guide demonstrates the extraction of patient information from simulated doctor-patient transcripts. The extracted information is validated using a the factual consistency checks from Prediction Guard. The example focuses on the first 5 rows of a Kaggle dataset containing example simulated doctor-patient transcripts.

Load the data

Download the data from this json file. You can then use the code below to load the necessary libraries and the dataset from the above mentioned JSON file. The code converts the data into a Pandas DataFrame and selects the first 5 rows for testing.

copy

1 import json
2 import itertools
3 
4 import pandas as pd
5 from langchain import PromptTemplate
6 from predictionguard import PredictionGuard
7 
8 # Set your Prediction Guard token as an environmental variable.
9 os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"
10 
11 client = PredictionGuard()
12 
13 # Load the JSON data into a dataframe
14 data = []
15 with open('transcripts.json') as f:
16     for line in itertools.islice(f, 5):
17         line = line.strip()
18         if not line: continue
19         data.append(json.loads(line))
20 df = pd.DataFrame(data)
21 
22 # Transform rows to columns
23 df = df.transpose()
24 
25 # Reset the index and assign an index column
26 df = df.reset_index()
27 df.columns = ['id', 'transcript']
28 
29 # Reset the index (optional)
30 df.reset_index(drop=True, inplace=True)
31 
32 # Start with 5 rows of the dataframe
33 df=df.head(5)

Summarize the data

When processing uniquely formatted, unstructured text with LLMs, it is sometimes useful to summarize the input text into a coherent and well-structured paragraph. The code below defines a prompt for summarization, creates a prompt template using LangChain, and uses the Hermes-3-Llama-3.1-70B to generate summaries for each transcript. The generated summaries are added as a new column in the DataFrame, and we save them to a CSV file (in case we want them later).

copy

1 # Define the summarization prompt
2 summarize_template = """### Instruction:
3 Summarize the input transcript below.
4 
5 ### Input:
6 {transcript}
7 
8 ### Response:
9 """
10 
11 summary_prompt = PromptTemplate(template=summarize_template,
12     input_variables=["context"],
13 )
14 
15 # Loop over the rows summarizing the data
16 summaries = []
17 for i,row in df.iterrows():
18     result=client.completions.create(
19         model="Hermes-3-Llama-3.1-70B",
20         prompt=summary_prompt.format(
21             transcript=row['transcript']
22         ),
23         max_completion_tokens=200,
24         temperature=0.1
25     )
26     print(result['choices'][0]['text'])
27     summaries.append(result['choices'][0]['text'])
28 
29 # Print out some summaries to sanity check them
30 df['summary']=summaries
31 print(df.head(5))
32 
33 # Save the resulting data
34 df.to_csv("summarized_transcripts.csv", index=False)

Extract Information and Perform Factuality Checks

We can now create a question answering prompt and prompt template to perform the information extraction. This prompt template can be re-used to answer relevant questions from the data - symptoms, Patient name, when the symptom started, level of pain the patient is experiencing, etc.

copy

1 # Define the questions we want answered
2 questions=["What symptoms is the patient experiencing",
3            "What is the Patient's name?",
4            "When did the symptoms start?",
5            "On a scale of 1 to 10, what level of pain is the patient experiencing?"]
6 
7 # Define the question answering prompt
8 question_answer_template = """### Instruction:
9 Answer the following question {question} using the below doctor-patient transcript summary.
10 
11 ### Input:
12 {transcript_summary}
13 
14 ### Response:
15 """
16 
17 q_and_a_prompt = PromptTemplate(template=question_answer_template,
18     input_variables=["question", "transcript_summary"],
19 )

Factuality checks are crucial for evaluating the accuracy of information provided by the language model, especially when dealing with high risk data. Prediction Guard leverages state-of-the-art models for factual consistency checks, ensuring the reliability of outputs in reference to the context of the prompts. Thus, after we prompt the model with each question, we evaluate the responses against the corresponding transcript summaries. Factuality scores are generated to assess the accuracy of the answers.

copy

1 answers = {q: [] for q in questions}
2 fact_scores = {q: [] for q in questions}
3 
4 # Loop over the rows of the dataframe processing each transcript
5 for i, row in df.iterrows():
6     for q in questions:
7 
8         # Extract the information
9         result = client.completions.create(
10             model="Hermes-3-Llama-3.1-70B",
11             prompt=q_and_a_prompt.format(
12                 question=q, transcript_summary=row["summary"]
13             ),
14             max_completion_tokens=200,
15             temperature=0.1,
16 
17         )
18 
19         # Generate a factual consistency score
20         fact_score =client.factuality.check(
21             reference=row['summary'],
22             text=result['choices'][0]['text']
23         )
24 
25         # Append the results
26         fact_scores[q].append(fact_score['checks'][0]['score'])
27         answers[q].append(result["choices"][0]["text"])
28 
29 # Add the answers and fact scores as new columns to the original DataFrame
30 for q in questions:
31     df[f"{q}_answer"] = answers[q]
32     df[f"{q}_fact_score"] = fact_scores[q]
33 
34 # Show some results
35 print(df.head(2))
36 
37 # Save the results
38 df.to_csv("answers_with_fact_scores.csv", index=False)

	id	transcript	summary	What symptoms is the patient experiencing_answer	What symptoms is the patient experiencing_fact_score	What is the Patient’s name?_answer	What is the Patient’s name?_fact_score	When did the symptoms start?_answer	When did the symptoms start?_fact_score	On a scale of 1 to 10, what level of pain is the patient experiencing?_answer	On a scale of 1 to 10, what level of pain is the patient experiencing?_fact_score
0	2055	During the…	During a…	The patient, Mr. Don Hicks, is experiencing sym…	0.08922114223	The patient’s name is Mr. Don Hicks.	0.451582998	The symptoms started when Mr. Don …	0.1504420638	The transcript summary does not conta…	0.5611280203
1	291	During the…	During a…	The patient, Tina Will, is experiencing sympt…	0.3320894539	The patient’s name is Tina Will.	0.8268791437	The symptoms started when Tina pre…	0.7537286878	I am sorry to hear that Tina is expe…	0.2882582843
2	102	”D: Good mo…	The patien…	The patient, Tommie, has been experiencing sy…	0.1203972548	I’m sorry, the question “What is t…	0.6292911172	The symptoms started when?	0.7372002602	”I’m sorry to hear that Tommie has b…	0.1583527327
3	2966	”D: Good mo…	The patien…	The patient, Chris, is experiencing symptoms…	0.03648262098	The patient’s name is Chris.	0.8302355409	The symptoms started when Chris exp…	0.8345838189	I’m sorry to hear that Chris is expe…	0.7252672315
4	2438	”D: Hi Erne…	Ernest visi…	The patient, Ernest, is experiencing bladder…	0.149951458	The patient’s name is Ernest.	0.6766917109	The symptoms started when Ernest st…	0.1891670823	Based on the information provided, i…	0.6463367343

You can also call the factual consistency checking functionality directly using the /factuality endpoint, which will enable you to configure thresholds and score arbitrary inputs.