Data Extraction + Factuality Checks

This guide demonstrates the extraction of patient information from simulated doctor-patient transcripts. The extracted information is validated using a the factual consistency checks from Prediction Guard. The example focuses on the first 5 rows of a Kaggle dataset containing example simulated doctor-patient transcripts.

Load the data

Download the data from this json file. You can then use the code below to load the necessary libraries and the dataset from the above mentioned JSON file. The code converts the data into a Pandas DataFrame and selects the first 5 rows for testing.

copy
1import json
2import itertools
3
4import pandas as pd
5from langchain import PromptTemplate
6from predictionguard import PredictionGuard
7
8# Set your Prediction Guard token as an environmental variable.
9os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"
10
11client = PredictionGuard()
12
13# Load the JSON data into a dataframe
14data = []
15with open('transcripts.json') as f:
16 for line in itertools.islice(f, 5):
17 line = line.strip()
18 if not line: continue
19 data.append(json.loads(line))
20df = pd.DataFrame(data)
21
22# Transform rows to columns
23df = df.transpose()
24
25# Reset the index and assign an index column
26df = df.reset_index()
27df.columns = ['id', 'transcript']
28
29# Reset the index (optional)
30df.reset_index(drop=True, inplace=True)
31
32# Start with 5 rows of the dataframe
33df=df.head(5)

Summarize the data

When processing uniquely formatted, unstructured text with LLMs, it is sometimes useful to summarize the input text into a coherent and well-structured paragraph. The code below defines a prompt for summarization, creates a prompt template using LangChain, and uses the Hermes-2-Pro-Llama-3-8B to generate summaries for each transcript. The generated summaries are added as a new column in the DataFrame, and we save them to a CSV file (in case we want them later).

copy
1# Define the summarization prompt
2summarize_template = """### Instruction:
3Summarize the input transcript below.
4
5### Input:
6{transcript}
7
8### Response:
9"""
10
11summary_prompt = PromptTemplate(template=summarize_template,
12 input_variables=["context"],
13)
14
15# Loop over the rows summarizing the data
16summaries = []
17for i,row in df.iterrows():
18 result=client.completions.create(
19 model="Hermes-2-Pro-Llama-3-8B",
20 prompt=summary_prompt.format(
21 transcript=row['transcript']
22 ),
23 max_tokens=200,
24 temperature=0.1
25 )
26 print(result['choices'][0]['text'])
27 summaries.append(result['choices'][0]['text'])
28
29# Print out some summaries to sanity check them
30df['summary']=summaries
31print(df.head(5))
32
33# Save the resulting data
34df.to_csv("summarized_transcripts.csv", index=False)

Extract Information and Perform Factuality Checks

We can now create a question answering prompt and prompt template to perform the information extraction. This prompt template can be re-used to answer relevant questions from the data - symptoms, Patient name, when the symptom started, level of pain the patient is experiencing, etc.

copy
1# Define the questions we want answered
2questions=["What symptoms is the patient experiencing",
3 "What is the Patient's name?",
4 "When did the symptoms start?",
5 "On a scale of 1 to 10, what level of pain is the patient experiencing?"]
6
7# Define the question answering prompt
8question_answer_template = """### Instruction:
9Answer the following question {question} using the below doctor-patient transcript summary.
10
11### Input:
12{transcript_summary}
13
14### Response:
15"""
16
17q_and_a_prompt = PromptTemplate(template=question_answer_template,
18 input_variables=["question", "transcript_summary"],
19)

Factuality checks are crucial for evaluating the accuracy of information provided by the language model, especially when dealing with high risk data. Prediction Guard leverages state-of-the-art models for factual consistency checks, ensuring the reliability of outputs in reference to the context of the prompts. Thus, after we prompt the model with each question, we evaluate the responses against the corresponding transcript summaries. Factuality scores are generated to assess the accuracy of the answers.

copy
1answers = {q: [] for q in questions}
2fact_scores = {q: [] for q in questions}
3
4# Loop over the rows of the dataframe processing each transcript
5for i, row in df.iterrows():
6 for q in questions:
7
8 # Extract the information
9 result = client.completions.create(
10 model="Hermes-2-Pro-Llama-3-8B",
11 prompt=q_and_a_prompt.format(
12 question=q, transcript_summary=row["summary"]
13 ),
14 max_tokens=200,
15 temperature=0.1,
16
17 )
18
19 # Generate a factual consistency score
20 fact_score =client.factuality.check(
21 reference=row['summary'],
22 text=result['choices'][0]['text']
23 )
24
25 # Append the results
26 fact_scores[q].append(fact_score['checks'][0]['score'])
27 answers[q].append(result["choices"][0]["text"])
28
29# Add the answers and fact scores as new columns to the original DataFrame
30for q in questions:
31 df[f"{q}_answer"] = answers[q]
32 df[f"{q}_fact_score"] = fact_scores[q]
33
34# Show some results
35print(df.head(2))
36
37# Save the results
38df.to_csv("answers_with_fact_scores.csv", index=False)
idtranscriptsummaryWhat symptoms is the patient experiencing_answerWhat symptoms is the patient experiencing_fact_scoreWhat is the Patient’s name?_answerWhat is the Patient’s name?_fact_scoreWhen did the symptoms start?_answerWhen did the symptoms start?_fact_scoreOn a scale of 1 to 10, what level of pain is the patient experiencing?_answerOn a scale of 1 to 10, what level of pain is the patient experiencing?_fact_score
02055During the…During a…The patient, Mr. Don Hicks, is experiencing sym…0.08922114223The patient’s name is Mr. Don Hicks.0.451582998The symptoms started when Mr. Don …0.1504420638The transcript summary does not conta…0.5611280203
1291During the…During a…The patient, Tina Will, is experiencing sympt…0.3320894539The patient’s name is Tina Will.0.8268791437The symptoms started when Tina pre…0.7537286878I am sorry to hear that Tina is expe…0.2882582843
2102”D: Good mo…The patien…The patient, Tommie, has been experiencing sy…0.1203972548I’m sorry, the question “What is t…0.6292911172The symptoms started when?0.7372002602”I’m sorry to hear that Tommie has b…0.1583527327
32966”D: Good mo…The patien…The patient, Chris, is experiencing symptoms…0.03648262098The patient’s name is Chris.0.8302355409The symptoms started when Chris exp…0.8345838189I’m sorry to hear that Chris is expe…0.7252672315
42438”D: Hi Erne…Ernest visi…The patient, Ernest, is experiencing bladder…0.149951458The patient’s name is Ernest.0.6766917109The symptoms started when Ernest st…0.1891670823Based on the information provided, i…0.6463367343

You can also call the factual consistency checking functionality directly using the /factuality endpoint, which will enable you to configure thresholds and score arbitrary inputs.