Typed Document Q&A with Chroma

Question and Answer over Documents

This example shows how to use Prediction Guard, along with LangChain and Chroma, to build a typed and checked question answering system over one or more documents. That is, you can think of this system as:

  • Input - User query
  • Processing - Find relevant documents and relevant context within those documents
  • Output - Typed (e.g., integer) and checked (e.g., for consistency) output from Prediction Guard based on the given query and context

The full example can be run with Python in this Google Colab.

Setup and dependencies

As mentioned, we will pair Prediction Guard with LangChain (for convenient prompting) and Chroma (a vector database). As such, you will need to install the following dependencies:

$ pip install langchain predictionguard chromadb sentence_transformers

Then we will use the following imports:

import os
 
from langchain.text_splitter import CharacterTextSplitter
from langchain.llms import PredictionGuard
from langchain.indexes import VectorstoreIndexCreator
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain import PromptTemplate
 
os.environ["PREDICTIONGUARD_TOKEN"] = "<your prediction guard access token>"

Prep the document data and vector database

We will use this presidential state-of-the-union address as our document. First, we load the document into the vector database in chunks:

# Load in the text chunks.
with open("state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=128, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
 
# Here we will use open access sentence transformer embeddings.
embeddings = HuggingFaceEmbeddings()
 
# Create the vector embedding store for search.
docsearch = Chroma.from_texts(
    texts,
    embeddings,
    metadatas=[{"source": str(i)} for i in range(len(texts))],
).as_retriever()
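
At this point, you can optionally check what the retriever returns for a sample query. This is just a quick sketch to verify that the chunks were embedded and stored as expected (the sample query is illustrative):

# Optional: inspect what the retriever returns for a sample query.
sample_docs = docsearch.get_relevant_documents("What did the president say about the economy?")
for doc in sample_docs[:2]:
    print(doc.page_content[:200])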

General text queries

If we just want open domain text queries (and unstructured text output), we can easily use any of the Prediction Guard models (e.g., Falcon):

# Create a prompt template for our question answering.
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
 
{context}
 
Question: {question}
Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
 
# Make a query with the Falcon 7B model
chain = load_qa_chain(PredictionGuard(model="Falcon-7B-Instruct"), 
                      chain_type="stuff", prompt=PROMPT)
query = "What did the president say about Justice Breyer"
docs = docsearch.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)

This should result in something like the following answer:

 President Biden called Justice Breyer a "remarkable jurist" and "an extraordinary American."

Typed/structured output

Now let's say that we want to extract typed or specifically structured output (integer, float, categorical, etc.) from the document. This requires only a small change in our query:

chain = load_qa_chain(PredictionGuard(model="Falcon-7B-Instruct", output={
    "type": "integer"
}), chain_type="stuff", prompt=PROMPT)
query = "How many years has this nation been in existence?"
docs = docsearch.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)

Which gives the output:

244
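
The same pattern applies to the other types mentioned above. For example, a float-typed query might look like the following sketch (assuming a "float" output type; the query here is only illustrative):

# A sketch of float-typed output; the query below is illustrative.
chain = load_qa_chain(PredictionGuard(model="Falcon-7B-Instruct", output={
    "type": "float"
}), chain_type="stuff", prompt=PROMPT)
query = "What is the unemployment rate mentioned in the speech?"
docs = docsearch.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)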

Or, we could try to get categorical output:

chain = load_qa_chain(PredictionGuard(model="Falcon-7B-Instruct", output={
    "type": "categorical",
    "categories": ["NEU", "POS", "NEG"]
}), chain_type="stuff")
query = "Is the sentiment of views on Russia and the Ukraine?"
docs = docsearch.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)

Which gives the output:

NEG

Consistency checked output

Finally, let's put a consistency check on the LLM output. Because of how the LangChain integration works, the chain will return a blank output in this case if the LLM can't produce a consistent answer.

chain = load_qa_chain(PredictionGuard(model="Falcon-7B-Instruct", output={
    "type": "integer",
    "consistency": True
}), chain_type="stuff")
query = "How many million barrels will be released from the Strategic Petroleum Reserve?"
docs = docsearch.get_relevant_documents(query)
chain.run(input_documents=docs, question=query)
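
Since the chain returns a blank output when no consistent answer can be produced, you may want to wrap the call and handle that case explicitly. A minimal sketch (re-running the query from above):

# Handle the blank-on-inconsistent behavior described above.
result = chain.run(input_documents=docs, question=query)
if not result.strip():
    print("No consistent answer could be produced for this query.")
else:
    print(result)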