Getting Started

Technical teams need to figure out how to integrate the latest Large Language Models (LLMs), but:

  • You can’t build robust systems with inconsistent, unstructured text blob output; and
  • LLM integrations scare corporate lawyers, finance departments, and security professionals due to hallucinations, cost, lack of compliance (e.g., HIPAA), leaked IP/PII, and “injection” vulnerabilities.

Some companies are moving forward anyway by investing tons of engineering time/money in their own wrappers around LLMs and expensive hosting with OpenAI/Azure. Others are ignoring these issues and pressing forward with fragile and risky LLM integrations.

At Prediction Guard, we think that you should get consistent, structured output from compliant AI systems (without crazy implementation/hosting costs), so our solution lets you:

  1. Control the structure of and easily constrain LLM output to the types, formats, and information relevant to your business;
  2. Validate and check LLM output to guard against hallucination; and
  3. Implement compliant LLM systems (SOC 2, HIPAA, and self-hosted) that give your legal counsel a warm fuzzy feeling while still delighting your customers with AI features.

Sounds pretty great, right? Follow the steps below to start controlling SOTA LLMs:

1. Get access to Prediction Guard Enterprise

We host and control the latest LLMs for you in our enterprise platform, so you can focus on your prompts and chains. To access the hosted LLMs (and our guarded interfaces to closed LLMs like OpenAI), contact us about enterprise access.

After setting up your enterprise account, you will receive one or more Prediction Guard access tokens. You will need one of these access tokens to continue.

2. (Optional) Install the Python client

You can configure and use Prediction Guard using our Python client or via the REST API directly. If you want to use the Python client, you can install it as follows:

$ pip install predictionguard

3. Start using controlled LLMs!

Suppose you want to prompt an LLM to perform zero-shot text classification with the following prompt:

Does the following input text refer to a politician, actor, infrastructure, or a societal event?

Input Text: The 1992 United States presidential election in Colorado took place on November 3, 1992, as part of the 1992 United States presidential election. Voters chose eight representatives, or electors to the Electoral College, who voted for President and Vice President. Colorado was won by the Democratic nominees, Governor Bill Clinton of Arkansas and his running mate Senator Al Gore of Tennessee. Clinton and Gore defeated the Republican nominees, incumbent President George H.W. Bush of Texas and Senator Dan Quayle of Indiana.

Category: 

You can use our Python client or REST API to prompt one of many open or closed LLMs (MPT-7B, Camel-5B, OpenAI text-davinci-003, etc.). Our API is very similar to the OpenAI API for text completion, but we've added an output argument/field that lets you control, constrain, and perform checks on the output of LLMs. For this example, let's constrain the output of the LLM to one of the following classes: politician, actor, infrastructure, and societal_event (i.e., we are enforcing a categorical type):
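With the Python client, the call might look something like the following. This is a minimal sketch: it assumes the client exposes an OpenAI-style Completion.create method as described above, and the environment variable name and the exact keys inside the output argument ("type", "categories") are illustrative, so consult the output documentation for the precise schema.

import os
import json

import predictionguard as pg

# The client reads your Prediction Guard access token from an environment
# variable (replace the placeholder with the token from your dashboard).
os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

prompt = """Does the following input text refer to a politician, actor, infrastructure, or a societal event?

Input Text: The 1992 United States presidential election in Colorado took place on November 3, 1992, as part of the 1992 United States presidential election. Voters chose eight representatives, or electors to the Electoral College, who voted for President and Vice President. Colorado was won by the Democratic nominees, Governor Bill Clinton of Arkansas and his running mate Senator Al Gore of Tennessee. Clinton and Gore defeated the Republican nominees, incumbent President George H.W. Bush of Texas and Senator Dan Quayle of Indiana.

Category: """

result = pg.Completion.create(
    model="MPT-7B-Instruct",
    prompt=prompt,
    # The optional output argument constrains the completion to one of the
    # provided categories. The key names below are illustrative; see the
    # output docs for the exact schema.
    output={
        "type": "categorical",
        "categories": ["politician", "actor", "infrastructure", "societal_event"],
    },
)

print(json.dumps(result, sort_keys=True, indent=4))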

ℹ️

Note, you will need to replace <your access token> in the above example with your actual access token, retrieved from your user dashboard.

Prediction Guard modifies the inference code around the model, constraining the possible output to one of the categorical values provided in the optional output field/argument. Thus, this should result in the following output:

{
    "choices": [
        {
            "index": 0,
            "output": "politician",
            "status": "success",
            "text": "politician"
        }
    ],
    "created": 1685445446,
    "id": "cmpl-pmwldq6Zl2xNvCgjFU0FUrMdynwm4",
    "model": "MPT-7B-Instruct",
    "object": "text_completion"
}
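
If you are using the Python client, these fields are available on the returned dictionary. For example (assuming result is the response object from the sketch above):

# The typed/constrained value enforced by the output argument:
label = result["choices"][0]["output"]

# The raw text generated by the LLM:
raw_text = result["choices"][0]["text"]

print(label)     # politician
print(raw_text)  # politician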

The choices[0].text field contains the raw LLM output. The choices[0].output field contains any typed output from Prediction Guard (relevant for integer, float, and boolean outputs). Without Prediction Guard's output argument/field to guide the LLM, the output looks like the following text vomit 🤮:

{
    "choices": [
        {
            "index": 0,
            "status": "success",
            "text": "\nThe text refers to a politician#Bill Clinton#president of the United States#actor#Bill Clinton#politician---\n\nA bill Clinton is a politician, actor, infrastructure, or a societal event?\n\nA bill Clinton is a politician, actor, infrastructure, or a societal event?\n\nA bill Clinton is a politician, actor, infrastructure, or a societal event?\n\nA bill Clinton is a politician, actor, infrastructure, or a societal event?\n"
        }
    ],
    "created": 1685445589,
    "id": "cmpl-P5LYYYJfVWnIxRFfTiCe4QSjUwtm4",
    "model": "MPT-7B-Instruct",
    "object": "text_completion"
}
ℹ️

Note, Prediction Guard's models run on serverless infrastructure. If you aren't actively using models, they are scaled down. As such, your first call to a model might need to "wake up" that model inference server. You will get a message "Waking up model. Try again in a few minutes." in such cases. Typically it takes around 5-15 minutes to wake up the model server depending on the size of the model. We are actively working on reducing these cold start times.

4. Explore other output types, models, and guides

This is only the beginning of what you can do with Prediction Guard (not to mention what is on our roadmap). Now that you have a working example, consider exploring the other output types, available models, and use case guides covered in the rest of these docs.
