Available Models
Using Prediction Guard gives you quick and easy access to state-of-the-art open and closed access LLMs, without spending days or weeks figuring out implementation details, managing multiple API specifications, and setting up infrastructure for model deployments.
Open Access LLMs
These LLMs are hosted by Prediction Guard with custom inference code that lets you control the type and structure of model outputs and validate them. Check here for updates, as we add new models every week!
Note - These models are hosted by Prediction Guard in a private and compliant manner. Prediction Guard does NOT save or share any data sent to these models. Further, customers needing HIPAA compliance can use these models with our enterprise deploy. Contact support with any questions.
Note - We only integrate models that are licensed permissively for commercial use.
Text Generation
Model Name | Model Card | Parameters | Context Length |
---|---|---|---|
Llama-2-70B | link | 70B | 4096 |
Llama-2-13B | link | 13B | 4096 |
Llama-2-7B | link | 7B | 4096 |
Nous-Hermes-Llama2-7B | link | 7B | 4096 |
Nous-Hermes-Llama2-13B | link | 13B | 4096 |
Camel-5B | link | 5B | 2048 |
Dolly-3B | link | 3B | 2048 |
Dolly-7B | link | 7B | 2048 |
Falcon-7B-Instruct | link | 7B | 2048 |
Falcon-40B-Instruct | link | 40B | 2048 |
h2oGPT-6_9B | link | 6.9B | 2048 |
MPT-7B-Instruct | link | 7B | 4096 |
Pythia-6_9-Deduped | link | 6.9B | 2048 |
RedPajama-INCITE-Instruct-7B | link | 7B | 2048 |
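A text completion request to one of the hosted models above can be sketched as follows. This is a minimal illustration using the REST API over the standard library; the endpoint URL, header name, and `PREDICTIONGUARD_API_KEY` environment variable are assumptions for the sketch, not taken from this page - consult the API reference for the exact values.

```python
import json
import os
import urllib.request

# NOTE: the endpoint URL and header name below are illustrative
# assumptions, not documented on this page.
API_URL = "https://api.predictionguard.com/completions"  # hypothetical


def build_completion_request(model: str, prompt: str, api_key: str):
    """Assemble the headers and JSON payload for a completion call."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    payload = {"model": model, "prompt": prompt}
    return headers, payload


headers, payload = build_completion_request(
    "Nous-Hermes-Llama2-13B",  # any model name from the table above
    "Explain what an LLM is in one sentence.",
    os.environ.get("PREDICTIONGUARD_API_KEY", ""),
)

# Only send the request if a key is actually configured.
if headers["x-api-key"]:
    req = urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(json.load(resp))
```

Any model name from the tables on this page can be swapped into the `model` field; keep the prompt plus expected output within the model's context length.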
Code Generation / Technical Assistant
Model Name | Model Card | Parameters | Context Length |
---|---|---|---|
WizardCoder | link | 15.5B | 8192 |
StarCoder | link | 15.5B | 8192 |
Note - Prediction Guard's models run on serverless infrastructure. If you aren't actively using a model, it is scaled down. As such, your first call to a model might need to "wake up" its inference server, in which case you will get the message "Waking up model. Try again in a few minutes." Waking up a model server typically takes around 5-15 minutes, depending on the size of the model. We are actively working on reducing these cold start times.
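The cold-start behavior above can be handled with a simple retry loop. The wake-up message text is quoted from this page; the doubling backoff schedule and the 15-minute cap are illustrative choices for the sketch, not a documented protocol.

```python
import time

# Message text quoted from the docs; returned while the server wakes up.
WAKE_UP_MESSAGE = "Waking up model. Try again in a few minutes."


def backoff_waits(total_minutes: float = 15.0, base: float = 1.0):
    """Doubling wait times (in minutes) whose sum stays within the cap."""
    waits, wait, elapsed = [], base, 0.0
    while elapsed + wait <= total_minutes:
        waits.append(wait)
        elapsed += wait
        wait *= 2
    return waits


def call_with_wakeup_retry(call, waits=None):
    """Retry `call` while it reports that the model is still waking up."""
    for wait in (waits if waits is not None else backoff_waits()):
        result = call()
        if result != WAKE_UP_MESSAGE:
            return result
        time.sleep(wait * 60)  # waits are expressed in minutes
    return call()  # one final attempt after the last wait


# The default schedule waits 1, 2, 4, then 8 minutes (15 minutes total),
# matching the upper end of the cold-start times mentioned above.
schedule = backoff_waits()
```

Here `call` stands in for whatever function performs the actual completion request and returns the response text.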
Closed Access LLMs
These LLMs are integrated with Prediction Guard using custom inference code that lets you control the type and structure of model outputs and validate them. However, they are not hosted by Prediction Guard in the same manner as the models above.
Note - You will need your own OpenAI API key to use the models below. Customers worried about data privacy, IP/PII leakage, HIPAA compliance, etc. should look into the above "Open Access LLMs" and/or our enterprise deploy. Contact support with any questions.
Model Name | Generation | Context Length |
---|---|---|
OpenAI-text-davinci-003 | GPT-3.5 | 4097 |
OpenAI-text-davinci-002 | GPT-3.5 | 4097 |
OpenAI-text-curie-001 | GPT-3 | 2049 |
OpenAI-text-babbage-001 | GPT-3 | 2049 |
OpenAI-text-ada-001 | GPT-3 | 2049 |
OpenAI-davinci | GPT-3 | 2049 |
OpenAI-babbage | GPT-3 | 2049 |
OpenAI-ada | GPT-3 | 2049 |
OpenAI-curie | GPT-3 | 2049 |
To use the OpenAI models above, either (1) define the environment variable OPENAI_API_KEY if you are using the Python client, or (2) set the header parameter OpenAI-ApiKey if you are using the REST API.
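The two key-configuration options above can be sketched as follows. The `OPENAI_API_KEY` variable and `OpenAI-ApiKey` header names come from the text above; the placeholder key value and the helper function are illustrative.

```python
import os

# Option 1: the Python client reads the OPENAI_API_KEY environment
# variable. "your-openai-key" is a placeholder -- use your real key.
os.environ["OPENAI_API_KEY"] = "your-openai-key"


# Option 2: the REST API takes the key in the OpenAI-ApiKey header.
def openai_model_headers(openai_key: str) -> dict:
    """Headers for calling one of the OpenAI-backed models over REST."""
    return {
        "OpenAI-ApiKey": openai_key,  # header name from the docs above
        "Content-Type": "application/json",
    }


headers = openai_model_headers(os.environ["OPENAI_API_KEY"])
```

Either mechanism passes your own OpenAI key through; as noted above, requests to these models are not hosted by Prediction Guard.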