This page provides information on the Large Language Models (LLMs) available in the Prediction Guard API. These models are designed for text inference and are used in the /completions and /chat/completions endpoints.
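
For orientation, a request to the /chat/completions endpoint might look like the sketch below. It uses the Python `requests` library and assumes an OpenAI-style request body; the base URL, auth header, and environment variable name shown here are assumptions, so check the API reference for the exact values.

```python
import os
import requests

# Assumed base URL and auth header; verify both against the Prediction Guard API reference.
BASE_URL = "https://api.predictionguard.com"
headers = {"Authorization": f"Bearer {os.environ['PREDICTIONGUARD_API_KEY']}"}

payload = {
    "model": "Hermes-3-Llama-3.1-8B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what the ChatML prompt format is in one sentence."},
    ],
    "max_tokens": 100,
}

response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
print(response.json())
```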

Models

| Model Name | Type | Use Case | Prompt Format | Context Length (tokens) | More Info |
| --- | --- | --- | --- | --- | --- |
| Hermes-3-Llama-3.1-70B | Chat | Instruction following or chat-like applications | ChatML | 10240 | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B |
| Hermes-3-Llama-3.1-8B | Chat | Instruction following or chat-like applications | ChatML | 10240 | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B |
| Hermes-2-Pro-Llama-3-8B | Chat | Instruction following or chat-like applications | ChatML | 4096 | https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B |
| Nous-Hermes-Llama2-13b | Text Generation | Generating output in response to arbitrary instructions | Alpaca | 4096 | https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b |
| Hermes-2-Pro-Mistral-7B | Chat | Instruction following or chat-like applications | ChatML | 4096 | https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B |
| neural-chat-7b-v3-3 | Chat | Instruction following or chat-like applications | Neural Chat | 4096 | https://huggingface.co/Intel/neural-chat-7b-v3-3 |
| deepseek-coder-6.7b-instruct | Code Generation | Generating computer code or answering tech questions | Deepseek | 4096 | https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct |

Model Descriptions

Hermes-3-Llama-3.1-8B

This is a general use model that excels at reasoning and multi-turn conversations, with an improved focus on longer context lengths. The larger context window allows for better accuracy and recall on tasks that require it, and the model improves on the previous Hermes and Llama lines.

Type: Chat
Use Case: Instruction Following or Chat-Like Applications
Prompt Format: ChatML

https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

The ethos of the Hermes series of models is focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
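
Chat endpoints of this kind are typically stateless, so a multi-turn conversation is expressed by resending the full message history on each request; the 10240-token context window determines how much history fits. A minimal sketch of such a history (the conversation content is illustrative):

```python
# Prior turns are included verbatim so the model can use the full conversation context.
messages = [
    {"role": "system", "content": "You are a concise research assistant."},
    {"role": "user", "content": "What is retrieval-augmented generation?"},
    {"role": "assistant", "content": "It combines a retriever that fetches relevant documents with a generator that conditions on them."},
    {"role": "user", "content": "How does that reduce hallucinations?"},
]
# Pass `messages` in the /chat/completions request body with "model": "Hermes-3-Llama-3.1-8B".
```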

Hermes-3-Llama-3.1-70B

This is a general use model that excels at reasoning and multi-turn conversations, with an improved focus on longer context lengths. The larger context window allows for better accuracy and recall on tasks that require it, and the model improves on the previous Hermes and Llama lines.

Type: Chat
Use Case: Instruction Following or Chat-Like Applications
Prompt Format: ChatML

https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

The ethos of the Hermes series of models is focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
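
Both Hermes 3 models expect the ChatML prompt format, which wraps each turn in <|im_start|> and <|im_end|> tokens. When using /chat/completions the template is presumably applied for you; the raw layout below is only an illustration, for example for use with the /completions endpoint, and the system and user text are placeholders.

```python
# Illustrative ChatML-formatted prompt; the system and user text are placeholders.
chatml_prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Explain the difference between a process and a thread.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Send as the "prompt" field of a /completions request with "model": "Hermes-3-Llama-3.1-70B".
```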

Hermes-2-Pro-Llama-3-8B

A general use model that maintains excellent general task and conversation capabilities while excelling at JSON Structured Outputs and improving on several other metrics.

Type: Chat
Use Case: Instruction Following or Chat-Like Applications
Prompt Format: ChatML

https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

This new version of Hermes maintains its excellent general task and conversation capabilities while also excelling at function calling and JSON structured outputs, with improvements on several other metrics as well. It scores 90% on a function calling evaluation built in partnership with Fireworks.AI and 84% on a structured JSON output evaluation.

Hermes Pro takes advantage of a special system prompt and a multi-turn function calling structure with a new ChatML role in order to make function calling reliable and easy to parse.
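
The exact system prompts Hermes 2 Pro was trained with for function calling and JSON mode are documented on the model card linked above. As a rough illustration only, a JSON-constrained request could look like the following; the schema and system instruction here are assumptions, not the model's trained prompt.

```python
import json

payload = {
    "model": "Hermes-2-Pro-Llama-3-8B",
    "messages": [
        # Illustrative system instruction; the model card documents the exact JSON-mode prompt.
        {"role": "system", "content": "Answer only with a JSON object of the form "
                                      '{"city": string, "population": integer}.'},
        {"role": "user", "content": "Give me the population of Paris."},
    ],
    "max_tokens": 100,
}
# POST this to /chat/completions and json.loads() the returned message content.
```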

Nous-Hermes-Llama2-13b

A general use model that combines strong analytical capabilities with a 13 billion parameter count, enabling in-depth analysis and support for complex decision-making processes. It is designed to process large volumes of text, uncover patterns, and provide actionable insights.

Type: Text Generation
Use Case: Generating Output in Response to Arbitrary Instructions
Prompt Format: Alpaca

https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b

Nous-Hermes-Llama2-13b is a state-of-the-art language model fine-tuned on over 300,000 instructions. The model was fine-tuned by Nous Research, with Teknium and Emozilla leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

This Hermes model uses the exact same dataset as the original Hermes on Llama-1. This ensures consistency between the old and new Hermes for anyone who wants a model that stays close to the original, just more capable.

This model stands out for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuning was performed with a 4096-token sequence length on an 8x A100 80GB DGX machine.
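
Because this is a text-generation model using the Alpaca prompt format, requests typically go through the /completions endpoint with an instruction-style prompt. The sketch below uses the commonly cited Alpaca layout; verify the exact template and preamble wording against the model card.

```python
# Common Alpaca-style instruction template; the preamble wording may differ slightly.
alpaca_prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "List three factors that commonly drive customer churn.\n\n"
    "### Response:\n"
)
# Send as the "prompt" field of a /completions request with "model": "Nous-Hermes-Llama2-13b".
```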

Hermes-2-Pro-Mistral-7B

A general use model that offers advanced natural language understanding and generation, providing high-performance text processing across diverse domains and languages. It delivers accurate and contextually relevant responses, making it suitable for a wide range of applications, including chatbots, language translation, and content creation.

Type: Chat
Use Case: Instruction Following or Chat-Like Applications
Prompt Format: ChatML

https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

This new version of Hermes maintains its excellent general task and conversation capabilities while also excelling at function calling and JSON structured outputs, with improvements on several other metrics as well. It scores 90% on a function calling evaluation built in partnership with Fireworks.AI and 84% on a structured JSON output evaluation.

Hermes Pro takes advantage of a special system prompt and a multi-turn function calling structure with a new ChatML role in order to make function calling reliable and easy to parse. Full prompting details are available on the model card linked above.
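
The special system prompt and the exact tags used for tool definitions come from the model card; the outline below only illustrates the multi-turn shape of a function-calling exchange and assumes a `tool` role name for returning results, so treat it as a sketch rather than the trained format.

```python
# Illustrative turn structure only; the real system prompt and tool/response tags
# are documented on the Hermes 2 Pro model card.
messages = [
    {"role": "system", "content": "<system prompt describing the available tools>"},
    {"role": "user", "content": "What's the weather in Berlin right now?"},
    {"role": "assistant", "content": "<the model's emitted function call>"},
    {"role": "tool", "content": "<the JSON result your code returns for that call>"},
    # The next assistant turn then answers the user using the tool result.
]
```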

neural-chat-7b-v3-3

A fine-tuned 7B parameter model from Intel for conversational, instruction-following applications.

Type: Chat
Use Case: Instruction Following or Chat-Like Applications
Prompt Format: Neural Chat

https://huggingface.co/Intel/neural-chat-7b-v3-3

This model is a 7B parameter LLM fine-tuned on the Intel Gaudi 2 processor from Intel/neural-chat-7b-v3-1 on the meta-math/MetaMathQA dataset. The model was aligned using the Direct Preference Optimization (DPO) method with Intel/orca_dpo_pairs. Intel/neural-chat-7b-v3-1 was originally fine-tuned from mistralai/Mistral-7B-v0.1. For more information, refer to the blog post The Practice of Supervised Fine-tuning and Direct Preference Optimization on Intel Gaudi2.
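
For reference, the Neural Chat prompt format uses ### System / ### User / ### Assistant markers. The layout below is an approximation for use with raw prompts; confirm the exact template and spacing against the Intel model card.

```python
# Approximate Neural Chat layout; confirm the exact template on the model card.
neural_chat_prompt = (
    "### System:\n"
    "You are a helpful assistant.\n"
    "### User:\n"
    "Suggest a name for a travel blog about Portugal.\n"
    "### Assistant:\n"
)
# Usable as the "prompt" field of a /completions request with "model": "neural-chat-7b-v3-3".
```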

deepseek-coder-6.7b-instruct

DeepSeek Coder is a capable coding model trained on two trillion tokens of code and natural language.

Type: Code Generation
Use Case: Generating Computer Code or Answering Tech Questions
Prompt Format: Deepseek

https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct

DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. The family includes models ranging from 1B to 33B parameters. Each model is pre-trained on a project-level code corpus with a 16K window size and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and benchmarks.
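
A code-generation request can simply describe the desired function. The sketch below sends a plain instruction through /chat/completions and assumes the service applies the model's DeepSeek prompt template for you; the exact template is documented on the model card.

```python
payload = {
    "model": "deepseek-coder-6.7b-instruct",
    "messages": [
        {"role": "user", "content": "Write a Python function that checks whether a string "
                                    "is a palindrome, ignoring case and punctuation."},
    ],
    "max_tokens": 300,
    "temperature": 0.1,  # a low temperature tends to help for code generation
}
# POST to /chat/completions; the generated code comes back in the assistant message content.
```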