You can get stream-based chat text completions (generated from a thread of chat messages) from any of the chat-enabled models by calling the /chat/completions REST API endpoint directly (for example, with cURL) or by using any of the official SDKs (Python, Go, Rust, JS).

Generate a Stream-Based Chat Text Completion

To generate a stream-based chat text completion, pick the code example that matches the SDK or method your application uses. The Python example passes stream=True and prints each chunk of text as it arrives.

import os
import json

from predictionguard import PredictionGuard

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

# The chat thread so far, oldest message first.
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that provides clever and sometimes funny responses."
    },
    {
        "role": "user",
        "content": "What's up!"
    },
    {
        "role": "assistant",
        "content": "Well, technically vertically out from the center of the earth."
    },
    {
        "role": "user",
        "content": "Haha. Good one."
    }
]

# With stream=True, the call yields chunks as they are generated.
for res in client.chat.completions.create(
    model="Hermes-2-Pro-Llama-3-8B",
    messages=messages,
    max_tokens=500,
    temperature=0.1,
    stream=True
):
    # Use 'end' parameter in print function to avoid new lines.
    print(res["data"]["choices"][0]["delta"]["content"], end='')

The raw streamed output will look something like this. The SDK cleans up the non-conforming JSON document for you.

data: {
    "id":"chat-sr48TCgumnYx0cdV342eQz4uD9PpI",
    "object":"chat.completion.chunk",
    "created":1717785387,
    "model":"Hermes-2-Pro-Llama-3-8B",
    "choices":[
        {
            "index":0,
            "delta":{
                "content":" past"
            },
            "generated_text":null,
            "logprobs":-0.11733246,
            "finish_reason":null
        }
    ]
}
data: {
    "id":"chat-PTpR04EN0VxSIyfHFXNS57FCC8ZJJ",
    "object":"chat.completion.chunk",
    "created":1717785387,
    "model":"Hermes-2-Pro-Llama-3-8B",
    "choices":[
        {
            "index":0,
            "delta":{
            },
            "generated_text":"Thanks, I try to keep things interesting. Now, if you want something more serious, how about a weather update or a joke? Your choice!\n\nWeather: Sunny with a slight chance of humor.\nJoke: A man walks into a bar and says, \"I'll have a beer, and tell me a joke about time.\" The bartender replies, \"Time to go, you're two minutes past last call!\"",
            "logprobs":0,
            "finish_reason":"stop"
        }
    ]
}
data: [DONE]
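
If you read the stream from the REST endpoint yourself instead of going through an SDK, you have to strip the data: prefix and stop at [DONE] on your own. The following is a minimal sketch of that step, not the SDK's actual implementation. It assumes each event arrives on a single line, and the helper names iter_chunks and collect_text, as well as the sample lines, are hypothetical and only shaped like the output shown here.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed chunk per 'data: {...}' line, stopping at 'data: [DONE]'.

    Assumption: every event is a single line of text.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything without the prefix
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Join the 'content' of every delta into one string."""
    parts = []
    for chunk in iter_chunks(lines):
        delta = chunk["choices"][0]["delta"]
        # The final chunk carries an empty delta, so default to "".
        parts.append(delta.get("content", ""))
    return "".join(parts)


# Hypothetical sample lines, trimmed to the fields this sketch reads.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Thanks"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":", I try"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

print(collect_text(sample))  # prints: Thanks, I try
```

In a real client, the lines argument would be whatever line iterator your HTTP library exposes for a streamed response body.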

Choose the code example that best suits your needs: stream-based chat text completions can be generated from Python, Go, Rust, JS, or cURL.