Chat SSE

You can get stream-based chat text completions (based on a thread of chat messages) from any of the chat-enabled models using the /chat/completions REST API endpoint, any of the official SDKs (Python, Go, Rust, JS), or cURL. Chunks are delivered as server-sent events (SSE) while the model generates.

Generate a Stream-Based Chat Text Completion

To generate a stream-based chat text completion, use the following code example, choosing the SDK or method that fits your application.

import os

from predictionguard import PredictionGuard

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

# The conversation so far, oldest message first.
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that provides clever and sometimes funny responses."
    },
    {
        "role": "user",
        "content": "What's up!"
    },
    {
        "role": "assistant",
        "content": "Well, technically vertically out from the center of the earth."
    },
    {
        "role": "user",
        "content": "Haha. Good one."
    }
]

# With stream=True the call returns an iterator of chunks instead of one response.
for res in client.chat.completions.create(
    model="Neural-Chat-7B",
    messages=messages,
    max_tokens=500,
    temperature=0.1,
    stream=True
):
    # The final chunk may carry an empty delta, so fall back to an empty string.
    delta = res["data"]["choices"][0]["delta"]

    # Use the 'end' parameter of print to avoid a new line after every chunk.
    print(delta.get("content", ""), end='')

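The raw stream will look something like this. Because each event is prefixed with "data:", the stream as a whole is not a valid JSON document; the SDK cleans this up and hands you each chunk as a parsed object.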

data: {
   "id":"chat-sr48TCgumnYx0cdV342eQz4uD9PpI",
   "object":"chat.completion.chunk",
   "created":1717785387,
   "model":"Neural-Chat-7B",
   "choices":[
      {
         "index":0,
         "delta":{
            "content":" past"
         },
         "generated_text":null,
         "logprobs":-0.11733246,
         "finish_reason":null
      }
   ]
}
data: {
   "id":"chat-PTpR04EN0VxSIyfHFXNS57FCC8ZJJ",
   "object":"chat.completion.chunk",
   "created":1717785387,
   "model":"Neural-Chat-7B",
   "choices":[
      {
         "index":0,
         "delta":{
            
         },
         "generated_text":"Thanks, I try to keep things interesting. Now, if you want something more serious, how about a weather update or a joke? Your choice!\n\nWeather: Sunny with a slight chance of humor.\nJoke: A man walks into a bar and says, \"I'll have a beer, and tell me a joke about time.\" The bartender replies, \"Time to go, you're two minutes past last call!\"",
         "logprobs":0,
         "finish_reason":"stop"
      }
   ]
}
data: [DONE]
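
If you call the REST endpoint directly instead of using an SDK, you have to do this cleanup yourself. The following is a minimal sketch of one way to do it, not the SDK's actual implementation. It assumes each event arrives on a single line of the form "data: <json>" (the listing above is indented for readability) and that the stream ends with the "data: [DONE]" sentinel. The helper names parse_sse_lines and collect_text and the sample lines are hypothetical and used only for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed chunk per `data:` line, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker, not JSON
        yield json.loads(payload)


def collect_text(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the delta fragments; chunks with an empty delta contribute nothing."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta") or {}
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Hypothetical sample lines shaped like the stream shown above (assumed one event per line).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Thanks"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", I try"}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

print(collect_text(parse_sse_lines(sample)))
```

In the sample stream above, the last chunk has an empty delta and a finish_reason of "stop", which is why the sketch reads the content with a fallback rather than indexing it directly.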

Whether you use Python, Go, Rust, JS, or cURL, apply the example that best suits your needs for generating streamed text completions.
