Chat Vision

When sending a request to the Vision models, Prediction Guard offers several ways to provide your image. You can pass the image as a URL, a local image file, a data URI, or a base64 encoded string. Here is an example that uses an image from a URL:

import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://pbs.twimg.com/media/GKLN4qPXEAArqoK.png",
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

This example shows how you can use an image from a local file:

import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard token as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "GKLN4qPXEAArqoK.png",
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

When using base64 encoded image inputs or data URIs, you first need to encode the image.

Here is how to convert an image to a base64 encoded string:

import base64

def encode_image_to_base64(image_path):
    with open(image_path, 'rb') as image_file:
        image_data = image_file.read()
    base64_encoded_data = base64.b64encode(image_data)
    base64_message = base64_encoded_data.decode('utf-8')
    return base64_message

image_path = 'GKLN4qPXEAArqoK.png'
encoded_image = encode_image_to_base64(image_path)

This example shows how to pass just the base64 encoded image:

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": encoded_image,
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

And this example shows how to use a data URI:

data_uri = "data:image/png;base64," + encoded_image

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_uri,
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
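
If your images are not always PNGs, you can build the data URI prefix from the file's MIME type instead of hardcoding it. Here is a minimal sketch using Python's standard mimetypes module; the encode_image_to_data_uri helper is purely illustrative and not part of the Prediction Guard SDK:

import base64
import mimetypes

def encode_image_to_data_uri(image_path):
    # Guess the MIME type from the file extension, falling back to PNG
    # if it cannot be determined.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/png"
    # Read and base64 encode the file contents.
    with open(image_path, 'rb') as image_file:
        encoded = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded}"

data_uri = encode_image_to_data_uri('GKLN4qPXEAArqoK.png')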

The output of these requests will look similar to this:

{
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "The scene depicts a man standing on a washing machine, positioned on the back end of a yellow car. He appears to be enjoying himself, while the car is driving down a street. \n\nThere are several other cars on the street. Near the center of the scene, another car can be seen parked, while two cars are found further in the background on both the left and right sides of the image. \n\nAdditionally, there are two more people",
                "role": "assistant"
            }
        }
    ],
    "created": 1727889823,
    "id": "chat-3f0f1b98-448a-4818-a7c4-a28f94eed05d",
    "model": "llava-1.5-7b-hf",
    "object": "chat.completion"
}
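
Because the result shown above is a plain dictionary (it is serialized here with json.dumps), you can index into it to pull out just the generated text instead of printing the full response, for example:

# Print only the model's description of the image.
print(result["choices"][0]["message"]["content"])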