Chat Vision
When sending a request to the Vision models, Prediction Guard offers several ways to provide your image. You can pass a URL, a local image file, a data URI, or a base64 encoded image. Here is an example that uses an image from a URL:
import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://farm4.staticflickr.com/3300/3497460990_11dfb95dd1_z.jpg",
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
This example shows how you can use an image from a local file:
import os
import json
from predictionguard import PredictionGuard

# Set your Prediction Guard API key as an environment variable.
os.environ["PREDICTIONGUARD_API_KEY"] = "<api key>"

client = PredictionGuard()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "3497460990_11dfb95dd1_z.jpg",
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
When using base64 encoded image inputs or data URIs, you first need to encode the image.
Here is how to convert an image to a base64 encoded string:
import base64

def encode_image_to_base64(image_path):
    with open(image_path, 'rb') as image_file:
        image_data = image_file.read()
    base64_encoded_data = base64.b64encode(image_data)
    base64_message = base64_encoded_data.decode('utf-8')
    return base64_message

image_path = '3497460990_11dfb95dd1_z.jpg'
encoded_image = encode_image_to_base64(image_path)
This example shows how to enter just the base64 encoded image:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": encoded_image,
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
And this example shows how to use a data URI:
data_uri = "data:image/jpeg;base64," + encoded_image

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_uri,
                }
            }
        ]
    },
]

result = client.chat.completions.create(
    model="llava-1.5-7b-hf",
    messages=messages
)

print(json.dumps(
    result,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))
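If your images are not always JPEGs, you may prefer to derive the MIME type for the data URI prefix from the file name rather than hardcoding it. The helper below is a small sketch using Python's standard mimetypes module; encode_image_to_data_uri is an illustrative name, not part of the Prediction Guard SDK.

import base64
import mimetypes

def encode_image_to_data_uri(image_path):
    # Guess the MIME type from the file extension (e.g. image/jpeg for .jpg).
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"
    with open(image_path, 'rb') as image_file:
        encoded = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded}"

data_uri = encode_image_to_data_uri('3497460990_11dfb95dd1_z.jpg')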
The output of these examples will be similar to this:
{
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "The scene depicts a man standing on a washing machine, positioned on the back end of a yellow car. He appears to be enjoying himself, while the car is driving down a street. \n\nThere are several other cars on the street. Near the center of the scene, another car can be seen parked, while two cars are found further in the background on both the left and right sides of the image. \n\nAdditionally, there are two more people",
                "role": "assistant"
            }
        }
    ],
    "created": 1727889823,
    "id": "chat-3f0f1b98-448a-4818-a7c4-a28f94eed05d",
    "model": "llava-1.5-7b-hf",
    "object": "chat.completion"
}
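If you only need the generated text rather than the full JSON response, you can index into the result. This is a minimal sketch that assumes the response structure shown above:

# Pull just the assistant's reply out of the response.
answer = result["choices"][0]["message"]["content"]
print(answer)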