What are vision models?

Vision models are AI systems that integrate visual and language understanding, enabling them to interpret images alongside natural language text. They are trained on extensive datasets containing both images and text, with training methods varying based on their specific objectives.

Using Vision Chat Completion

Obiguard implements the OpenAI message format, allowing you to include images in API requests. You can provide images to the model either by supplying a URL or by embedding the image as a base64-encoded string.

Below is an example with OpenAI’s gpt-4o model:

from obiguard import Obiguard

client = Obiguard(
  obiguard_api_key='vk-obg***',  # Your Obiguard virtual key here
)

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      },
    ],
  }],
)

print(response)
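The example above supplies the image by URL. As noted earlier, you can instead embed the image as a base64-encoded string. A minimal sketch of building such a data URL (the file path and MIME type here are placeholders; adjust them for your image):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Hypothetical usage with a local file:
# with open("photo.jpg", "rb") as f:
#     data_url = to_data_url(f.read())
# Then pass {"type": "image_url", "image_url": {"url": data_url}}
# in the message content, exactly as in the URL example above.
```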

Supported Providers and Models

Obiguard integrates with a wide range of vision models from leading providers. The table below lists some of the supported models.

| Provider | Models | Functions |
| --- | --- | --- |
| OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini | Create Chat Completion |
| Azure OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini | Create Chat Completion |
| Gemini | gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro | Create Chat Completion |
| Anthropic | claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku | Create Chat Completion |
| AWS Bedrock | anthropic.claude-3-5-sonnet, anthropic.claude-3-5-haiku, anthropic.claude-3-5-sonnet-20240620-v1:0 | Create Chat Completion |
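Because Obiguard uses the OpenAI message format across providers, the request payload keeps the same shape regardless of which model from the table you target; only the `model` value changes. A sketch of that idea (the helper function here is illustrative, not part of the Obiguard SDK):

```python
def vision_payload(model: str, text: str, image_url: str) -> dict:
    """Build an OpenAI-format vision chat payload; only `model` varies per provider."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

# The same structure serves, e.g., "gpt-4o", "claude-3-5-sonnet", or "gemini-1.5-pro".
```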