Obiguard provides a robust and secure gateway for integrating various Large Language Models (LLMs) and embedding models into your apps, including Google Vertex AI.

With Obiguard, you can take advantage of features like fast AI gateway access, observability, prompt management, and more, all while ensuring the secure management of your Vertex AI credentials through a virtual key system.

Provider Slug: vertex-ai

Obiguard SDK Integration with Google Vertex AI

Obiguard provides a consistent API to interact with models from various providers. To integrate Google Vertex AI with Obiguard:

1. Install the Obiguard SDK

Add the Obiguard SDK to your application to interact with Google Vertex AI API through Obiguard’s gateway.

pip install obiguard

2. Initialize Obiguard with the Virtual Key

To integrate Vertex AI with Obiguard, you'll need your Vertex Project ID or Service Account JSON, along with your Vertex Region, with which you can set up the Virtual Key.

from obiguard import Obiguard

client = Obiguard(
  obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
  virtual_key="VERTEX_VIRTUAL_KEY"   # Replace with your virtual key for Google
)

3. Invoke Chat Completions with Vertex AI and Gemini

Use the Obiguard instance to send requests to Gemini models hosted on Vertex AI. You can also override the virtual key directly in the API call if needed.

Vertex AI uses OAuth2 to authenticate its requests, so you also need to send the access token along with the request.

completion = client.with_options(Authorization="Bearer $YOUR_VERTEX_ACCESS_TOKEN").chat.completions.create(
    messages= [{ "role": 'user', "content": 'Say this is a test' }],
    model= 'gemini-1.5-pro-latest'
)

print(completion)

To use Anthropic models on Vertex AI, prepend anthropic. to the model name.
Example: anthropic.claude-3-5-sonnet@20240620

Similarly, for Meta models, prepend meta. to the model name.
Example: meta.llama-3-8b-8192
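
For example, here's a minimal sketch of calling a Claude model on Vertex AI, reusing the client from step 2 (the access token placeholder follows the same pattern shown above):

Python
completion = client.with_options(Authorization="Bearer $YOUR_VERTEX_ACCESS_TOKEN").chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="anthropic.claude-3-5-sonnet@20240620"  # note the anthropic. prefix
)

print(completion)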

Document, Video, Audio Processing

Vertex AI supports attaching webm, mp4, pdf, jpg, mp3, wav, etc. file types to your Gemini messages.

Using Obiguard, here’s how you can send these media files:

Python
completion = client.chat.completions.create(
  messages=[
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "gs://cloud-samples-data/generative-ai/image/scones.jpg"
          }
        },
        {
          "type": "text",
          "text": "Describe the image"
        }
      ]
    }
  ],
  model='gemini-1.5-pro-001',
  max_tokens=200
)
print(completion)

Extended Thinking (Reasoning Models) (Beta)

Models like google.gemini-2.5-flash-preview-04-17 and anthropic.claude-3-7-sonnet@20250219 support extended thinking. This is similar to OpenAI's reasoning models, but you also get the model's reasoning as it processes the request.

The assistant's thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not in the response.choices[0].message.content string.

Gemini models do not support feeding the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.

Note that you will have to set strict_open_ai_compliance=False in the headers to use this feature.

Single-turn conversation

Python
from obiguard import Obiguard

client = Obiguard(
  obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
  strict_open_ai_compliance=False
)

# Create the request
response = client.chat.completions.create(
  model="anthropic.claude-3-7-sonnet@20250219",
  max_tokens=3000,
  thinking={
    "type": "enabled",
    "budget_tokens": 2030
  },
  stream=True,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
        }
      ]
   }
  ]
)
# Since stream=True, parse the response_chunk.choices[0].delta.content_blocks array
for chunk in response:
    if chunk.choices[0].delta:
        content_blocks = chunk.choices[0].delta.get("content_blocks")
        if content_blocks is not None:
            for content_block in content_blocks:
                print(content_block)

To disable thinking for Gemini models like google.gemini-2.5-flash-preview-04-17, you must explicitly set budget_tokens to 0.

"thinking": {
    "type": "enabled",
    "budget_tokens": 0
}
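
For instance, here's a minimal sketch (reusing the client from above, with strict_open_ai_compliance=False) that disables thinking for a Gemini model:

Python
response = client.chat.completions.create(
  model="google.gemini-2.5-flash-preview-04-17",
  max_tokens=3000,
  thinking={
    "type": "enabled",
    "budget_tokens": 0  # explicitly disables thinking for Gemini models
  },
  messages=[{"role": "user", "content": "Say this is a test"}]
)
print(response)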

Multi-turn conversation

Python
from obiguard import Obiguard

client = Obiguard(
  obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
  virtual_key="VIRTUAL_KEY",   # Add your provider's virtual key
  strict_open_ai_compliance=False
)

response = client.chat.completions.create(
  model="anthropic.claude-3-7-sonnet@20250219",
  max_tokens=3000,
  thinking={
    "type": "enabled",
    "budget_tokens": 2030
  },
  stream=True,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "when does the flight from baroda to bangalore land tomorrow, what time, what is its flight number, and what is its baggage belt?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "thinking",
          "thinking": "The user is asking several questions about a flight from Baroda (also known as Vadodara) to Bangalore:\n1. When does the flight land tomorrow\n2. What time does it land\n3. What is the flight number\n4. What is the baggage belt number at the arrival airport\n\nTo properly answer these questions, I would need access to airline flight schedules and airport information systems. However, I don't have:\n- Real-time or scheduled flight information\n- Access to airport baggage claim allocation systems\n- Information about specific flights between these cities\n- The ability to look up tomorrow's specific flight schedules\n\nThis question requires current, specific flight information that I don't have access to. Instead of guessing or providing potentially incorrect information, I should explain this limitation and suggest ways the user could find this information.",
          "signature": "EqoBCkgIARABGAIiQBVA7FBNLRtWarDSy9TAjwtOpcTSYHJ+2GYEoaorq3V+d3eapde04bvEfykD/66xZXjJ5yyqogJ8DEkNMotspRsSDKzuUJ9FKhSNt/3PdxoMaFZuH+1z1aLF8OeQIjCrA1+T2lsErrbgrve6eDWeMvP+1sqVqv/JcIn1jOmuzrPi2tNz5M0oqkOO9txJf7QqEPPw6RG3JLO2h7nV1BMN6wE="
        }
      ]
    },
    {
      "role": "user",
      "content": "thanks that's good to know, how about to chennai?"
    }
  ]
)
print(response)

Sending base64 Image

You can also send base64 image data in the url field:

"url": "data:image/png;base64,UklGRkacAABXRUJQVlA4IDqcAAC....."

This same message format also works for all other media types. Just send your media file in the url field, for example "url": "gs://cloud-samples-data/video/animals.mp4" for Google Cloud URLs and "url": "https://download.samplelib.com/mp3/sample-3s.mp3" for public URLs.

Your URL should include the file extension; it is used to infer the MIME_TYPE, which is a required parameter for prompting Gemini models with files.

Text Embedding Models

You can use any of Vertex AI's English and multilingual embedding models through Obiguard, in the familiar OpenAI schema.

The Gemini-specific parameter task_type is also supported on Obiguard.

from obiguard import Obiguard

client = Obiguard(
  obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
  virtual_key="VERTEX_VIRTUAL_KEY"
)

# Generate embeddings
def get_embeddings():
    embeddings = client.with_options(Authorization="Bearer $YOUR_VERTEX_ACCESS_TOKEN").embeddings.create(
        input='The vector representation for this text',
        model='text-embedding-004',
        task_type="CLASSIFICATION" # Optional
    )
    print(embeddings)

get_embeddings()

Image Generation Models

Obiguard supports the Imagen API on Vertex AI for image generation, letting you easily make requests in the familiar OpenAI-compliant schema.

Python
from obiguard import Obiguard

client = Obiguard(
 obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
)

client.images.generate(
  prompt = "Cat flying to mars from moon",
  model = "imagen-3.0-generate-001"
)

Image Generation API Reference

List of Supported Imagen Models

  • imagen-3.0-generate-001
  • imagen-3.0-fast-generate-001
  • imagegeneration@006
  • imagegeneration@005
  • imagegeneration@002

Grounding with Google Search

Vertex AI supports grounding with Google Search. This feature allows you to ground your LLM responses with real-time search results. Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or google_search_retrieval (for older models like gemini-1.5-flash) in the tools array.

"tools": [
    {
        "type": "function",
        "function": {
            "name": "google_search" // or google_search_retrieval for older models
        }
    }]

If you mix regular tools with grounding tools, Vertex AI might throw an error saying that only one tool can be used at a time.
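
Putting this together, here's a minimal sketch of a grounded request, assuming the same client setup as in the earlier examples (the question is illustrative):

Python
completion = client.chat.completions.create(
  messages=[{"role": "user", "content": "Which country won the most medals at the latest Olympics?"}],
  model="gemini-2.0-flash-001",
  tools=[
    {
      "type": "function",
      "function": {
        "name": "google_search"  # use google_search_retrieval for older models like gemini-1.5-flash
      }
    }
  ]
)
print(completion)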

gemini-2.0-flash-thinking-exp and other thinking/reasoning models

gemini-2.0-flash-thinking-exp models return a Chain of Thought response along with the actual inference text. This is not OpenAI compatible; however, Obiguard supports it by joining the two responses with a \r\n\r\n separator. You can split the response on this pattern to get the Chain of Thought response and the actual inference text.

If you want the Chain of Thought response along with the actual inference text, set the strict_open_ai_compliance flag to false in the request.

If you want only the inference text, set the strict_open_ai_compliance flag to true.
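
For example, here's a minimal sketch of splitting the combined response, assuming strict_open_ai_compliance is set to false and that the message content holds both parts joined by \r\n\r\n:

Python
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gemini-2.0-flash-thinking-exp"
)

content = completion.choices[0].message.content
# Obiguard joins the Chain of Thought and the answer with \r\n\r\n
chain_of_thought, _, answer = content.partition("\r\n\r\n")
print("Reasoning:", chain_of_thought)
print("Answer:", answer)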


Making Requests Without Virtual Keys

You can also pass your Vertex AI details and secrets directly, without using Virtual Keys in Obiguard.

Vertex AI expects a region, a project ID and the access token in the request for a successful completion request. This is how you can specify these fields directly in your requests:

Example Request

from obiguard import Obiguard

client = Obiguard(
    obiguard_api_key="OBIGUARD_API_KEY",  # Your Obiguard API key
    vertex_project_id="sample-55646",
    vertex_region="us-central1",
    provider="vertex_ai",
    Authorization="$GCLOUD AUTH PRINT-ACCESS-TOKEN"
)

completion = client.chat.completions.create(
    messages= [{ "role": 'user', "content": 'Say this is a test' }],
    model= 'gemini-1.5-pro-latest'
)

print(completion)

For further questions on custom Vertex AI deployments or fine-grained access tokens, reach out to us on [email protected].

How to Find Your Google Vertex Project Details

To obtain your Vertex Project ID and Region, navigate to the Google Vertex AI dashboard.

  • You can copy the Project ID located at the top left corner of your screen.
  • Find the Region dropdown on the same page to get your Vertex Region.

Get Your Service Account JSON

When selecting Service Account File as your authentication method, you’ll need to:

  1. Upload your Google Cloud service account JSON file
  2. Specify the Vertex Region

This method is particularly important for using self-deployed models, as your service account must have the aiplatform.endpoints.predict permission to access custom endpoints.

Learn more about permissions for your Vertex IAM key here.

For Self-Deployed Models: Your service account must have the aiplatform.endpoints.predict permission in Google Cloud IAM. Without this specific permission, requests to custom endpoints will fail.

Using Project ID and Region Authentication

For standard Vertex AI models, you can simply provide:

  1. Your Vertex Project ID (found in your Google Cloud console)
  2. The Vertex Region where your models are deployed

This method is simpler but may not have all the permissions needed for custom endpoints.

Next Steps

The complete list of features supported in the SDK is available at the link below.

SDK