Provider Slug: `vertex-ai`
Obiguard SDK Integration with Google Vertex AI
Obiguard provides a consistent API to interact with models from various providers. To integrate Google Vertex AI with Obiguard:

1. Install the Obiguard SDK

Add the Obiguard SDK to your application to interact with the Google Vertex AI API through Obiguard's gateway.
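A minimal sketch of the install step, assuming the SDK is published to PyPI under the package name `obiguard`:

```sh
pip install obiguard
```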
2. Initialize Obiguard with the Virtual Key

To integrate Vertex AI with Obiguard, you'll need your Vertex Project ID (or Service Account JSON) and Vertex Region, with which you can set up the Virtual Key.
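A minimal initialization sketch; the import path, class name, and constructor parameters are assumptions based on the OpenAI-style client interface this guide follows, so check the SDK reference for the canonical names:

```python
from obiguard import Obiguard  # assumed import path and class name

# The Virtual Key encapsulates your Vertex Project ID / Service Account
# JSON and Vertex Region, so no Google credentials appear in code.
client = Obiguard(
    api_key="YOUR_OBIGUARD_API_KEY",
    virtual_key="YOUR_VERTEX_VIRTUAL_KEY",
)
```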
3. Invoke Chat Completions with Vertex AI and Gemini
Use the Obiguard instance to send requests to Gemini models hosted on Vertex AI. You can also override the Virtual Key directly in the API call if needed. Vertex AI uses OAuth2 to authenticate its requests, so you also need to send an access token along with the request.
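A sketch of a chat completion call using the client from step 2; the model name is illustrative, and the commented-out access-token mechanism is an assumption:

```python
completion = client.chat.completions.create(
    model="gemini-1.5-pro",  # any Gemini model enabled in your Vertex region
    messages=[{"role": "user", "content": "Say this is a test"}],
    # If you are not using a Virtual Key, the OAuth2 access token can be
    # sent per-request, e.g. via an Authorization header (assumed mechanism):
    # extra_headers={"Authorization": f"Bearer {access_token}"},
)
print(completion.choices[0].message.content)
```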
To use Anthropic models on Vertex AI, prepend `anthropic.` to the model name.

Example: `anthropic.claude-3-5-sonnet@20240620`
Similarly, for Meta models, prepend `meta.` to the model name.

Example: `meta.llama-3-8b-8192`
Document, Video, Audio Processing
Vertex AI supports attaching `webm`, `mp4`, `pdf`, `jpg`, `mp3`, `wav`, and other file types to your Gemini messages.
See the Gemini docs for the full list of supported media formats.
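For instance, a sketch of attaching a video from Google Cloud Storage; the message schema follows the OpenAI vision format used throughout this guide, and the model name is illustrative:

```python
response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this video."},
                {
                    "type": "image_url",
                    "image_url": {"url": "gs://cloud-samples-data/video/animals.mp4"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```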
Extended Thinking (Reasoning Models) (Beta)
Models like `google.gemini-2.5-flash-preview-04-17` and `anthropic.claude-3-7-sonnet@20250219` support extended thinking. This is similar to OpenAI's reasoning models, but you also get the model's reasoning as it processes the request.

The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not in the `response.choices[0].message.content` string. Gemini models do not support plugging the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.

Note that you will have to set `strict_open_ai_compliance=False` in the headers to use this feature.
Single turn conversation
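A single-turn sketch; the `thinking` parameter shape and the `content_blocks` access path follow the description above, but treat the exact field names as assumptions:

```python
response = client.chat.completions.create(
    model="anthropic.claude-3-7-sonnet@20250219",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 2030},  # assumed parameter shape
    stream=True,
    messages=[{"role": "user", "content": "What is bigger: 9.11 or 9.9?"}],
)
for chunk in response:
    delta = chunk.choices[0].delta
    # Thinking arrives in delta.content_blocks, not delta.content.
    if getattr(delta, "content_blocks", None):
        print(delta.content_blocks)
```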
To disable thinking for Gemini models like `google.gemini-2.5-flash-preview-04-17`, you are required to explicitly set `budget_tokens` to `0`.

Multi turn conversation
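A multi-turn sketch; per the note above, the earlier thinking block is not sent back, only the plain assistant text:

```python
response = client.chat.completions.create(
    model="google.gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 1024},  # set budget_tokens to 0 to disable thinking
    messages=[
        {"role": "user", "content": "When does a company need to file a 10-K?"},
        # Previous assistant turn: send only the final text, not the thinking block.
        {"role": "assistant", "content": "A company files a 10-K annually with the SEC after its fiscal year ends."},
        {"role": "user", "content": "And a 10-Q?"},
    ],
)
```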
Sending base64 Image

Here, you can send the `base64` image data along with the `url` field too:
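A sketch, again in the OpenAI vision message format (model name illustrative):

```python
import base64

# Encode a local image as a base64 data URL.
with open("sample.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
)
```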
This same message format also works for all other media types: just send your media file in the `url` field, like `"url": "gs://cloud-samples-data/video/animals.mp4"` for Google Cloud URLs and `"url": "https://download.samplelib.com/mp3/sample-3s.mp3"` for public URLs.

Your URL should have the file extension; it is used to infer the `MIME_TYPE`, which is a required parameter for prompting Gemini models with files.

Text Embedding Models
You can use any of Vertex AI's English and Multilingual embedding models through Obiguard, in the familiar OpenAI schema. The Gemini-specific parameter `task_type` is also supported on Obiguard.
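A sketch of an embeddings request; the model name is illustrative, and the `task_type` values come from Vertex AI's embeddings documentation:

```python
embedding = client.embeddings.create(
    model="text-embedding-004",   # any Vertex AI English/Multilingual embedding model
    input="The quick brown fox jumps over the lazy dog",
    task_type="RETRIEVAL_QUERY",  # Gemini-specific parameter, passed through by Obiguard
)
print(embedding.data[0].embedding[:8])
```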
Image Generation Models
Obiguard supports the Imagen API on Vertex AI for image generation, letting you easily make requests in the familiar OpenAI-compliant schema.
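A sketch using the OpenAI-style images endpoint; the exact `size` values supported depend on the Imagen model:

```python
image = client.images.generate(
    model="imagen-3.0-generate-001",
    prompt="A cat sitting on a windowsill at sunset",
    n=1,
    size="1024x1024",
)
```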
List of Supported Imagen Models
- `imagen-3.0-generate-001`
- `imagen-3.0-fast-generate-001`
- `imagegeneration@006`
- `imagegeneration@005`
- `imagegeneration@002`
Grounding with Google Search
Vertex AI supports grounding with Google Search. This feature allows you to ground your LLM responses with real-time search results. Grounding is invoked by passing the `google_search` tool (for newer models like gemini-2.0-flash-001) or the `google_search_retrieval` tool (for older models like gemini-1.5-flash) in the `tools` array, as shown in the sketch below.
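A sketch of a grounded request; the exact tool object shape below is an assumption modeled on the OpenAI tools schema:

```python
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "What are today's top headlines?"}],
    tools=[
        # Use google_search_retrieval instead for older models like gemini-1.5-flash.
        {"type": "function", "function": {"name": "google_search"}},
    ],
)
```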
If you mix regular tools with grounding tools, Vertex AI might throw an error saying only one tool can be used at a time.
gemini-2.0-flash-thinking-exp and other thinking/reasoning models
`gemini-2.0-flash-thinking-exp` models return a Chain of Thought response along with the actual inference text. This is not OpenAI-compatible; however, Obiguard supports it by adding a `\r\n\r\n` separator and appending the two responses together. You can split the response on this pattern to separate the Chain of Thought from the actual inference text.
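For example, a sketch of splitting on the separator:

```python
text = response.choices[0].message.content
# Everything before the first \r\n\r\n is the Chain of Thought;
# everything after it is the actual inference text.
thought, _, answer = text.partition("\r\n\r\n")
```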
If you require the Chain of Thought response along with the actual inference text, pass the strict OpenAI compliance flag as `false` in the request. If you want the inference text only, pass it as `true`.
Making Requests Without Virtual Keys
You can also pass your Vertex AI details & secrets directly, without using Virtual Keys in Obiguard. Vertex AI expects a `region`, a `project ID`, and an `access token` in the request for a successful completion.
This is how you can specify these fields directly in your requests:
Example Request
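A sketch; the constructor field names for passing Vertex credentials directly are assumptions, so confirm them against Obiguard's API reference:

```python
import subprocess

from obiguard import Obiguard  # assumed import, as in step 2 above

# Mint a short-lived OAuth2 access token for Vertex AI.
access_token = subprocess.run(
    ["gcloud", "auth", "print-access-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

client = Obiguard(
    api_key="YOUR_OBIGUARD_API_KEY",
    provider="vertex-ai",                    # assumed field name
    vertex_project_id="sample-project-id",   # assumed field name
    vertex_region="us-central1",             # assumed field name
    authorization=f"Bearer {access_token}",  # assumed field name
)

completion = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
```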
How to Find Your Google Vertex Project Details
To obtain your Vertex Project ID and Region, navigate to the Google Vertex Dashboard.

- You can copy the Project ID located at the top left corner of your screen.
- Find the Region dropdown on the same page to get your Vertex Region.
Get Your Service Account JSON
- Follow this process to get your Service Account JSON.
- Upload your Google Cloud service account JSON file.
- Specify the Vertex Region.

Your service account needs the `aiplatform.endpoints.predict` permission to access custom endpoints. Learn more about permissions on your Vertex IAM key here.
For Self-Deployed Models: Your service account must have the `aiplatform.endpoints.predict` permission in Google Cloud IAM. Without this specific permission, requests to custom endpoints will fail.

Using Project ID and Region Authentication
For standard Vertex AI models, you can simply provide:

- Your Vertex Project ID (found in your Google Cloud console)
- The Vertex Region where your models are deployed