Provider Slug: `vertex-ai`

Obiguard SDK Integration with Google Vertex AI

Obiguard provides a consistent API to interact with models from various providers. To integrate Google Vertex AI with Obiguard:

1. Install the Obiguard SDK
Add the Obiguard SDK to your application to interact with the Google Vertex AI API through Obiguard's gateway.
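A minimal install-and-import sketch, assuming the SDK is published as `obiguard` and exposes an `Obiguard` client class (both names are assumptions based on this guide's naming):

```python
# Install the SDK first (package name assumed):
#   pip install obiguard

# Then import the client in your application:
from obiguard import Obiguard  # client class name is an assumption
```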
2. Initialize Obiguard with the Virtual Key
To integrate Vertex AI with Obiguard, you'll need your Vertex Project ID (or Service Account JSON) and Vertex Region, with which you can set up the Virtual Key.
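A minimal initialization sketch; the `api_key` and `virtual_key` argument names are assumptions, so check the SDK reference for the exact signature:

```python
from obiguard import Obiguard  # client class name is an assumption

client = Obiguard(
    api_key="YOUR_OBIGUARD_API_KEY",        # your Obiguard API key
    virtual_key="YOUR_VERTEX_VIRTUAL_KEY",  # Virtual Key created for Vertex AI
)
```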
3. Invoke Chat Completions with Vertex AI and Gemini
Use the Obiguard instance to send requests to Gemini models hosted on Vertex AI. You can also override the virtual key directly in the API call if needed. Vertex AI uses OAuth2 to authenticate its requests, so you need to send the access token along with the request.
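A hedged sketch of a chat completion call. The model name is illustrative, and the exact mechanism for attaching the OAuth2 access token is an assumption (see "Making Requests Without Virtual Keys" below for the direct-credentials variant):

```python
# If your Virtual Key was set up with a Project ID rather than a Service
# Account JSON, also send the OAuth2 access token with the request, e.g.
# one generated via `gcloud auth print-access-token`.
completion = client.chat.completions.create(
    model="gemini-1.5-pro",  # any Gemini model available in your region
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(completion.choices[0].message.content)
```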
To use Anthropic models on Vertex AI, prepend `anthropic.` to the model name. Example: `anthropic.claude-3-5-sonnet@20240620`

Similarly, for Meta models, prepend `meta.` to the model name. Example: `meta.llama-3-8b-8192`

Document, Video, Audio Processing
Vertex AI supports attaching `webm`, `mp4`, `pdf`, `jpg`, `mp3`, `wav`, and other file types to your Gemini messages.
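A hedged sketch of attaching a file by URL, using the OpenAI-style multimodal message shape (whether Obiguard accepts exactly this content-part schema is an assumption; the GCS URL comes from the examples below):

```python
completion = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this video"},
                {
                    "type": "image_url",
                    # Google Cloud Storage URL; public https URLs also work
                    "image_url": {"url": "gs://cloud-samples-data/video/animals.mp4"},
                },
            ],
        }
    ],
)
```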
Extended Thinking (Reasoning Models) (Beta)
Models like `google.gemini-2.5-flash-preview-04-17` and `anthropic.claude-3-7-sonnet@20250219` support extended thinking. This is similar to OpenAI's reasoning models, but you also get the model's reasoning as it processes the request.

The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not the `response.choices[0].message.content` string. Gemini models do not support plugging the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.

Note that you will have to set `strict_open_ai_compliance=False` in the headers to use this feature.
Single turn conversation
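A hedged streaming sketch. Only `budget_tokens` and `strict_open_ai_compliance` are named in this guide; the `thinking` parameter shape and where the compliance flag is set are assumptions:

```python
client = Obiguard(
    api_key="YOUR_OBIGUARD_API_KEY",
    virtual_key="YOUR_VERTEX_VIRTUAL_KEY",
    strict_open_ai_compliance=False,  # required for extended thinking
)

stream = client.chat.completions.create(
    model="anthropic.claude-3-7-sonnet@20250219",
    max_tokens=3000,
    # Parameter shape is an assumption; budget_tokens controls thinking.
    thinking={"type": "enabled", "budget_tokens": 2030},
    stream=True,
    messages=[{"role": "user", "content": "When is the next solar eclipse?"}],
)

for chunk in stream:
    # Thinking arrives in delta.content_blocks, not delta.content.
    print(chunk.choices[0].delta)
```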
To disable thinking for Gemini models like `google.gemini-2.5-flash-preview-04-17`, you are required to explicitly set `budget_tokens` to 0.

Multi turn conversation
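A hedged multi-turn sketch: per the note above, the thinking content itself is not replayed, so the prior assistant turn goes back as plain content (the `thinking` parameter shape is assumed, as before):

```python
response = client.chat.completions.create(
    model="anthropic.claude-3-7-sonnet@20250219",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 2030},
    messages=[
        {"role": "user", "content": "When is the next solar eclipse?"},
        # Previous assistant reply, sent back without its thinking blocks:
        {"role": "assistant", "content": "The next total solar eclipse is on ..."},
        {"role": "user", "content": "And how long will it last?"},
    ],
)
```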
Sending base64 Image
Here, you can also send base64 image data in the same `url` field:
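A hedged sketch using a data URI in the `url` field; the data-URI convention mirrors the OpenAI schema, and whether Obiguard requires exactly this form is an assumption:

```python
import base64

# Read a local image and encode it as base64:
with open("image.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    # Base64 data goes in the same url field, as a data URI:
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
)
```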
This same message format also works for all other media types: just send your media file in the `url` field, like `"url": "gs://cloud-samples-data/video/animals.mp4"` for Google Cloud URLs and `"url": "https://download.samplelib.com/mp3/sample-3s.mp3"` for public URLs.

Your URL should include the file extension; it is used to infer the MIME_TYPE, which is a required parameter for prompting Gemini models with files.

Text Embedding Models
You can use any of Vertex AI's English and Multilingual embedding models through Obiguard, in the familiar OpenAI schema.

The Gemini-specific parameter `task_type` is also supported on Obiguard.
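A hedged embeddings sketch; only `task_type` is named in this guide, so the model name and the `task_type` value shown are assumptions:

```python
embedding = client.embeddings.create(
    model="text-embedding-004",   # assumed Vertex embedding model name
    input="The food was delicious and the service was excellent.",
    task_type="RETRIEVAL_QUERY",  # Gemini-specific parameter, optional
)
print(embedding.data[0].embedding[:5])
```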
Image Generation Models
Obiguard supports the Imagen API on Vertex AI for image generation, letting you easily make requests in the familiar OpenAI-compliant schema.
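A hedged sketch using the OpenAI-style images endpoint with one of the Imagen models listed below (the `size` parameter support is an assumption):

```python
image = client.images.generate(
    model="imagen-3.0-generate-001",  # from the supported-models list below
    prompt="A cat sitting on a windowsill at golden hour",
    size="1024x1024",                 # support for this parameter is assumed
)
```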
List of Supported Imagen Models
- imagen-3.0-generate-001
- imagen-3.0-fast-generate-001
- imagegeneration@006
- imagegeneration@005
- imagegeneration@002
Grounding with Google Search
Vertex AI supports grounding with Google Search, a feature that lets you ground your LLM responses in real-time search results. Grounding is invoked by passing the `google_search` tool (for newer models like gemini-2.0-flash-001) or `google_search_retrieval` (for older models like gemini-1.5-flash) in the tools array.

If you mix regular tools with grounding tools, Vertex might throw an error saying only one tool can be used at a time.
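A hedged sketch of a grounded request; only the tool names and the tools array are given in this guide, so the exact tool-entry shape is an assumption:

```python
completion = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Who won the most recent F1 race?"}],
    # For older models like gemini-1.5-flash, use "google_search_retrieval".
    tools=[{"type": "function", "function": {"name": "google_search"}}],
)
```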
gemini-2.0-flash-thinking-exp and other thinking/reasoning models
gemini-2.0-flash-thinking-exp models return a Chain of Thought response along with the actual inference text. This is not OpenAI compatible; Obiguard supports it by joining the two responses with a `\r\n\r\n` separator. You can split the response on this pattern to separate the Chain of Thought from the inference text.

If you require the Chain of Thought response along with the actual inference text, pass the `strict_open_ai_compliance` flag as false in the request. If you want the inference text only, pass the flag as true.
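A sketch of splitting the combined response, following the `\r\n\r\n` convention described above (the model name comes from this section; everything else mirrors the earlier examples):

```python
completion = client.chat.completions.create(
    model="gemini-2.0-flash-thinking-exp",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

content = completion.choices[0].message.content
# Obiguard joins the Chain of Thought and the answer with "\r\n\r\n":
chain_of_thought, _, answer = content.partition("\r\n\r\n")
print(answer)
```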
Making Requests Without Virtual Keys
You can also pass your Vertex AI details and secrets directly, without using Virtual Keys in Obiguard. Vertex AI expects a region, a project ID, and an access token in the request for a successful completion.
This is how you can specify these fields directly in your requests:
Example Request
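A hedged sketch of direct authentication. The argument names below are assumptions based on the fields this guide says Vertex AI expects (region, project ID, access token); only the `vertex-ai` provider slug is given explicitly:

```python
from obiguard import Obiguard

client = Obiguard(
    api_key="YOUR_OBIGUARD_API_KEY",
    provider="vertex-ai",                     # provider slug from this guide
    vertex_project_id="your-gcp-project-id",  # argument name assumed
    vertex_region="us-central1",              # argument name assumed
    # OAuth2 access token, e.g. from `gcloud auth print-access-token`:
    vertex_access_token="YOUR_ACCESS_TOKEN",  # argument name assumed
)

completion = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
```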
How to Find Your Google Vertex Project Details
To obtain your Vertex Project ID and Region, navigate to the Google Vertex Dashboard.

- Copy the Project ID located at the top left corner of your screen.
- Find the Region dropdown on the same page to get your Vertex Region.
Get Your Service Account JSON
- Follow this process to get your Service Account JSON.
- Upload your Google Cloud service account JSON file
- Specify the Vertex Region
For Self-Deployed Models: Your service account must have the `aiplatform.endpoints.predict` permission in Google Cloud IAM. Without this specific permission, requests to custom endpoints will fail. Learn more about permissions on your Vertex IAM key here.

Using Project ID and Region Authentication
For standard Vertex AI models, you can simply provide:

- Your Vertex Project ID (found in your Google Cloud console)
- The Vertex Region where your models are deployed

