Obiguard provides a robust and secure gateway to facilitate the integration of various Large Language Models (LLMs) into your applications, including Google Gemini APIs.
With Obiguard, you can take advantage of features like fast AI gateway access, observability, prompt management, and more, all while ensuring the secure management of your LLM API keys through a virtual key system.
Use the Obiguard instance to send requests to Google Gemini. You can also override the virtual key directly in the API call if needed.
completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are not a helpful assistant"},
        {"role": "user", "content": "Say this is a test"}
    ],
    model="gemini-1.5-pro"
)
print(completion)
Obiguard supports the system_instructions parameter for Google Gemini 1.5, allowing you to control the behavior and output of your Gemini-powered applications.
Simply include your Gemini system prompt as a {"role": "system"} message within the messages array of your request body.
Obiguard Gateway automatically transforms the message for seamless compatibility with the Google Gemini API.
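For example, a minimal sketch reusing the client from above (the prompt text is illustrative):

completion = client.chat.completions.create(
    messages=[
        # Obiguard maps this system message to Gemini's system_instructions
        {"role": "system", "content": "Answer in exactly one short sentence."},
        {"role": "user", "content": "What does an AI gateway do?"}
    ],
    model="gemini-1.5-pro"
)
print(completion.choices[0].message.content)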
This same message format also works for all other media types; just send your media file in the url field, for example "url": "gs://cloud-samples-data/video/animals.mp4".
Your URL should include the file extension: it is used to infer the MIME_TYPE, which is a required parameter when prompting Gemini models with files.
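As a sketch, a multimodal request might look like the following, assuming the client from above and the OpenAI-style image_url content part; the gs:// path comes from the example above:

response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video."},
                # The .mp4 extension lets the gateway infer the required MIME_TYPE
                {
                    "type": "image_url",
                    "image_url": {"url": "gs://cloud-samples-data/video/animals.mp4"}
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)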
Vertex AI supports grounding with Google Search, a feature that grounds your LLM responses in real-time search results.
Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or google_search_retrieval (for older models like gemini-1.5-flash) in the tools array.
"tools": [ { "type": "function", "function": { "name": "google_search" // or google_search_retrieval for older models } }]
If you mix regular function-calling tools with grounding tools, Vertex AI may return an error stating that only one tool can be used at a time.
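As a minimal sketch, assuming the Obiguard client from above, a grounded request might look like this (the question is illustrative):

response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "user", "content": "Who won the most recent Formula 1 race?"}
    ],
    # Pass only the grounding tool; mixing it with regular tools may error
    tools=[
        {
            "type": "function",
            # Use "google_search_retrieval" instead for older models like gemini-1.5-flash
            "function": {"name": "google_search"}
        }
    ]
)
print(response.choices[0].message.content)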
Models like gemini-2.5-flash-preview-04-17 support extended thinking.
This is similar to OpenAI's reasoning models, but you also receive the model's reasoning as it processes the request.
When streaming, the assistant's thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not in the response.choices[0].message.content string.
Python
from obiguard import Obiguard

client = Obiguard(
    obiguard_api_key="vk-obg***",  # Your Obiguard virtual key
    strict_open_ai_compliance=False
)

# Create the request
response = client.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    thinking={"type": "enabled", "budget_tokens": 2030},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
                }
            ]
        }
    ]
)
print(response)

# For streaming responses, parse the response_chunk.choices[0].delta.content_blocks array:
# response = client.chat.completions.create(
#     ...same config as above but with stream=True
# )
# for chunk in response:
#     if chunk.choices[0].delta:
#         content_blocks = chunk.choices[0].delta.get("content_blocks")
#         if content_blocks is not None:
#             for content_block in content_blocks:
#                 print(content_block)
OpenAI Python
from openai import OpenAI
from obiguard import OBIGUARD_GATEWAY_URL, createHeaders

openai = OpenAI(
    api_key="VERTEX_API_KEY",
    base_url=OBIGUARD_GATEWAY_URL,
    default_headers=createHeaders(
        provider="vertex-ai",
        obiguard_api_key="OBIGUARD_API_KEY",
        strict_open_ai_compliance=False
    )
)

response = openai.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    # The gateway-specific `thinking` parameter goes in extra_body,
    # since the OpenAI SDK does not accept it as a named argument
    extra_body={
        "thinking": {"type": "enabled", "budget_tokens": 2030}
    },
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
                }
            ]
        }
    ]
)
print(response)
cURL
curl "https://gateway.obiguard.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "x-obiguard-api-key: $OBIGUARD_API_KEY" \
  -H "x-obiguard-provider: vertex-ai" \
  -H "Authorization: Bearer $VERTEX_API_KEY" \
  -H "x-obiguard-strict-open-ai-compliance: false" \
  -d '{
    "model": "gemini-2.5-flash-preview-04-17",
    "max_tokens": 3000,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 2030
    },
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
          }
        ]
      }
    ]
  }'
To disable thinking for Gemini models like gemini-2.5-flash-preview-04-17, you must explicitly set budget_tokens to 0.
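For example, a minimal sketch assuming the same Obiguard client as in the extended-thinking example above:

response = client.chat.completions.create(
    model="gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    # budget_tokens set to 0 explicitly disables thinking
    thinking={"type": "enabled", "budget_tokens": 0},
    messages=[
        {"role": "user", "content": "Say this is a test"}
    ]
)
print(response)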