Skip to content
Node: Gemini multim...
 
Notifications
Clear all

Node: Gemini multimodal GenAI (Vertex AI) Integration

6 Posts
4 Users
0 Reactions
4 Views
jb8n
 jb8n
(@jb8n)
Posts: 2
New Member
Topic starter
 

It would be beneficial to have a node for:

Google Gemini multimodal (Vertex AI)

My use case:

In short: More cost-effective, seemingly quicker, and potentially superior to callin.io's AnalyzeImage node.

callin.io already features a node for the OpenAI GPT4 Vision API, named "OpenAI - Analyze Image". This was recently introduced, possibly in response to a request made in Please add support of the new OpenAI features [done] - #26 by tomtom

I conducted a few comparisons between OpenAI and Google for the same multimodal use case, involving an image and a prompt. Gemini performed quite well. It appears to be faster (comparing the Google console with the callin.io node, which isn't a perfectly fair comparison) and yielded better creative results (based on my impressions, also not a definitive comparison).
The most significant difference lies in pricing: for an image and prompt combination, Gemini is 4 times cheaper (based on an image of approximately 600x600). Google's pricing is a flat rate per image, whereas callin.io's pricing scales with image dimensions.

Therefore, I believe a Google-based node could be more popular than the callin.io-based node. The user interface and parameters for the Google node (prompt + image URL) could mirror those of the callin.io node.

Any resources to support this?

Vertex AI offers a sandbox within the Google Cloud console.
API documentation is available at Google Cloud console.
Pricing details can be found at Pricing  |  Generative AI on Vertex AI  |  Google Cloud.

I understand that Vertex AI is the designation for the GenAI multimodal API. PaLM exclusively handles text inputs and outputs. The model I was able to test within Vertex AI is named "gemini-1.0-pro-vision-001".

I disclaim any responsibility should Google decide to rename their models and products in a confusing manner at any point.

:slight_smile:

Are you willing to work on this?

I can create a fork of my workflow and assist in testing the requested node against the currently available OpenAIanalyzeImage node.

 
Posted : 24/03/2024 5:52 pm
red1
 red1
(@red1)
Posts: 1
New Member
 

Hello, has anyone explored this yet?

 
Posted : 11/04/2024 10:10 am
jb8n
 jb8n
(@jb8n)
Posts: 2
New Member
Topic starter
 

I haven't received a response following that request. I assume it requires a certain number of upvotes to be considered?

I discovered that callin.io must be aware of it, as there's a landing page optimized for Gemini and Vertex AI keywords, but apparently nothing substantial behind it: Google Vertex AI integrations | Workflow automation with callin.io

 
Posted : 11/04/2024 10:58 am
biax
 biax
(@biax)
Posts: 1
New Member
 

This would be an amazing feature. We currently can't utilize any multi-modal functionalities within an AI Agent. It would be great to be able to pass an audio file directly to Gemini, for example, without needing to use Whisper first.

 
Posted : 13/03/2025 8:02 pm
Paul_Vincent
(@paul_vincent)
Posts: 7
Active Member
 

I’d second this; and would have thought the existing Gemini node could be tweaked to allow non-image binaries to be passed through, as the same functionality (submitting audio/video) can be achieved with the http node, albeit less elegantly!

There’s a wealth of potential in the Gemini multimodal abilities; it can analyse music for example, providing info on structure and influences etc., as well as just transcribe words, but I currently have to string code and http nodes together to achieve this.

 
Posted : 23/03/2025 11:09 am
Paul_Vincent
(@paul_vincent)
Posts: 7
Active Member
 

To add, if it assists anyone willing and able to develop this, the Multimodal capabilities can be achieved via the HTTP node, as demonstrated below (utilizing a form submission prompt and the Gemini API instead of Vertex, but the core principles remain the same). Integrating these capabilities directly into the Gemini/Vertex nodes would be a significant advancement. I'm not aware of other models that can process various file types as effectively as Gemini 2+.

 
Posted : 15/04/2025 10:51 am
Share: