It would be beneficial to have a node for:
Google Gemini multimodal (Vertex AI)
My use case:
In short: more cost-effective, seemingly quicker, and potentially superior to callin.io's OpenAI "Analyze Image" node.
callin.io already features a node for the OpenAI GPT-4 Vision API, named "OpenAI - Analyze Image". It was recently introduced, possibly in response to the request in "Please add support of the new OpenAI features [done] - #26 by tomtom".
I conducted a few comparisons between OpenAI and Google for the same multimodal use case, involving an image and a prompt. Gemini performed quite well. It appears to be faster (comparing the Google console with the callin.io node, which isn't a perfectly fair comparison) and yielded better creative results (based on my impressions, also not a definitive comparison).
The most significant difference lies in pricing: for an image-plus-prompt combination, Gemini is roughly 4 times cheaper (based on an image of approximately 600x600). Google's pricing is a flat rate per image, whereas OpenAI's pricing scales with image dimensions.
Therefore, I believe a Google-based node could become more popular than the OpenAI-based one. The user interface and parameters for the Google node (prompt + image URL) could mirror those of the OpenAI node.
Any resources to support this?
Vertex AI offers a sandbox within the Google Cloud console.
API documentation is available in the Google Cloud console.
Pricing details can be found at Pricing | Generative AI on Vertex AI | Google Cloud.
I understand that Vertex AI is the designation for the GenAI multimodal API. PaLM exclusively handles text inputs and outputs. The model I was able to test within Vertex AI is named "gemini-1.0-pro-vision-001".
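For reference, here's a minimal sketch of what such a node would need to do under the hood: call Vertex AI's generateContent endpoint with a prompt plus an inline image. The project ID, region, and file name below are placeholders, and it assumes Application Default Credentials are already configured:

```python
import base64

import google.auth
import google.auth.transport.requests
import requests

# Placeholder values -- substitute your own project, region, and image.
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"
MODEL = "gemini-1.0-pro-vision-001"

# Fetch an OAuth2 access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/google/models/{MODEL}:generateContent"
)

# Vertex AI accepts inline images as base64-encoded bytes.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "contents": [{
        "role": "user",
        "parts": [
            {"text": "Describe this image."},
            {"inlineData": {"mimeType": "image/jpeg", "data": image_b64}},
        ],
    }]
}

resp = requests.post(
    url, headers={"Authorization": f"Bearer {credentials.token}"}, json=body
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```

A node could expose exactly the prompt and image inputs and handle the auth and base64 plumbing internally.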
I disclaim any responsibility should Google decide to rename their models and products in a confusing manner at any point.
Are you willing to work on this?
I can create a fork of my workflow and assist in testing the requested node against the currently available OpenAI Analyze Image node.
Hello, has anyone explored this yet?
I haven't received a response following that request. I assume it requires a certain number of upvotes to be considered?
I discovered that callin.io must be aware of it, as there's a landing page optimized for Gemini and Vertex AI keywords, but apparently nothing substantial behind it: Google Vertex AI integrations | Workflow automation with callin.io
This would be an amazing feature. We currently can't use any multimodal capabilities within an AI Agent. It would be great to be able to pass an audio file directly to Gemini, for example, without needing to run it through Whisper first.
I’d second this, and would have thought the existing Gemini node could be tweaked to allow non-image binaries to be passed through, since the same functionality (submitting audio/video) can already be achieved with the HTTP node, albeit less elegantly!
There’s a wealth of potential in Gemini's multimodal abilities: it can analyse music, for example, providing information on structure and influences, as well as simply transcribing words, but I currently have to string Code and HTTP nodes together to achieve this.
To add, in case it helps anyone willing and able to develop this: the multimodal capabilities can be achieved via the HTTP node, as demonstrated below (using a form-submission prompt and the Gemini API instead of Vertex, but the core principles are the same). Integrating these capabilities directly into the Gemini/Vertex nodes would be a significant advancement. I'm not aware of other models that can process various file types as effectively as Gemini 2+.
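As an illustration of the underlying request, here's a minimal sketch against the Gemini API's generateContent endpoint with an inline audio file, which is essentially what the HTTP node sends. The API key, model name, and file name are placeholders; note that inline base64 only suits small files, with larger ones going through the separate Files API:

```python
import base64

import requests

# Placeholder values -- use your own API key and a current model name.
API_KEY = "YOUR_GEMINI_API_KEY"
MODEL = "gemini-2.0-flash"

url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent?key={API_KEY}"
)

# Inline base64 suits small files; larger uploads should go through
# the separate Files API instead.
with open("song.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "contents": [{
        "parts": [
            {"text": "Describe the structure and influences of this track."},
            {"inline_data": {"mime_type": "audio/mpeg", "data": audio_b64}},
        ]
    }]
}

resp = requests.post(url, json=body)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```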