Hello,
I’ve been using callin.io for just 2 months.
I want to use the new gemini-2.5-pro-preview-tts model. It’s available in the Gemini Chat Model.
However, I can’t figure out how to use it. I’ve searched the internet but haven’t found anything except this Gemini documentation (Speech generation | Gemini API | Google AI for Developers).
I asked Gemini, but all the answers it provided didn’t work.
Could someone help me create this workflow?
I want to convert the text output of an AI Agent node into an audio file and send it to a Google Drive folder.
Thanks
Hello! Welcome!
Currently, callin.io does not appear to directly support audio generation using the gemini-2.5-pro-preview-tts model. While the model is listed among the Gemini Chat models in callin.io, you can't configure the responseModalities parameter that an audio response requires.
To use the gemini-2.5-pro-preview-tts model for speech synthesis, you'll need to send an HTTP request directly to the Gemini API, with the required parameters set according to the documentation you linked.
Set up an HTTP Request node in callin.io
- Method: POST
- URL: https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-tts:generateContent?key=YOUR_API_KEY
- Headers: Content-Type: application/json
- Body (Raw JSON):
{
  "contents": [
    { "parts": [ { "text": "{{ $json.text }}" } ] }
  ],
  "generationConfig": {
    "responseModalities": ["AUDIO"],
    "speechConfig": {
      "voiceConfig": {
        "prebuiltVoiceConfig": { "voiceName": "Kore" }
      }
    }
  }
}
Please replace {{ $json.text }} with an expression that points to the actual text output of your Agent node.
Decode and save the audio
The response contains the audio as base64-encoded raw PCM. You can add a Function node with the following code:
// The audio sits in the first part of the first candidate.
const audio = $json.candidates[0].content.parts[0].inlineData;
return [{
  json: {},
  binary: {
    data: {
      // Binary data is expected as a base64 string here, so it is passed through as-is.
      data: audio.data,
      mimeType: audio.mimeType, // raw PCM, not MP3
      fileName: 'output.pcm'
    }
  }
}];
Upload to Google Drive
Let me know! Cheers
I managed to get it working by implementing the following steps:
(One important detail is that the default output audio format is a .pcm file. This means you'll need to convert it to either WAV or MP3 for usability. If you're self-hosting callin.io, you can achieve this by installing ffmpeg into your Docker container. However, if you're using a cloud-based setup, you might need to utilize an external API service for the conversion.)
I tried your workflow. I can use Google Gemini TTS. Does Google Gemini require using ffmpeg to convert to a WAV file?
Thank you very much!
Yes, I believe Gemini only provides audio files in .pcm format, requiring conversion to .wav or .mp3 for usage. If you're self-hosting callin.io, I found this method to be the most straightforward.
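For anyone who can't install ffmpeg (for example on a cloud setup), a raw PCM stream can also be made playable by prepending a standard 44-byte WAV header. Below is a minimal sketch in plain Node.js. It assumes the audio is 16-bit little-endian, mono, at 24 kHz, which is my understanding of what the Gemini TTS documentation describes; please verify these values against the response you actually get. The function name pcmToWav is just an illustrative choice.

```javascript
// Wrap raw PCM samples in a minimal RIFF/WAVE header (no external tools needed).
// Assumed defaults: 24 kHz sample rate, 1 channel, 16 bits per sample.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);  // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);              // size of the fmt chunk
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);      // size of the sample data

  return Buffer.concat([header, pcm]);
}

// Convenience wrapper: base64 PCM in, base64 WAV out.
function base64PcmToBase64Wav(base64Pcm) {
  return pcmToWav(Buffer.from(base64Pcm, 'base64')).toString('base64');
}
```

In a Function node you could pass the base64 audio string from the API response through base64PcmToBase64Wav and set the file name to output.wav and the MIME type to audio/wav. This only produces WAV; for MP3 you would still need ffmpeg or an external service.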
This is fantastic! Just a heads-up: your API key is visible in the HTTP request.
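One way to avoid exposing the key is to send it in a request header instead of the URL query string, so it doesn't show up when the URL is shared or logged. Here is a rough sketch of what the request would look like. The x-goog-api-key header is the one the Gemini API documentation uses; the function name buildTtsRequest and the voice "Kore" are illustrative choices of mine, and if your platform offers stored credentials, that is likely the better place for the key.

```javascript
// Build the pieces of a Gemini TTS request with the API key kept out of the URL.
// buildTtsRequest is a hypothetical helper; "Kore" is just an example voice name.
function buildTtsRequest(text, apiKey, voiceName = 'Kore') {
  const model = 'gemini-2.5-pro-preview-tts';
  return {
    method: 'POST',
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': apiKey, // key travels in a header, not as ?key=... in the URL
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text }] }],
      generationConfig: {
        responseModalities: ['AUDIO'],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        },
      },
    }),
  };
}
```

The returned method, url, headers, and body map onto the corresponding fields of an HTTP Request node. Before posting a workflow publicly, it is still worth checking the exported JSON for any leftover key.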