Many models support prompt caching, which is highly advantageous for lengthy system prompts, repeated references to the same video, or any other token-intensive operations.
Is it feasible to implement a prompt caching system within callin.io?
Information on your callin.io setup
- callin.io version: 1.81.4
- Database (default: SQLite): SQLite
- callin.io EXECUTIONS_PROCESS setting (default: own, main): Own
- Running callin.io via (Docker, npm, callin.io cloud, desktop app): Google Cloud
- Operating system: Windows 10
When utilizing OpenAI, prompt caching occurs automatically.
Within callin.io, when you set up a prompt, it's transmitted with every execution. There isn't an integrated way to "cache" it independently to conserve tokens.
This is fundamental to how LLM APIs operate: each request requires the complete context (system, user, and assistant messages) to produce a response.
However, OpenAI already manages token savings internally when you submit similar requests repeatedly.
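For what it's worth, you can verify whether OpenAI's automatic caching kicked in by inspecting the usage details returned with each response. Here is a minimal sketch against the REST API (automatic caching only applies once the reusable prompt prefix is long enough, roughly 1,024 tokens, so treat the cached_tokens field as optional):

```python
# Minimal sketch: check how many prompt tokens OpenAI reports as served from cache.
# Assumes OPENAI_API_KEY is set; "prompt_tokens_details" may be absent, so it is
# read defensively rather than assumed.
import os
import requests

LONG_SYSTEM_PROMPT = "You are a helpful assistant for video transcripts. " * 100

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": LONG_SYSTEM_PROMPT},
            {"role": "user", "content": "Summarize the last transcript I sent."},
        ],
    },
    timeout=60,
)
usage = resp.json().get("usage", {})
cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
print(f"prompt tokens: {usage.get('prompt_tokens')}, served from cache: {cached}")
```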
You can find more details in this documentation:
If this response addresses your question, please consider marking it as a solution.
I reached out to OpenAI's developers, and they informed me differently.
What information did they provide? That the official documentation is out of date and they have discontinued prompt caching?
What if we're using Google Gemini? We'd need a new configuration option in the Gemini model node to specify the cache_name. The request structure looks like this:
{
  "contents": [
    {
      "parts": [
        {
          "text": "Please summarize this transcript"
        }
      ],
      "role": "user"
    }
  ],
  "cachedContent": "'$CACHE_NAME'"
}
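For reference, here is a rough end-to-end sketch of explicit context caching against the Gemini REST API directly, which is what a cache_name option in the node would have to wrap. It assumes a GOOGLE_API_KEY, the v1beta cachedContents endpoint, and a versioned model name; explicit caching also requires the cached content to exceed a minimum token count, so verify the details against Google's current context-caching docs:

```python
# Rough sketch of explicit Gemini context caching via the REST API (v1beta).
# Assumes GOOGLE_API_KEY is set; endpoint and field names follow Google's
# context-caching docs and may change, so double-check before relying on this.
import os
import requests

API_KEY = os.environ["GOOGLE_API_KEY"]
BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "models/gemini-1.5-flash-001"  # explicit caching needs a versioned model

# 1) Create a cache holding the large, reusable context (e.g. a long transcript).
cache_resp = requests.post(
    f"{BASE}/cachedContents?key={API_KEY}",
    json={
        "model": MODEL,
        "contents": [
            {"role": "user", "parts": [{"text": open("transcript.txt").read()}]}
        ],
        "ttl": "600s",  # keep the cache alive for 10 minutes
    },
    timeout=60,
)
cache_name = cache_resp.json()["name"]  # e.g. "cachedContents/abc123"

# 2) Reference the cache by name instead of resending the transcript each time.
gen_resp = requests.post(
    f"{BASE}/{MODEL}:generateContent?key={API_KEY}",
    json={
        "contents": [
            {"role": "user", "parts": [{"text": "Please summarize this transcript"}]}
        ],
        "cachedContent": cache_name,
    },
    timeout=60,
)
print(gen_resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```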
I'm looking into this as well.
Here's what the OpenRouter documentation states regarding caching:
Gemini isn't mentioned in the documentation. However, today I experimented with the same chat pipeline using Sonnet 3.7 and Gemini 2.5 Pro through OpenRouter. For comparable requests in the middle of a chat (7k input, 500 output tokens), the cost with Sonnet was 43 times higher than with Gemini.
Based on the model prices (roughly $1 per 1M input / $10 per 1M output tokens for Gemini, and $3 per 1M input / $15 per 1M output tokens for Sonnet), the difference should be around 3x, not 43x.
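As a rough sanity check of that estimate (treating the per-1M prices above as approximate):

```python
# Rough sanity check of the expected cost ratio for a 7k-input / 500-output request,
# using the approximate per-1M token prices quoted above.
IN_TOKENS, OUT_TOKENS = 7_000, 500

def cost(in_price_per_m, out_price_per_m):
    return IN_TOKENS / 1e6 * in_price_per_m + OUT_TOKENS / 1e6 * out_price_per_m

gemini = cost(1, 10)   # ≈ $0.0120
sonnet = cost(3, 15)   # ≈ $0.0285
print(f"Sonnet/Gemini cost ratio: {sonnet / gemini:.1f}x")  # ≈ 2.4x
```

Either way the expected gap is in the 2-3x range, nowhere near 43x.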
I also observed requests for Gemini where the costs were approximately 10x lower than Sonnet (at the beginning of a dialogue).
Therefore, it's highly likely that automatic caching is being applied for Gemini when using the OpenRouter Model node.
My system prompt alone is 4k tokens.
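One way to confirm is to look up the per-request stats that OpenRouter records for each generation ID. Here is a sketch assuming the /api/v1/generation endpoint and an OPENROUTER_API_KEY; the generation ID placeholder and the exact field names are assumptions, so just inspect whatever JSON comes back for cost and cached-token fields:

```python
# Sketch: look up OpenRouter's per-generation stats to see whether caching
# reduced the billed cost. Assumes the /api/v1/generation endpoint and an
# OPENROUTER_API_KEY environment variable.
import os
import requests

GENERATION_ID = "gen-..."  # hypothetical: the id returned in the chat completion response

resp = requests.get(
    "https://openrouter.ai/api/v1/generation",
    params={"id": GENERATION_ID},
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    timeout=30,
)
stats = resp.json().get("data", {})
# Look for fields such as the total cost, native token counts, or a cache discount.
for key, value in stats.items():
    print(f"{key}: {value}")
```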
I recently discovered that OpenRouter was utilizing my Google AI Studio account key. Consequently, the $300 in trial credits I had there were depleted, with the primary cost being charged to my Google AI Studio account.
After experimenting with API keys and their fallback mechanisms within OpenRouter, I now observe the costs directly in OpenRouter, as it primarily uses Google Vertex.
Do the chat model nodes already include prompt caching?
The question seems to be about how to enable explicit context caching to ensure cost savings:
I completely agree. I'm hoping that explicit caching will be enabled for the Gemini model node.