Using Prompt Caching in callin.io

10 Posts
7 Users
0 Reactions
3 Views
Kiremit
(@kiremit)
Posts: 9
Active Member
Topic starter
 

Many models support prompt caching, which is highly advantageous for lengthy system prompts, repeated references to the same video, or any other token-intensive operations.
Is it feasible to implement a prompt caching system within callin.io?

Information on your callin.io setup

  • callin.io version: 1.81.4
  • Database (default: SQLite): SQLite
  • callin.io EXECUTIONS_PROCESS setting (default: own, main): Own
  • Running callin.io via (Docker, npm, callin.io cloud, desktop app): Google Cloud
  • Operating system: Windows 10
 
Posted : 07/03/2025 7:35 am
solomon
(@solomon)
Posts: 78
Trusted Member
 

When utilizing OpenAI, prompt caching occurs automatically.

Within callin.io, when you set up a prompt, it's transmitted with every execution. There isn't an integrated way to "cache" it independently to conserve tokens.

This is fundamental to how LLM APIs operate: each request requires the complete context (system, user, and assistant messages) to produce a response.

However, OpenAI already manages token savings internally when you submit similar requests repeatedly.
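For context: OpenAI's automatic caching applies to prompts of roughly 1,024 tokens or more and reuses the longest matching prefix, and the response reports how much was cached. A minimal sketch, assuming the official OpenAI Python SDK (the model name and prompt file are just placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LONG_SYSTEM_PROMPT = open("system_prompt.txt").read()  # same long prefix on every call

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[
        {"role": "system", "content": LONG_SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize today's tickets."},
    ],
)

# After the first call, repeated requests that share the same long prefix
# report the reused portion here:
print(response.usage.prompt_tokens_details.cached_tokens)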

You can find more details in this documentation:


If this response addresses your question, please consider marking it as a solution.

 
Posted : 07/03/2025 8:07 pm
Kiremit
(@kiremit)
Posts: 9
Active Member
Topic starter
 

I reached out to OpenAI's developers, and they informed me differently.

 
Posted : 16/03/2025 12:23 pm
solomon
(@solomon)
Posts: 78
Trusted Member
 

What information did they provide? That the official documentation is out of date and they have discontinued prompt caching?

 
Posted : 16/03/2025 2:20 pm
EuRoBosch
(@eurobosch)
Posts: 1
New Member
 

What if we're using Google Gemini? We'd need a new configuration option in the Gemini model node to specify the cache name (cachedContent). The request structure looks like this:

{
  "contents": [
    {
      "parts": [
        {
          "text": "Please summarize this transcript"
        }
      ],
      "role": "user"
    }
  ],
  "cachedContent": "'$CACHE_NAME'"
}
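
Until such an option exists, a possible workaround is to call the Gemini REST API directly (e.g. from an HTTP Request node or a small script). A rough sketch in Python, assuming the public Generative Language API endpoints; the model name, TTL, and transcript file are placeholders, and the cached content has to meet the model's minimum token count for caching:

import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
BASE = "https://generativelanguage.googleapis.com/v1beta"
LONG_TRANSCRIPT = open("transcript.txt").read()

# Step 1: create a cached content entry holding the large, reused context.
cache = requests.post(
    f"{BASE}/cachedContents",
    params={"key": API_KEY},
    json={
        "model": "models/gemini-1.5-flash-001",  # example; must be a version that supports caching
        "contents": [{"role": "user", "parts": [{"text": LONG_TRANSCRIPT}]}],
        "ttl": "3600s",  # keep the cache for one hour
    },
).json()
cache_name = cache["name"]  # e.g. "cachedContents/abc123"

# Step 2: reference the cache on each generateContent call instead of resending the transcript.
response = requests.post(
    f"{BASE}/models/gemini-1.5-flash-001:generateContent",
    params={"key": API_KEY},
    json={
        "contents": [{"role": "user", "parts": [{"text": "Please summarize this transcript"}]}],
        "cachedContent": cache_name,
    },
).json()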
 
Posted : 02/04/2025 12:57 pm
vkarbovnichy
(@vkarbovnichy)
Posts: 3
New Member
 

I'm looking into this as well.

Here's what the OpenRouter documentation states regarding caching:

Gemini isn't mentioned in the documentation.

However, today I experimented with the same chat pipeline using Sonnet 3.7 and Gemini 2.5 Pro through OpenRouter. For comparable requests in the middle of a chat (7k input, 500 output tokens), the cost with Sonnet was 43 times higher than with Gemini.

Based on the model costs (roughly $1 per 1M input / $10 per 1M output tokens for Gemini, and $3 per 1M input / $15 per 1M output tokens for Sonnet), the difference should be around 3x, not 43x.
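
A quick back-of-the-envelope check of that expectation, plugging the 7k-input / 500-output request size into those per-1M prices (no caching assumed):

# Cost per request at list prices (USD per 1M tokens), no caching assumed.
def request_cost(input_tokens, output_tokens, in_price, out_price):
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

gemini = request_cost(7_000, 500, in_price=1.0, out_price=10.0)   # ~$0.012
sonnet = request_cost(7_000, 500, in_price=3.0, out_price=15.0)   # ~$0.0285
print(f"Gemini: ${gemini:.4f}  Sonnet: ${sonnet:.4f}  ratio: {sonnet / gemini:.1f}x")
# ratio comes out around 2-3x, nowhere near the observed 43x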

I also observed requests for Gemini where the costs were approximately 10x lower than Sonnet (at the beginning of a dialogue).

Therefore, it's highly likely that automatic caching is being applied for Gemini when using the OpenRouter Model node.

My system prompt alone is 4k tokens.

 
Posted : 07/04/2025 8:18 pm
vkarbovnichy
(@vkarbovnichy)
Posts: 3
New Member
 

I recently discovered that OpenRouter was utilizing my Google AI Studio account key. Consequently, the $300 in trial credits I had there were depleted, with the primary cost being charged to my Google AI Studio account.

After experimenting with API keys and their fallback mechanisms within OpenRouter, I now observe the costs directly in OpenRouter, as it primarily uses Google Vertex.

 
Posted : 23/04/2025 12:55 pm
ldaniel-jmz
(@ldaniel-jmz)
Posts: 1
New Member
 

Do the chat model nodes already include prompt caching?

 
Posted : 14/06/2025 8:13 pm
AlexJohns
(@alexjohns)
Posts: 3
New Member
 

The question seems to be about how to enable explicit context caching to ensure cost savings:

 
Posted : 14/06/2025 8:27 pm
ktan99
(@ktan99)
Posts: 2
New Member
 

I completely agree. I'm hoping that explicit caching will be enabled for the Gemini model node.

 
Posted : 17/07/2025 2:25 pm