The concept is:
To integrate streaming capabilities into HTTP responses, LLM chains, and AI agents within callin.io.
My use case:
We are extensive users of other automation platforms, but I recently explored callin.io and found it very impressive. However, we are unable to utilize callin.io for our projects as it currently lacks response streaming functionality, which is a critical requirement for us.
I believe adding this would be advantageous because:
Without streaming, it is challenging to effectively use AI agents in callin.io since we cannot observe the process in real-time. Generally, this detracts from the user experience. Most AI tools today offer streaming, so incorporating it into callin.io would significantly enhance its utility.
Are you willing to work on this?
I am keen to contribute to this feature, but I am new to callin.io and would appreciate some guidance. I have concerns about potentially implementing it incorrectly or it taking an extended period. Any assistance or direction would be highly valued.
Could we leverage the "ai" npm package from Vercel for this? Additionally, do we need to modify the HTTP response, LLM chain, and AI agent nodes, or would it suffice to adjust only the HTTP responses?
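For context, the streaming interface of the "ai" package looks roughly like the sketch below. This is an assumption on my part: the exact function names and signatures depend on the SDK version, and the "@ai-sdk/openai" provider package is assumed to be installed alongside it.

// Sketch only: assumes Vercel's "ai" package plus the "@ai-sdk/openai" provider;
// exact signatures vary between SDK versions.
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await streamText({
  model: openai('gpt-4o'),
  prompt: 'Say this is a test',
});

// Each entry of textStream is an incremental text delta that could be pushed to the client
for await (const delta of result.textStream) {
  process.stdout.write(delta);
}

Something along these lines inside the response handling would presumably let a chat client render tokens as they arrive, rather than waiting for the full completion.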
If you’re also interested in this feature, upvote this post!
Oh wow! I really hope you can implement this; the lack of streaming is the only reason I'm currently using an alternative solution, which is inferior in many other ways.
I've explored an alternative method to achieve this, though it's not yet fully optimized.
Please be aware that this approach is only functional on self-hosted callin.io instances where the environment variable is configured to permit the import of external components within the ‘Code’ node.
This setup enables you to capture segments from the OpenAI API (SSE) and stream them in real-time to a webhook (which could also be hosted on callin.io) for subsequent processing. This processing might include generating audio with TTS, sending messages, or forwarding via another protocol like WSS. While not a perfect solution, it has demonstrated effectiveness in reducing latency for some voicebots I've implemented using TwiML. It would be even more advantageous if callin.io’s webhook or chat framework natively supported SSE or WSS.
Step-by-Step Guide to Implement Streaming with OpenAI in callin.io
Step 1: Configure the OpenAI Node
- Select the OpenAI Model:
  - Add an OpenAI node to your workflow in callin.io.
  - Choose your preferred model (e.g., gpt-4o) and enter your OpenAI API key directly within the Code node (not as an environment variable).
- Override the Host URL:
  - Set the Host URL in the OpenAI node settings to the URL of a webhook that will trigger a Code node.
  - Example: https://your-callin.io-instance/webhook/openai-trigger
Step 2: Set Up the Webhook and Code Node
- Add a Webhook Node:
  - Drag a Webhook node into your workflow and configure it to trigger on the URL you specified in the OpenAI node (/webhook/openai-trigger).
- Add a Code Node:
  - Connect the Webhook node to a Code node.
  - Paste the following code into the Code node:
const OpenAI = require('openai');
const fetch = require('node-fetch'); // Import node-fetch to send HTTP requests

// Initialize the OpenAI client with your API key
const client = new OpenAI({
  apiKey: 'your_openai_api_key_here' // Replace with your OpenAI API key directly in the code
});

async function main() {
  try {
    const stream = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Say this is a test' }],
      stream: true, // Enable streaming mode
    });

    // Process each chunk of the stream as it arrives
    for await (const chunk of stream) {
      // Log each chunk to the console for debugging
      console.log('Received chunk:', chunk);

      // Forward the full chunk to another webhook as soon as it is received
      await fetch('https://your_webhook_url_for_chunks', { // Replace with your webhook URL
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(chunk), // Send the entire chunk object
      });
    }

    // Return a response in the format that callin.io's OpenAI node expects
    return {
      data: [{
        completion: "Completed streaming data response",
      }],
    };
  } catch (error) {
    console.error('Error during streaming:', error);
    return [{ json: { error: error.message } }];
  }
}

return main();
- Replace Placeholders:
  - Replace 'your_openai_api_key_here' with your actual OpenAI API key.
  - Replace 'https://your_webhook_url_for_chunks' with the URL where you intend to forward the streamed data.
  - Replace hardcoded text with callin.io variables (e.g., text, configuration options, etc.).
Step 3: Handle Chunks in Webhook Node
- Configure the Webhook to Handle Chunks:
  - Ensure the webhook receiving the chunks is configured to process JSON data.
  - This webhook should handle each chunk of data as it arrives and respond using the "respond immediately" option with an appropriate status code (a minimal sketch of such a chunk handler follows this list).
- Send Data Back to the OpenAI Node:
  - Ensure the response sent back to the OpenAI node is formatted correctly (e.g., with the required JSON structure that callin.io expects).
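As a rough illustration, the Code node behind the chunk-receiving webhook might look like the sketch below. Treat it as an assumption-heavy sketch, not a tested implementation: it assumes the Code node exposes the usual $input helper and that each incoming item's JSON body is one raw OpenAI streaming chunk (where the incremental text sits at choices[0].delta.content). How you forward the deltas onward (TTS, WSS, etc.) depends on your use case.

// Sketch of a Code node behind the chunk-receiving webhook (assumes the $input helper is available)
const results = [];

for (const item of $input.all()) {
  const chunk = item.json; // one raw OpenAI streaming chunk forwarded by the code above

  // In OpenAI's streaming format, the incremental text lives at choices[0].delta.content
  const delta = chunk?.choices?.[0]?.delta?.content ?? '';
  const finished = chunk?.choices?.[0]?.finish_reason === 'stop';

  // Forward or accumulate the delta here, e.g. push it to a TTS service or over a WebSocket
  results.push({ json: { delta, finished } });
}

return results;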
Step 4: Save and Test Your Workflow
- Save Your Workflow:
  - Click Save in callin.io to ensure all modifications are stored.
- Execute and Monitor:
  - Run the workflow and observe the console logs.
  - Verify that data is streamed correctly to your webhook and processed without any errors.
Awesome! Thanks for sharing this!
Hi, I tried to draw attention to this by reporting it as a bug. The inability to stream responses goes against public expectations, as streaming is generally considered a standard feature.
I suggest adding your comments or reactions there to help get more attention and raise awareness with the callin.io team.
Commenting here to bump it up. Streaming is essential for AI responses; please add this feature as soon as possible. callin.io is becoming more and more popular in the AI scene, but the other platforms do offer streaming out of the box.
I couldn't agree more! It's unbelievable that this feature is still not available. callin.io is the best on the market, but this most important feature is missing, so we have to go through significant pain with Langflow, Dify, and others just to get streaming.
Yes! This is a crucial feature.
I can only agree; it's the primary reason I'm unable to use callin.io, and I'm exploring alternative providers that offer this functionality. The Webhook response should be able to stream the output of the last node to the client.
We are currently exporting callin.io JSON and converting it into LangGraph, as this is the only method we've found to achieve proper streaming.
We hope they implement streaming for agent steps and LLM responses soon as well.
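For anyone wondering what that buys us: once the workflow lives in LangGraph, token streaming looks roughly like the sketch below. This is heavily hedged: stream modes and the shape of the yielded events vary between @langchain/langgraph versions, and `graph` is assumed to be an already-compiled graph.

// Sketch: streaming LLM tokens from a compiled LangGraph graph (assumes @langchain/langgraph)
const stream = await graph.stream(
  { messages: [{ role: 'user', content: 'Say this is a test' }] },
  { streamMode: 'messages' } // "messages" mode yields incremental LLM message chunks
);

for await (const [messageChunk, metadata] of stream) {
  // Each chunk can be pushed straight to the client as it is produced
  console.log(messageChunk.content);
}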
Have you automated the conversion to LangGraph? If so, would you be willing to share your approach? Best regards / Fredrik
bump! AI streaming is a must these days.
Given its significance and value, this functionality ought to be considered a MAJOR priority.
Completely backing this idea. Streaming is going to be a major factor for numerous applications, and it's currently supported by most of the recent LLMs. We kindly ask the team to include this in your development plans!
We really need this! It would be a significant improvement for any chatbots in production built with callin.io. Users are accustomed to this and anticipate a responsive chat window where they can see background processes working and then watch the response generate in real-time.