Hi,
I’m developing an agent to handle Telegram messages, find an available slot in one Google Calendar, and schedule a meeting in another Google Calendar.
My setup is using two Google Calendar tools: “Find Available Slot” and “Book a Meeting”.
There are many constraints on booking the call, and I put all of them in the AI agent's user message.
However, when it runs, it reports that there are no available slots, even though I know there are: I've checked the calendar and there are timeslots where bookings can be scheduled.
I suspect that the reasoning model (o4-mini) isn't doing the work I expected.
I can’t find a way to include the reasoning model’s output in order to debug and refine my prompt. Is there a way for me to do that?
Thanks,
L
Information on your callin.io setup
- callin.io version: 1.94.1
- Running callin.io via (Docker, npm, callin.io cloud, desktop app): Docker on Google Cloud
- Operating system: Debian something
Hello,
Yes. Do you have the output or workflow of the AI agent? If it was run via the production URL, you can navigate to 'Execution' and use 'Copy to Editor'. Alternatively, if you're running it in the editor, simply open the AI agent node and look at the right-hand side.
If you connect a chat trigger to it, you can prompt it in the same way and observe its processing; the right-hand side shows all the calls the AI agent makes. They are also developing this:
Request for feedback: Workflow evaluation beta - #32 by Memoire
which might be helpful.
Best regards,
Samuel
Thanks for the reply.
I get the part about debugging and checking the flow output. However, what's missing is the model's reasoning process.
In the OpenAI API, there's a parameter named `summary` which lets you see how the model arrived at its answer. That's precisely what I'm looking for in the output.
Unfortunately, `summary` isn't available as a parameter in the current OpenAI models integration. I was curious whether there's a way to include it so I can view that output when the LLM is invoked.
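For illustration, this is roughly the request shape I mean, assuming the OpenAI Responses API; the field names here reflect my understanding of that API, not anything the callin.io integration currently exposes:

```python
# Hypothetical request body showing the reasoning-summary option I'd like
# the integration to expose (field names per the OpenAI Responses API).
request_body = {
    "model": "o4-mini",
    "input": "Find a free 30-minute slot next week and book the meeting.",
    "reasoning": {
        "summary": "auto",  # ask the API to include a summary of the model's reasoning
    },
}
print(request_body["reasoning"]["summary"])
```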
Thanks,
L
It seems there might be an issue with my current setup. Is there a guide on best practices that I can refer to?
My prompt is designed to handle three main tasks:
- Extract the sender's name, email address, and any notes from a Telegram message.
- Subsequently, the agent should check the calendar to identify a suitable time slot.
- Finally, it should use the extracted email address, name, and notes to create a meeting in Google Calendar.
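As a sanity check on the first task, here's a hypothetical deterministic version of the extraction I expect the agent to perform; the message layout (name on the first line, then email, then notes) is an assumption for illustration:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_contact(message: str) -> dict:
    # Hypothetical sketch of step 1: return the same three fields the agent
    # should produce from a free-form Telegram message.
    match = EMAIL_RE.search(message)
    email = match.group(0) if match else None
    lines = [l.strip() for l in message.splitlines() if l.strip()]
    # drop the line containing the email so it doesn't leak into the notes
    rest = [l for l in lines if email is None or email not in l]
    name = rest[0] if rest else None
    notes = " ".join(rest[1:])
    return {"name": name, "email": email, "notes": notes}

print(extract_contact("Jane Doe\njane@example.com\nPrefers mornings."))
```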
What I'm observing is that two calls are being made to the OpenAI API. The initial call doesn't return any parsed data (name, email, and notes are missing), and the subsequent call indicates no available time slots.
I tested the exact same prompt with the same model in ChatGPT, and it produced the expected outcomes, although it generated placeholder dates for the meeting since it cannot access my calendar.
This suggests I'm misunderstanding how the agent, the LLM, and Google Calendar interact.
Perhaps I'm attempting to accomplish too much in a single step. It might be necessary to break this down into multiple operations or separate workflows. However, this seems to contradict the core benefit of an agent making decisions autonomously using the tools it has access to.
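For context, this is my current mental model of how the agent, the LLM, and the tools interact, as a simplified sketch; the `llm` and `tools` interfaces here are made up for illustration, not callin.io's internals:

```python
def run_agent(llm, tools, user_message):
    """Simplified ReAct-style agent loop (my mental model, not callin.io's
    actual implementation): the LLM either asks for a tool or answers."""
    history = [{"role": "user", "content": user_message}]
    while True:
        reply = llm(history, tools)  # model sees prior tool results in history
        if reply["type"] == "tool_call":
            # run the requested tool and feed the result back to the model
            result = tools[reply["name"]](**reply["args"])
            history.append({"role": "tool", "name": reply["name"],
                            "content": result})
        else:
            return reply["content"]
```

If this picture is right, a single agent with both calendar tools should be able to decide the order of operations on its own, which is why splitting the workflow feels like it defeats the purpose.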
L
If an AI agent is encountering issues with a task, it's often due to the system prompt and user prompt. Consider breaking down the steps using three separate agents and being more precise with your prompts. This approach should yield more reliable results. Have you tried this?
There's information in the documentation. I'd suggest exploring multi-agent setups within your flow; you can still connect tools. I can put together a quick example for you. Are you working from a specific example?
If you're comfortable sharing it, I can make a minor adjustment and see how it performs.
Perhaps something like this? (You can use an AI agent or the OpenAI LLM node.)
Best regards,
Samuel
Thanks for the reply.
I ended up providing more details in my prompt, and now I'm making more progress. It's now booking, but not quite correctly: it books the entire available slot I searched for instead of finding a smaller 30-minute window within it.
At least I can now see that the agent is correctly calling the tools. I believe the change I made in the logic was beneficial. I reviewed the documentation for Google Calendar and realized I had been misinterpreting the availability aspect.
I switched to retrieving all bookings instead of just the availability, and instructed the LLM to analyze them to identify a suitable time slot. It's not perfect yet, but it's progressing.
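The interval arithmetic I'm now asking the LLM to do looks roughly like this; the working hours and bookings below are made-up examples, not my actual calendar:

```python
from datetime import datetime, timedelta

def first_free_slot(busy, day_start, day_end, duration=timedelta(minutes=30)):
    """Return the start of the first gap of at least `duration` between
    bookings, or None if the day is full. `busy` is a list of
    (start, end) datetime pairs, possibly unsorted."""
    cursor = day_start
    for start, end in sorted(busy):
        if start - cursor >= duration:  # gap before this booking is big enough
            return cursor
        cursor = max(cursor, end)       # skip past the booking
    return cursor if day_end - cursor >= duration else None

# Assumed working day and bookings, just for illustration:
day = datetime(2025, 1, 6)
busy = [(day.replace(hour=9), day.replace(hour=10)),
        (day.replace(hour=10, minute=15), day.replace(hour=12))]
slot = first_free_slot(busy, day.replace(hour=9), day.replace(hour=17))
print(slot)  # → 2025-01-06 12:00:00
```

Doing this deterministically (or at least spelling it out step by step in the prompt) seems more reliable than hoping the model does the gap calculation implicitly.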
I appreciate the offer to handle it for me, but that’s not my objective. My aim is to learn how to make these systems function, and it’s testing my understanding of what AI agents do.
I’m uncertain if I’m misunderstanding AI agents and giving them too much credit, or if I don’t fully grasp how callin.io specifically implements them.
However, there has been progress, which is positive!
Thanks,
L
You're welcome, and yes, it depends on the model used. I've also noticed they sometimes seem to play dumb. I'm not sure if it's because many people are using the model at that time, leaving fewer resources, or if it just needs a restart, lol. But yes, when tasks become more complex, they can struggle. I hope everything goes well. It's a good learning curve, and I'm honestly enjoying it. Have a nice day.
Samuel