
Troubleshooting: Locally hosted LLM unable to call tools

fwasmeier
(@fwasmeier)
Posts: 3
Active Member
Topic starter
 

Hi,

My company and I are currently evaluating callin.io to make an informed decision about whether it will be our go-to tool for the agentic AI era. We are trying to run our locally hosted Llama 3.3 Super 49B with the callin.io AI Agent and have run into an issue:

Describe the problem/error/question

I have an on-prem hosted NVIDIA NIM container running Llama 3.3 Super:

I set everything up, connected it to an AI Agent, and attempted to chat. It worked perfectly.

However, the next step is to enable the AI Agent to use Tools. To do this, I changed the type to “Tools Agent” and added a simple Webex Tool to send a message to a Room.

THE PROBLEM: The LLM correctly identifies the tool and how to use it, but the tool is never actually called. Instead, the tool call is returned as plain text in the chat:

This is also evident in the executed nodes, which show that the Webex Tool is not being utilized:

Please share your workflow

Share the output returned by the last node

The AI Agent then outputs:

```json
[
  {
    "output": "[{\"name\": \"Create_a_message_in_Webex_by_Cisco\", \"arguments\": {\"Text\": \"Hi Test\"}}]"
  }
]
```
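For comparison, when tool calling works end to end (as it does for us with the GPT models via the API), the assistant message carries a structured tool_calls field that the agent can dispatch, rather than tool JSON inside content. A rough sketch of the two shapes in OpenAI chat-completion terms (field values are illustrative, not captured output):

```python
# Shape comparison only; values are illustrative, not from an actual run.

# What we currently get back: the tool call serialized as plain text.
observed_message = {
    "role": "assistant",
    "content": '[{"name": "Create_a_message_in_Webex_by_Cisco", '
               '"arguments": {"Text": "Hi Test"}}]',
}

# What a Tools Agent needs: a structured tool_calls field it can
# dispatch, with no tool JSON in the content.
expected_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_123",  # illustrative id
            "type": "function",
            "function": {
                "name": "Create_a_message_in_Webex_by_Cisco",
                "arguments": '{"Text": "Hi Test"}',
            },
        }
    ],
}
```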

Expected output

We would expect the local LLM to be able to utilize Tools from the callin.io AI Tools Agent and provide output to the user once tool usage is complete.

Just for your information, this exact workflow functions correctly with GPT Models (API usage).

I look forward to hearing your suggestions on how to resolve this.

Best,
Flo

Information on your callin.io setup

  • callin.io version: 1.100.0
  • Database (default: SQLite): default
  • callin.io EXECUTIONS_PROCESS setting (default: own, main): default
  • Running callin.io via (Docker, npm, callin.io cloud, desktop app): docker-compose self-hosted
  • Operating system:
 
Posted : 25/06/2025 6:19 pm
Jon
(@jon)
Posts: 96
Trusted Member
 

Hello,

One potential reason for this issue might be that many local LLMs aren't sophisticated enough to recognize when they need to invoke a tool and, consequently, do not support tool usage.

Have you verified if the model you're currently using supports tool use?
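A quick way to verify is to call the OpenAI-compatible endpoint directly with a tools parameter and see whether the reply comes back as a structured tool call or as plain text. A minimal sketch with the openai Python client (the base URL, API key, and tool definition are placeholders for your NIM setup; adjust the model name to your deployment):

```python
from openai import OpenAI

# Placeholders: point these at your local NIM container.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Hypothetical test tool in the standard OpenAI schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "send_webex_message",
            "description": "Send a message to a Webex room",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",  # adjust as needed
    messages=[{"role": "user", "content": "Send 'Hi Test' to the room."}],
    tools=tools,
)

message = response.choices[0].message
# Native tool support: message.tool_calls is populated.
# No native support: the call (if any) shows up as text in message.content.
print("tool_calls:", message.tool_calls)
print("content:", message.content)
```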

 
Posted : 27/06/2025 8:44 am
Wouter_Nigrini
(@wouter_nigrini)
Posts: 31
Eminent Member
 

Please see the link below for Llama models that support tool calls.

https://www.perplexity.ai/search/which-llama-models-support-too-AdeJi20YR1GXDzwxIMOwwg

Llama Models That Support Tool Calls

Several Llama models support tool calling (also known as function calling), enabling them to interact with external APIs, functions, or services. Here's a breakdown of which Llama models offer this capability and the types of tool calling they support:

Official Meta Llama Models

  • Llama 3.1

    • Supports JSON-based tool calling natively.
    • Widely implemented in platforms like Ollama and Groq, facilitating agentic automation and integration with external tools or APIs [1][2][3][4].
    • Available in various parameter sizes (e.g., 8B, 70B, 405B).
  • Llama 3.2

    • Builds upon 3.1 with continued support for JSON-based tool calling.
    • Introduces “pythonic” tool calling, a more flexible and Python-friendly format [1].
  • Llama 4

    • Supports both JSON-based and the new pythonic tool calling format.
    • It's recommended to use the pythonic tool parser for optimal results.
    • Supports parallel tool calls, a feature not present in Llama 3.x [1].

Community and Fine-Tuned Models

  • Fine-tuned Llama 3 Models

    • Community projects have fine-tuned Llama 3 (e.g., Llama3-8b-instruct) for enhanced function/tool calling, including LoRA adapters and quantized versions for efficient local deployment [5][6].
    • These fine-tuned models are trained on datasets specifically designed for function calling tasks and are available in different formats (16-bit, 4-bit, GGUF for llama.cpp, etc.).
  • TinyLlama

    • A smaller, fine-tuned variant with tool/function calling support, suitable for resource-constrained environments [6].

Comparison Table

Model                 Tool Calling Support   Format(s) Supported   Notable Features
Llama 3.1             Yes                    JSON-based            Native support, broad adoption
Llama 3.2             Yes                    JSON, Pythonic        Adds pythonic tool calling
Llama 4               Yes                    JSON, Pythonic        Parallel tool calls supported
Llama3-8b-instruct*   Yes (fine-tuned)       JSON-based            Community fine-tune, local use
TinyLlama*            Yes (fine-tuned)       JSON-based            Small, efficient, fine-tuned

*Community fine-tuned models, not official Meta releases.

Key Points

  • Llama 3.1, 3.2, and 4 all support tool calling, with enhanced capabilities and flexibility in newer versions [1][2][3].
  • JSON-based tool calling is the standard across all models, while pythonic tool calling is introduced in 3.2 and recommended for Llama 4 [1].
  • Parallel tool calls are exclusively supported in Llama 4 [1].
  • Fine-tuned models such as those from the “unclecode” repository extend tool calling capabilities to smaller or more specialized Llama variants [5][6].

In summary, if you require tool calling support, opt for Llama 3.1 or a later version. For advanced features like pythonic tool calling and parallel execution, Llama 4 is the recommended choice. Fine-tuned community models are also available for specific use cases or lightweight deployments.
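To make the two formats above concrete, this is roughly what the raw model output looks like in each case, before the serving layer parses it into a structured tool call (illustrative strings, not captured model output; exact syntax varies by model and serving stack):

```python
# JSON-based tool calling (Llama 3.1 style): the model emits a JSON
# object naming the function and its arguments.
json_style = '{"name": "get_weather", "parameters": {"city": "Berlin"}}'

# Pythonic tool calling (introduced with Llama 3.2): the model emits a
# Python-style call expression instead.
pythonic_style = '[get_weather(city="Berlin")]'
```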

  1. Tool Calling - vLLM
  2. Tool support · Ollama Blog
  3. Tool Calling in Llama 3: A Step-by-step Guide To Build Agents - Composio
  4. https://www.reddit.com/r/LocalLLaMA/comments/1eaztwv/quick_review_of_llama_3_1_tool_calling/
  5. unclecode/llama3-function-call-lora-adapter-240424 · Hugging Face
  6. unclecode/tinyllama-function-call-Q4KM_GGFU-250424 · Hugging Face
  7. unclecode/llama3-function-call-Q4KM_GGFU-240424 · Hugging Face
  8. https://llama.developer.meta.com/docs/guides/tool-guide/
  9. Tool calls in LLaMa 3.1 - Docs - Braintrust
  10. Llama 4 | Model Cards and Prompt formats
  11. Llama 3.3 | Model Cards and Prompt formats
  12. Tools - LlamaIndex
  13. okamototk/llama-swallow
  14. Function Calling — NVIDIA NIM for Large Language Models (LLMs)
 
Posted : 27/06/2025 8:48 am
fwasmeier
(@fwasmeier)
Posts: 3
Active Member
Topic starter
 

Hi,

Thank you for your prompt response.

If you review the model card from NVIDIA, it explicitly states that the model is trained for tool calling.

I also verified its capability by switching the Agent Type to “OpenAI Functions Agent”. With this agent type, I'm encountering a 400 error (no body) from callin.io.

This is the console log when using the Model in the “OpenAI Functions Agent”:

2025-06-27T11:08:45.467Z | error | 400 status code (no body) {"file":"error-reporter.js","function":"defaultReport"}
2025-06-27T11:08:45.467Z | debug | Running node "AI Agent" finished with error {"node":"AI Agent","workflowId":"jOqu92akylxZQm06","file":"logger-proxy.js","function":"exports.debug"}
2025-06-27T11:08:45.467Z | debug | Executing hook on node "AI Agent" (hookFunctionsPush) {"executionId":"6809","pushRef":"wvad9rmsml","workflowId":"jOqu92akylxZQm06","file":"execution-lifecycle-hooks.js"}
2025-06-27T11:08:45.468Z | debug | Pushed to frontend: nodeExecuteAfter {"dataType":"nodeExecuteAfter","pushRefs":"wvad9rmsml","file":"abstract.push.js","function":"sendTo"}
2025-06-27T11:08:45.468Z | debug | Workflow execution finished with error {"error":{"level":"warning","tags":{},"context":{},"functionality":"configuration-node","name":"NodeApiError","timestamp":1751022525464,"node":{"parameters":{"notice":"","model":{"__rl":true,"value":"nvidia/llama-3.3-nemotron-super-49b-v1","mode":"list","cachedResultName":"nvidia/llama-3.3-nemotron-super-49b-v1"},"options":{}},"type":"@n8n/n8n-nodes-langchain.lmChatOpenAi","typeVersion":1.2,"position":[-840,-320],"id":"cec34fcd-ddfd-4bcb-b4bd-b97031e8ee17","name":"Local","notesInFlow":true,"credentials":{"openAiApi":{"id":"dX1EaCNOPnvtPwDG","name":"Local Reasoning Model"}}},"messages":["400 status code (no body)"],"httpCode":"400","description":"400 status code (no body)","message":"Bad request - please check your parameters","stack":"NodeApiError: Bad request - please check your parametersn    at Object.onFailedAttempt (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/@n8n+n8n-nodes-langchain@file+packages+@n8n+nodes-langchain_9ca6f82764a6c40719e9f8a538948cbd/node_modules/@n8n/n8n-nodes-langchain/nodes/llms/n8nLlmFailedAttemptHandler.ts:26:21)n    at RetryOperation._fn (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/p-retry@4.6.2/node_modules/p-retry/index.js:67:20)n    at processTicksAndRejections (node:internal/process/task_queues:105:5)"}},"workflowId":"jOqu92akylxZQm06","file":"logger-proxy.js","function":"exports.debug"}

!!! IMPORTANT
The part that is particularly interesting is that when I use the Plan & Execute Agent with the model and include a Wikipedia tool, it successfully utilizes the tool and returns the result to the user.

The tests above suggest that the model is theoretically capable of calling tools, but there might be an issue elsewhere that I'm unable to identify.

Questions that have arisen from this:

  • Is there a difference in how tools are invoked between the Plan & Execute Agent, OpenAI Functions Agent, and Tools Agent?
  • Which Agent would be the most appropriate for this scenario (given that the llama3.3 model uses the OpenAI API standard)?
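To narrow this down on our side, since the stack trace above points at the LangChain-based OpenAI chat node, I could replay the same kind of call with LangChain outside callin.io and see whether the 400 comes from the NIM endpoint itself when a tools payload is attached. A rough sketch of what I have in mind (base URL, credentials, and the tool definition are placeholders):

```python
from langchain_openai import ChatOpenAI

# Placeholders for the local NIM endpoint.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="nvidia/llama-3.3-nemotron-super-49b-v1",
)

# Hypothetical minimal tool in the OpenAI schema, mirroring the Webex tool.
tool = {
    "type": "function",
    "function": {
        "name": "create_webex_message",
        "description": "Send a message to a Webex room",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

# If this raises the same 400, the endpoint rejects the tools payload;
# if it returns an AIMessage with tool_calls, the problem sits in the
# agent layer rather than the model.
result = llm.bind_tools([tool]).invoke("Send 'Hi Test' to the room.")
print(result.tool_calls)
print(result.content)
```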

If I can provide any further information or debugging logs, please let me know, and I'll be happy to assist.

 
Posted : 27/06/2025 1:54 pm
fwasmeier
(@fwasmeier)
Posts: 3
Active Member
Topic starter
 

Thank you for the information. The model card indicates that the model is viable for tool calling, and testing with the Plan & Execute Agent type resulted in successful use of the Wikipedia tool. The issues arise only with the Tools Agent and the OpenAI Functions Agent.

 
Posted : 27/06/2025 1:57 pm