Notifications

Clear all

Can AI Agents Integrate with Files?

Features

Last Post by system 12 months ago

8 Posts

5 Users

0 Reactions

75 Views

RSS

fundmore

(@fundmore)

Posts: 4

Active Member

Topic starter

Hello,

I'm attempting to pass one or more files into an AI Agent cluster.

For testing, I tried uploading a single file using the OpenAI node and then providing the File ID in the prompt, but it appears the model cannot access the file.

(The response consistently provides example data generated by the model, rather than parsed data from the uploaded file.)

My goal is to build a feature in my application where users can click a button labeled "auto-underwrite statements with AI" on a deal. Upon clicking, the deal's bank statements would be sent to gpt-4-turbo, and the financial information extracted from these statements would auto-populate the deal details.

Am I using the incorrect node for this scenario? Based on my understanding:

The Basic LLM Chain cannot interact with files.
The Question and Answer Chain can retrieve documents from vector storage, but is this intended for parsing numerous data points? The name "Question and Answer" suggests it's designed for single queries.
The Summarization Chain can interact with binary data directly from the preceding node in callin.io, but is limited to summarization tasks.
The AI Agent seems to offer the most flexibility, but as far as I can tell, it lacks file interaction capabilities.
I can create an Assistant in OpenAI with access to a previously uploaded file, but I encounter the following error: "The maximum number of files that can be attached to the assistant is 20."

Please guide me in the right direction, as I have already reviewed all of callin.io's documentation regarding AI.

Posted : 18/06/2024 12:28 am

n8n

(@n8n)

Posts: 97

Trusted Member

It seems your topic is missing some crucial details. Could you please provide the following information, if relevant?

callin.io version:
Database (default: SQLite):
callin.io EXECUTIONS_PROCESS setting (default: own, main):
Running callin.io via (Docker, npm, callin.io cloud, desktop app):
Operating system:

Please share these details to help us understand your issue better.

Posted : 18/06/2024 12:28 am

fundmore

(@fundmore)

Posts: 4

Active Member

Topic starter

callin.io version: 1.37.3
Database (default: SQLite): default
callin.io EXECUTIONS_PROCESS setting (default: own, main): default
Running callin.io via (Docker, npm, callin.io cloud, desktop app): callin.io cloud
Operating system: MacOS, Arc browser

Posted : 18/06/2024 12:30 am

Derek_Cheung

(@derek_cheung)

Posts: 20

Eminent Member

Hi Fundmore,

Here’s an example for you to consider. In this flow, you retrieve a PDF containing financial information and then convert it to text.

Subsequently, you feed this text into a basic chain node and instruct GPT-4o to extract the desired information.

I suggest using GPT-4o over GPT4-Turbo. It offers better pricing with comparable quality.

Another important point is that if you need OpenAI to extract information in a specific format, you can configure the basic chain node to output data according to a JSON schema that you define.

Additional considerations include Google Gemini Flash 1.5 and Claude 3 Haiku models. These are also quite capable for extraction tasks and boast very large context windows, making them worth evaluating for your use case. The reason for mentioning them is their significantly lower cost compared to the GPT-4 family of models.

Hope this helps,
Derek

Posted : 18/06/2024 2:35 am

fundmore

(@fundmore)

Posts: 4

Active Member

Topic starter

Thanks for the assistance.

For anyone interested in using OpenAI for PDF to text conversion:

After several days of experimentation, I've determined that LLMs alone are not sufficient for my specific needs (parsing bank statements).

Bank statements can sometimes be scans or photocopies, which necessitates image processing rather than simple PDF text extraction.

Utilizing gpt-4o for OCR isn't optimal, as it appears to hallucinate and generate incorrect information where many online OCR tools perform accurately.

I haven't explored open-source OCR engines like Tesseract because I'm aware that image pre-processing (including Gaussian blur) is often needed for optimal results, and that's beyond my current capacity.

I would be keen to try Nougat (which seems to integrate PDF text extraction with image OCR), but I'm unable to get the Python client functioning and I'm not familiar with the command line.

Regardless, I'm using callin.io cloud, and Pyodide's included packages do not support it.

I believe the only viable path forward would be to utilize an external service via API, preferably one that combines PDF text extraction, image OCR, and AI assistance.

Posted : 26/06/2024 5:25 pm

Jim_Le

(@jim_le)

Posts: 35

Eminent Member

Hey Fundmore,

Just to comment on OCR solutions, I’d highly recommend Google Cloud’s DocumentAI offering. I’ve found the service to be fast with consistent, solid results for any type of scan. The only caveat is that they may have differing pricing for forms (not sure!) but otherwise, incredibly good value for money. I wrote a little about my experience here.

Edit: Also for an alternative approach, I also wrote about a similar task parsing invoices using LlamaParse. I found converting PDF tables to markdown tables allowed the LLM to understand structured data more easily.

Posted : 26/06/2024 7:33 pm

fundmore

(@fundmore)

Posts: 4

Active Member

Topic starter

Hello, thank you for your insightful reply. I will be trying both solutions and I’ll share my results here.

Posted : 01/07/2024 8:35 pm

system

(@system)

Posts: 332

Reputable Member

This thread was automatically closed 90 days following the last response. New replies are no longer permitted.

Posted : 29/09/2024 8:35 pm

9 Forums
1,470 Topics
8,130 Posts
21 Online
2,423 Members

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed