Hi there!
I'm exploring a use case involving daily sales reports in CSV format. The goal is to send these reports to an LLM agent for automated summary and insights generation.
Currently, I'm evaluating two approaches:
- Transforming the CSV data (sourced from Google Sheets) into a single JSON object for direct submission to the LLM.
- Ingesting the CSV into a vector store, enabling the LLM to retrieve and analyze the data from that repository.
My primary concern is that vector-based chunking might miss overarching patterns or insights, while sending one large JSON could run into token limits. (When I previously asked the LLM to calculate total sales, it produced incorrect figures.)
My objective is to obtain accurate, high-level analysis from these daily reports with minimal manual intervention.
I'm keen to learn how others have tackled similar scenarios. Any suggestions or alternative methods would be greatly appreciated.
Hey,
Consider performing the calculations yourself and sending only the results to the LLM.
Calculate totals/averages within your preprocessing script.
Send the LLM a summary object containing the numbers along with a few sample rows.
The LLM can then write the narrative without performing any calculations.
This approach resolves accuracy issues and bypasses token limits.
Example:
```json
{
  "date": "2025-01-15",
  "total_sales": 45230.50,
  "vs_yesterday": "+12.3%",
  "top_products": [...],
  "key_metrics": {...},
  "sample_data": [10-20 most relevant rows]
}
```
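For reference, here's a minimal preprocessing sketch in Python with pandas that produces a payload like the one above. The column names (`product`, `amount`), the file paths, and the hardcoded date are assumptions; adjust them to your actual report schema.

```python
# Minimal preprocessing sketch: compute the numbers yourself, then hand
# the LLM a compact summary object. Column names "product" and "amount"
# are assumptions -- rename to match your CSV schema.
import json
import pandas as pd

def build_summary(today_csv: str, yesterday_csv: str) -> str:
    today = pd.read_csv(today_csv)
    yesterday = pd.read_csv(yesterday_csv)

    total_today = float(today["amount"].sum())
    total_yesterday = float(yesterday["amount"].sum())
    change_pct = (total_today - total_yesterday) / total_yesterday * 100

    # Top 5 products by revenue, as plain records so they serialize cleanly.
    top_products = (
        today.groupby("product")["amount"].sum()
        .nlargest(5)
        .reset_index()
        .to_dict(orient="records")
    )

    summary = {
        "date": "2025-01-15",  # or derive it from a date column / filename
        "total_sales": round(total_today, 2),
        "vs_yesterday": f"{change_pct:+.1f}%",
        "top_products": top_products,
        "key_metrics": {"avg_order_value": round(float(today["amount"].mean()), 2)},
        "sample_data": today.nlargest(10, "amount").to_dict(orient="records"),
    }
    # default=str guards against numpy scalar types json can't serialize.
    return json.dumps(summary, indent=2, default=str)
```

The LLM then only narrates these precomputed figures, so it never has to do arithmetic itself.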
I lean towards the first option, because I'd rather run the calculations before handing the document to the AI. Specifically, I'd like to compute averages, summaries, and totals for the sales data and, if feasible, include the top 5 products.
If you're using an AI Agent within a callin.io workflow, I'd read the Google Sheet directly, without any conversion step, and have the AI work with the data from there, as in the sketch below.
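One low-friction way to read the sheet directly (no CSV download step) is Google Sheets' CSV export URL. This is a sketch, assuming the sheet is shared as link-readable and again assuming `product`/`amount` column names:

```python
# Sketch: pull a link-readable Google Sheet straight into pandas via its
# CSV export endpoint, then precompute the figures before calling the AI.
import pandas as pd

SHEET_ID = "your-sheet-id"  # placeholder -- use your sheet's real ID
url = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

df = pd.read_csv(url)

# Averages, totals, and top 5 products (column names are assumptions).
total_sales = df["amount"].sum()
avg_sale = df["amount"].mean()
top5 = df.groupby("product")["amount"].sum().nlargest(5)
print(total_sales, avg_sale, top5, sep="\n")
```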