Describe the problem/error/question
I am generating videos automatically at scale. I currently have 70 videos, each around 6-8MB. My workflow functioned perfectly when processing a single video. However, callin.io times out when I attempt to run the node for all 70 videos. I'm trying to understand the service's size and memory limitations and how to handle large file sizes. This process also worked when uploading 5 videos at a time, but I'm unsure about potential issues with large files and would appreciate any assistance or insights on how to prevent this timeout.
What is the error message (if any)?
Timeout
Please share your workflow
Share the output returned by the last node
Information on your callin.io setup
- callin.io version: 1.41
- Database (default: SQLite):
- callin.io EXECUTIONS_PROCESS setting (default: own, main):
- Running callin.io via (Docker, npm, callin.io cloud, desktop app): Docker
- Operating system:
Hi, I am very sorry you’re having trouble.
To avoid hitting any resource limits when processing a large number of files, I suggest you split your workflow into two separate workflows: One “parent” (fetching your Sheet with the individual URLs) and one “child” (doing the heavy lifting of first downloading a file, then uploading it to your Google Cloud Storage).
You can then use the Split In Batches node in your parent workflow and split your data into small batches of maybe 5 URLs at a time. Your parent would then call the child workflow through the Execute Workflow node.
The advantage of this approach is that the resources required by the child workflow become available again after each execution finishes, provided you only return a very small (or possibly empty) result to the parent. So instead of having to keep all 70 videos in memory at once, your callin.io instance only needs to keep 5 videos in memory at a time.
Here’s how this could look:
Parent workflow
Child workflow
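To keep the child workflow from passing the video data back, its last node could simply drop the binary data and return a tiny status item. Here is a minimal sketch of such a final Code node (assuming “Run Once for All Items” mode; the field names are just placeholders, not taken from your actual workflow):

```javascript
// Final Code node of the child workflow ("Run Once for All Items" mode).
// It returns only a small JSON summary and no `binary` property,
// so the parent never receives the downloaded video itself.
const results = [];
for (const item of items) {
  results.push({
    json: {
      fileName: item.json.fileName ?? 'unknown', // assumed field set by earlier nodes
      uploaded: true,
    },
  });
}
return results;
```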
On a slightly related note, you probably want to make sure to set the N8N_DEFAULT_BINARY_DATA_MODE=filesystem environment variable to avoid using memory for keeping large amounts of binary data.
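Since you're running callin.io via Docker, one way to set this (just a sketch — adjust the container, volume, and image names to whatever you already use) is to pass the variable when starting the container:

```bash
# Sketch: pass the environment variable to the container (names below are placeholders)
docker run -d --name callin \
  -p 5678:5678 \
  -e N8N_DEFAULT_BINARY_DATA_MODE=filesystem \
  -v callin_data:/home/node/.n8n \
  your-callin-io-image  # placeholder for the image you already run
```

If you use docker-compose instead, the same variable goes under the service's environment section.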
Thank you so much!! I will try this out in my workflow. Appreciate the help for a n00b
Another question: I've been utilizing the batching functionality extensively, so thank you for showing me that!
I'm curious about best practices regarding the number of items to include in a batch. Currently, creating a batch of 1, running my script, writing to Google Sheets, and looping through this process has been effective, allowing me to see my output immediately. However, I'm unsure if there's an unknown factor making this approach highly inefficient. While I'm not charged per API call for any of my services, I'd appreciate insights into the advantages and disadvantages of using larger versus smaller batches, and any potential risks to be aware of.
Hi, the short answer is “it depends” (but you probably knew this already). Here are my thoughts on this:
> Creating a batch of 1, running my script, writing to Google Sheets and just looping that has been working well since I can see my output immediately, but I’m not sure if there’s something that I don’t know which makes this super inefficient.
So, specifically with regard to Google, this is slightly less efficient than processing multiple items at once, but not by much. Using batches means you’ll call some nodes more than once, and each additional node execution comes at a small cost depending on your specific setup (typically a very small fraction of a second of computing time).
However, the Google Sheets API isn’t very performant anyway, so the usual waiting time when calling this API will outweigh the aforementioned overhead by far.
> I’m not paying per API call on any of my services, but just wanted to know what the pros/cons are of larger or smaller batches, and if there are any dangers to be aware of, if you could speak to that.
Pros:
- less data processed with each individual sub-workflow execution which increases overall stability
- better visibility (you can see partial progress when manually executing your workflow)
Cons:
- slightly slower (this might matter when you work with very fast databases on your local network, but probably not when using external services such as Google Sheets or Airtable)
- more API calls are being used (again, whether this matters will depend on the exact services you use)
Most often you probably want to pick a batch size larger than 1 but smaller than the total number of items to get the best of both worlds.
Hope this makes sense!
Yes, thank you so much!! I believe you're correct that with GSheets, the benefits of reliability (and simpler debugging when an issue arises, since at least you'll know which row caused it) likely outweigh the slightly slower runtime, which is less critical for me in this specific scenario. Thanks for helping me think this through!