Is there a method to develop an AI Agent Web Crawler capable of interacting with a website? This would involve inputting data, selecting options, and clicking buttons. For instance, on a motor insurance website that requires details like make, model, and vehicle registration year, the agent should be able to fill these fields and then navigate to the next page by clicking a button.
Currently, we have limited options for that type of interaction with AI Agents.
This is a rather intricate task, but scrapers employ certain techniques to simulate user interactions.
Here are some of the approaches:
- Headless Browsers
  Tools such as Selenium, Puppeteer, or Playwright launch a browser instance (often in headless mode) to mimic genuine user actions like clicks, typing, and form submissions.
  Opinion: This is effective because it replicates an actual user environment, handling JavaScript and dynamic content seamlessly.
- JavaScript Event Simulation
  Directly triggering DOM events (e.g., using element.click() or dispatchEvent(new Event('click'))) enables scrapers to emulate interactions without driving a full browser, as long as a DOM is available for the script to run against.
  Opinion: This technique is beneficial for simpler interactions where the overhead of a full browser is not needed.
- HTTP Request Simulation
  By examining the network requests a page makes and replicating them with an HTTP Request node, you could imitate user actions without rendering a page.
- Utilizing Scripting Libraries in Browser Context
  Tools like CasperJS or PhantomJS (though less prevalent now) permit scripting user interactions within a browser context to manage form fills, clicks, and more.
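As a rough illustration of the headless-browser approach applied to the motor insurance example, here is a minimal Python sketch using Playwright's sync API. The URL, the CSS selectors, and the helper names (Step, build_quote_steps, run_steps) are all assumptions made up for illustration; a real site needs its own selectors, and Playwright plus a browser binary must be installed separately.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One user interaction: 'select', 'fill', or 'click' (hypothetical helper)."""
    action: str
    selector: str
    value: str = ""


def build_quote_steps(make: str, model: str, year: int) -> list[Step]:
    # The selectors below are placeholders; inspect the real form to find them.
    return [
        Step("select", "#make", make),
        Step("select", "#model", model),
        Step("fill", "#registration-year", str(year)),
        Step("click", "button[type=submit]"),
    ]


def run_steps(url: str, steps: list[Step], headless: bool = True) -> str:
    """Replay the steps in a Chromium instance and return the resulting URL."""
    # Imported lazily so the step-building code works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto(url)
        for step in steps:
            if step.action == "select":
                page.select_option(step.selector, step.value)
            elif step.action == "fill":
                page.fill(step.selector, step.value)
            elif step.action == "click":
                page.click(step.selector)
            else:
                raise ValueError(f"unknown action: {step.action}")
        # Give the next page a chance to load before reading its URL.
        page.wait_for_load_state("networkidle")
        final_url = page.url
        browser.close()
        return final_url


# Example call (hypothetical URL, not executed here):
# run_steps("https://insurer.example/quote", build_quote_steps("Toyota", "Corolla", 2019))
```

Keeping the steps as plain data is one possible design choice: an AI agent could produce the step list, while the browser-driving code stays fixed and easy to audit.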
However, this would necessitate a more sophisticated setup than just callin.io.
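The HTTP request approach listed above, by contrast, needs no browser at all. The sketch below builds (but does not send) the kind of form POST a quote page might submit. The endpoint and the field names make, model, and registration_year are assumptions; the real ones have to be read from the browser's network tab, and many sites additionally require cookies or CSRF tokens that this sketch ignores.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_quote_request(endpoint: str, make: str, model: str, year: int) -> Request:
    """Build a form-encoded POST mirroring what the browser would send."""
    # Field names are hypothetical; copy the real ones from the observed request.
    body = urlencode(
        {"make": make, "model": model, "registration_year": year}
    ).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical endpoint; pass the result to urllib.request.urlopen() to send it.
request = build_quote_request("https://insurer.example/quote", "Toyota", "Corolla", 2019)
```

The same method, URL, headers, and body could be entered into an HTTP Request node instead of being sent from Python.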
In conclusion, performing this action isn't straightforward, but it is achievable with the right tools and expertise.
If this response addresses your query, please consider marking it as the solution.
Hello, I'm also aiming to achieve a similar outcome, drawn by the allure of chatbot-driven E2E testing.
I previously extended a Docker image built for Puppeteer and incorporated Playwright, but I encountered significant difficulties in launching a browser, so I paused that effort.
I have now set up callin.io running via npx, having successfully installed Playwright and Puppeteer-core using pnpm add -w playwright puppeteer-core, and utilized the brand new playwright-mcp-server.
My workflow is now functioning almost correctly, with the exception that the stdio coupling appears to be launching a new instance of the Playwright MCP server for each use of the Playwright MCP tool executor.
This results in multiple browser instances running.
Is it standard for callin.io to initiate a new instance of each stdio'd MCP client every time it's invoked?
I'm encountering a similar problem! I'm struggling to understand the cause. After utilizing the navigate tool, when I then use the playwright_get_visible_html tool, it appears to open a new browser instance that only displays about:blank. These actions don't seem to be connected. I'm currently without a solution for this!
How did you manage to open a browser? Whenever I attempt this, I encounter another issue, such as problems installing Chrome. When I try to use the browser_install execute tool, it requires Playwright.
If I then extend it with execute shell and install the dependencies, it still doesn't work, presenting the same issues.
If you have any suggestions or have made progress in getting the Playwright module to function, any assistance would be greatly appreciated!
Hi everyone,
I know I’m quite late to the party, but I encountered a similar problem and really just wanted a “Playwright MCP Server in the cloud” so that my AI Agent can simply browse the Web without me needing to manage any infrastructure.
As a result, I spent a couple of days managing infrastructure and built https://playwright-mcp.develop-build-deploy.com. This allows you to launch your personal hosted Playwright MCP Server with Bearer Auth, and it integrates seamlessly with callin.io (check out the demo video on the homepage).
Simply input the MCP Server URL into the AI Agent and configure the Bearer Auth credentials, and you're all set.
You can even monitor the AI's exact actions in the browser, as it includes an embedded browser-based VNC viewer.
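For readers unfamiliar with Bearer Auth as mentioned above: it generally means each request to the server carries an Authorization header containing the token. The sketch below shows only that generic scheme in Python; the server URL, the token value, and the function name are placeholders, and nothing here describes the hosted service's actual API beyond the use of a bearer token.

```python
from urllib.request import Request


def build_authorized_request(server_url: str, token: str) -> Request:
    """Attach a bearer token to a request (generic scheme, hypothetical helper)."""
    return Request(server_url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; use the URL and token issued for your own instance.
request = build_authorized_request("https://mcp.example/mcp", "my-secret-token")
```

In callin.io the same two values go into the AI Agent's MCP Server URL field and its Bearer Auth credential, so no code is needed there.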
This is truly remarkable. Thank you! You've just resolved an issue I'd been grappling with for hours in a single step. It functions brilliantly!
My only suggestion is to incorporate more detailed steps in your guide and link to them from within the UI. A specific guide for callin.io would be a great addition, but I managed to figure it out quite rapidly anyway.
My primary concern now with this service is its longevity. I can envision using this for numerous applications and would prefer to avoid any disruption should it ever cease to be available.
Hello, thank you so much for your message, this really made my day!
I can understand your concern, and yes, the unpolished nature of the product at this early stage might make it seem not really serious, but quite the opposite is true!
First and foremost, I cannot disable it simply because at my company, we are already running some mission-critical callin.io stuff on top of it.
But most importantly, I'm very serious about growing this into a full-fledged SaaS offering, and as simple as it might appear on the surface, the underlying code and infrastructure are extremely solid and sophisticated (I can give you a tour if you are interested).
But here is a deal: drop me a line at manuel@kiessling.net. Should this project ever go the way of the dodo, I'll set up a hosted Playwright MCP instance on a server of your choice, as a promise to not leave you out in the cold!
And furthermore thank you so much for your feedback on usability; I will start building a more guided approach right away.
That’s fantastic news. And yes, I’ll be emailing you, and yes, I’d love a tour. I don’t judge a back-end by its front-end at all. I just know my own personal projects; if they aren’t used by others, then one day when I stop using them, they’ll die a pauper’s death unceremoniously.
Thank you for being open to the thoughts proposed, and for being available to talk shop about keeping it up, however that might happen. Very cool of you.
See you in the emails soon.
Hi there - I saw this and tried using it. However, for some reason, the browser is not launching. Is there any chance you could assist? I have a localhost installation, and everything runs, except for displaying the actual headless browser. It seems like the navigation is happening in the background.
Any help would be greatly appreciated.
Thanks,
K
Hi Kacper,
I'll do my best to assist. I'm not entirely sure I grasp the issue yet.
How are you invoking your MCP instance? Through Cursor, callin.io AI Agent, or another method?
The browser only initiates when the MCP server is triggered; until that point, the VNC viewer screen will remain blank.
Please provide more details.