As this is a frequently asked question, I've put together this post to explore the various methods for web scraping using callin.io. Each approach offers different levels of complexity and control.
Traditional Web Scraping + Text Parser
If you prefer not to rely on external services that might incur costs, you can fetch the page content using the HTTP "Make a request" module. Subsequently, you can employ a Text Parser "Match Pattern" module to locate and extract the desired content from the page's source code.
To achieve this effectively, a solid understanding of regular expression patterns is necessary. These patterns can become quite intricate, especially when aiming to match multiple content elements on a page with a single Match Pattern module. Alternatively, you could use a separate Match Pattern module for each piece of content you wish to extract, though this approach consumes more operations.
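To give a feel for what a single pattern matching multiple content elements looks like, here is a minimal sketch in Python rather than in the Match Pattern module itself. The page source, class names, and group names are made up for illustration; the Text Parser module uses its own regex flavor and options, so treat this as an approximation of the idea only.

```python
import re

# Hypothetical page source; in a scenario this would be the HTTP module's response body.
html = """
<html><body>
  <h1 class="title">Blue Widget</h1>
  <span class="price">$19.99</span>
</body></html>
"""

# One pattern with two named groups extracts both values in a single pass.
# re.DOTALL lets "." span the newline between the two elements.
pattern = re.compile(
    r'<h1 class="title">(?P<title>.*?)</h1>.*?'
    r'<span class="price">\$(?P<price>[\d.]+)</span>',
    re.DOTALL,
)

match = pattern.search(html)
result = match.groupdict() if match else {}
print(result)
```

The trade-off is the one described above: a single pattern saves operations, but it breaks as soon as the order or markup of either element changes, whereas one pattern per value is easier to maintain.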
Alternatives to consider:
- XML "Perform XPath Query": You can extract items using XPath, but it requires a separate module for each extraction.
- Set Multiple Variables: You can use negative regular expressions with the replace function to remove unwanted content, thereby isolating the desired "match".
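The two alternatives above can be sketched roughly as follows, again in Python as a stand-in for the modules. The markup and class names are hypothetical, and Python's built-in XML parser only supports a subset of XPath and only accepts well-formed input, so real pages may need cleanup first.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical well-formed markup; real pages often need cleanup before XML parsing.
source = "<div><p class='intro'>Hello</p><p class='body'>World</p></div>"

# XPath approach: one query per value, mirroring one module per extraction.
root = ET.fromstring(source)
intro = root.find(".//p[@class='intro']").text
body = root.find(".//p[@class='body']").text

# Replace approach: match everything that is NOT wanted and remove it.
# Here every tag is replaced with a space so that only the text remains.
text_only = re.sub(r"<[^>]+>", " ", source)
text_only = " ".join(text_only.split())  # collapse the leftover whitespace

print(intro, body, text_only)
```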
Need help with complex web scraping requirements, building a pattern for your Text Parser, AI prompt engineering, or have some other callin.io-related question?
—> Let’s Talk
Hosted Web Scraping
If you wish to avoid managing web scraping directly, you can utilize services such as ScrapingBee and ScrapeNinja to retrieve content from web pages.
ScrapeNinja offers jQuery-like selectors within its extractor function, which is essentially how elements are targeted on a page. This method avoids the use of regular expressions, although regex can still be employed in the extractor function if needed.
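To illustrate why selectors are easier to maintain than regular expressions, here is a small sketch of selecting elements by class. This is not ScrapeNinja's extractor (which is written in JavaScript); it is a hypothetical stand-in built on Python's standard html.parser, and the class name and markup are invented for the example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            if self.depth == 0:
                self.results.append("")  # start a new match
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


html = (
    '<ul><li class="item">Alpha <b>One</b></li>'
    '<li class="item">Beta</li><li>Skip</li></ul>'
)
parser = ClassTextExtractor("item")
parser.feed(html)
items = [text.strip() for text in parser.results]
print(items)
```

The selector only names the class it wants, so nested tags or reordered attributes inside the element do not break the extraction the way they would with a regex.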
The primary benefit of hosted web scraping services like ScrapeNinja is their capability to manage and circumvent anti-scraping mechanisms. They execute pages within a real web browser, loading all content and running page load scripts, thereby closely simulating the user experience as opposed to merely fetching raw HTML via the HTTP module. Dedicated scraping services specialize in exactly this.
For an example of ScrapeNinja usage, please refer to Grab data from page and url - #5 by samliew
Alternatives to consider:
- ScrapeNinja "Scrape (Real browser)"
- ScrapingBee "Extract Data"
- 0CodeKit's "Scrape HTML From Website"
- Scraptio "Scrape Website Texts"
- Other web scraping APIs on RapidAPI: search for "scrape" on https://rapidapi.com/search along with the specific service you intend to scrape (e.g., LinkedIn)
- Other "Data Extraction" integrations on callin.io: https://www.make.com/en/integrations/category/data-extraction-collection
Either of the Above + AI Structured Data Extraction
You can use either the traditional HTTP scraping method or the hosted web scraping method to retrieve the source code of the target page, then have an AI transform that source code into structured data (outputting variables/collections, or JSON that requires a Parse JSON module).
This approach offers flexibility in extracting content into complex data structures (collections), but it does involve prompt engineering and the setup of the data structure, either through fields (OpenAI) or by embedding JSON within the prompt itself (Groq).
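As a rough sketch of the "embed the JSON in the prompt" variant, the snippet below builds a prompt around an example structure and then parses the reply, which is the job the Parse JSON module does in a scenario. The field names, prompt wording, and sample reply are all assumptions made up for this example, and no AI service is actually called.

```python
import json

# Hypothetical target structure, embedded in the prompt so the model knows the shape.
schema_example = {"title": "", "price": 0.0, "tags": []}

FENCE = "`" * 3  # Markdown code fence that models often wrap JSON in


def build_prompt(page_source: str) -> str:
    """Build an extraction prompt that includes the desired JSON structure."""
    return (
        "Extract data from the page below and reply with JSON only, "
        "using exactly this structure:\n"
        f"{json.dumps(schema_example)}\n\nPAGE:\n{page_source}"
    )


def parse_reply(reply: str) -> dict:
    """Parse the model's reply, tolerating a code fence around the JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        text = text.strip("`")                    # drop the fence characters
        text = text.removeprefix("json").strip()  # drop an optional language tag
    data = json.loads(text)
    missing = set(schema_example) - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Made-up reply standing in for what an AI module might return.
sample_reply = (
    FENCE + "json\n"
    '{"title": "Blue Widget", "price": 19.99, "tags": ["sale"]}\n' + FENCE
)
parsed = parse_reply(sample_reply)
print(parsed)
```

Checking for missing keys right after parsing is worth the extra step, since a model can return valid JSON that still does not match the structure you asked for.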
References:
- Using Chat GPT to extract data from email
- Help Needed: Structuring Website Form Data into a JSON Array - #2 by drnic
- OpenAI respond can't respond with json structure - #2 by samliew
- News Automation (RSS -> Scraptio -> OpenAI --> Google Sheet): almost there, please help! - #3 by samliew
- How to cleanup HTTP get request (html object)
AI-powered Web Scraping
This is likely the most straightforward and rapid method to implement, as it only requires you to describe the content you need, rather than inspecting elements to create selectors or devising regular expression patterns.
The advantage here is that such services integrate both fetching and data extraction into a single module (saving operations) and eliminate the lengthy setup required by other methods.
Here's a simple illustration using the Dumpling AI "Extract data from URL" module:
As you can see, this can be accomplished effortlessly within seconds using Dumpling AI. Simply map the URL variable in the module and specify the fields you wish to extract from the page! (You don't even need to define the data type).
Furthermore, if you don't require structured data and simply want to pass the page content to another AI for further analysis, you can use the "Scrape URL" module. This module also removes extraneous elements like headers and footers, leaving only the main article content. This is particularly beneficial for training LLMs (e.g., OpenAI, HuggingFace, etc.).
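To show what stripping headers and footers means in practice, here is a simplified sketch using Python's standard html.parser. It is not how Dumpling AI's "Scrape URL" module works internally (that is not documented in this post); the list of skipped tags and the sample page are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumed set of non-content tags; a real service likely uses smarter heuristics.
SKIPPED_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Keeps text that is not nested inside any of the skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


page = (
    "<header>Site menu</header>"
    "<article><h1>Title</h1><p>Body text.</p></article>"
    "<footer>Copyright</footer>"
)
extractor = MainTextExtractor()
extractor.feed(page)
main_text = "\n".join(extractor.chunks)
print(main_text)
```

Passing only the main text to a downstream AI module keeps the input shorter, which usually means fewer tokens spent on navigation menus and boilerplate.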
To learn more about Dumpling AI, consult the official documentation at API Reference - Dumpling AI Docs
For those comfortable with regular expressions, traditional web scraping using the "Make a request" and "Match Pattern" modules allows for precise control over data extraction. However, this method can become complex when dealing with multiple data points. Hosted web scraping services like ScrapeNinja offer a more user-friendly approach with jQuery-like selectors and the capability to handle anti-scraping measures. AI-powered web scraping with tools like Dumpling AI provides the simplest and fastest setup, requiring only a description of the desired content for extraction. While this method offers great ease of use, it may provide less granular control over specific data points.
View my profile for more helpful links and articles like these (you might need to be logged in to view forum profiles):
Here is more information about the Dumpling AI integration in callin.io.
AI Agents
AI agents are pretrained on your data and knowledge base for RAG (Retrieval-Augmented Generation). You can set one up in the dashboard and then call the Dumpling AI “Generate AI Agent Completion” module:
Runs AI Agent completion and returns the result
For more information, see the official documentation at Build Custom AI Agents, Simply.
(source: Dumpling AI website)
Run JavaScript (with plugins)
If you need to run JavaScript/TypeScript with JS libraries (NPM packages) in your scenario, you can consider Dumpling AI’s “JavaScript Code Execution API” available via the “Run Javascript Code” module —
Run your javascript or typescript code and get the result back.
The official documentation on how to use NPM modules with this module can be found here.
Dumpling AI does much more. See also:
- List of Dumpling AI modules in callin.io
- Run Javascript Code
- Dumpling AI Agents
- Dumpling AI Actions Library
Examples of How to use Dumpling AI
For more information, see these Dumpling AI tutorials below, grouped by category:
YouTube & Videos
- Automate Keyword Research and Content Analysis from YouTube Videos Using callin.io.com and Dumpling AI
- Automate YouTube Research with Claude 3.5
- Create Thousands of AI-Powered YouTube Shorts with ChatGPT and Canva
- Extract Data from Videos Using Dumpling AI and callin.io.com
- Get YouTube Transcript Endpoint
- Google Search, FLUX.1 AI Image Generation, Multilingual YouTube Transcripts + much more!
- Repurpose Instagram Reels into YouTube Shorts and Twitter post using Dumpling AI and callin.io.com
- Repurpose Webinars to SEO Blog Posts Using Dumpling AI, ChatGPT and callin.io.com
- Repurpose YouTube Transcript using Dumpling AI, ChatGPT and callin.io.com
- Repurpose YouTube Videos to Blog Posts Using Dumpling AI, callin.io.com, Airtable, and ChatGPT
- Turn your YouTube content into SEO Blog Post using Dumpling AI’s Agent
Image Generation
- Automate AI Images for Social Media Posts Using Dumpling AI Flux.1 Pro and OpenAI in callin.io.com
- Dumpling AI Generate AI Image with Recraft V3 Module: Transforming Ideas into Visual Masterpieces
- Recraft V3 Module to Transform Social Media Posts into Engaging Images
- Understanding the Dumpling AI Generate AI Image with FLUX.1 Dev
- Understanding the Dumpling AI Generate AI Images with FLUX.1 Schnell
AI Agents & RAGs
- Knowledge Bases (RAG), AI Agents, and more!
- Understanding the Dumpling AI Generate AI Agent Completion Module
- Add Content to a Knowledge Base Using Dumpling AI’s API
- Add Resources to Your Dumpling AI Knowledge Base From Google Drive Using callin.io.com
- Automate Lead Qualification with Dumpling AI Agent
- Build a lead generation automation using Dumpling AI’s AI Agent in callin.io.com
- Build a No-code RAG system
- Connect Dumpling AI Agent to callin.io.com
- Connect Dumpling AI’s Retrieval-Augmented Generation (RAG) system to callin.io.com
- Understanding Dumpling AI Search Knowledge Base Module
- Build an AI-Powered Email Assistant with Dumpling AI Knowledge Base and ChatGPT
- Create an Auto-Email Organizer Using callin.io.com
Searching & Scraping
- Overview of Different Web Scraping Techniques in callin.io.com 🌐
- Scrape Google Search Results in 2025
- Scrape URL Endpoint
- Scrape Paginated Data using Dumpling AI and ChatGPT
- Scrape Data from Google Maps Using Dumpling AI
- Scrape Lead Data from Google Places with Dumpling AI
- Automate Google Reviews Scraping with Dumpling AI
- Automate Google Searches and Generate Blog Post Ideas Using Dumpling AI
- Automate SEO Keyword Research with Dumpling AI and callin.io.com
- Automate Web Data Extraction Using Dumpling AI and callin.io.com
- Automate Webpage Screenshot Capture and Data Extraction Using Dumpling AI in callin.io.com
- Automate Website Monitoring Using Dumpling AI and callin.io.com
- Automatically Turn Recent News into Newsletters Using Dumpling AI, OpenAI, and callin.io.com
- Find All URLs on a Domain for Scraping
- Get Google Reviews Endpoint
- Monitor Competitor Business Reviews and Identify Weaknesses Using Dumpling AI and OpenAI
- Repurpose Google news into blog post using Dumpling AI, Claude AI
- Screenshot URL Endpoint
- Search Google Maps Endpoint
- Search Google Places Endpoint
- Search News Endpoint
Other Data Extraction
- Extract Data from Audio Using Dumpling AI
- Extract Invoice Data from Emails Using Dumpling AI and callin.io.com
- Automate PDF Invoice Extraction to Google Sheets Using Dumpling AI and callin.io.com.
Business & Social
- Automate Social Media Content Repurposing with Airtable, Dumpling AI, Claude AI and callin.io.com
- Automate Data Entry & Lead Generation Using Dumpling AI, ChatGPT and callin.io.com
- Automate Lead Research and Personalization with Perplexity AI, Dumpling AI and callin.io.com
- Automate Client Research for Upcoming Sales Calls Using Dumpling AI, Google calendar, ChatGPT and callin.io.com.
- Automate Proposal Creation Using Dumpling AI, ChatGPT, Google Slides
- Automate Employee Onboarding Process
- Automate a Divorce Preparedness Analysis and Notification System Using Dumpling AI and callin.io.com
Dumpling AI Tutorials
- Connect Dumpling AI to callin.io.com
- How to Work With Arrays in callin.io.com
- Universal callin.io.com an API Call Module in Dumpling AI
In short, Dumpling AI is able to replace several other paid services combined that would cost more than itself, making it a noteworthy choice as the “multi-tool” of AI services.
How to Use
For more information on how to set this up, refer to these forum threads: