Web Search Node - Synthreo Builder
Web Search node for Builder - perform live internet searches from within an AI agent workflow to retrieve current information, news, and web data for LLM context enrichment.
Overview
The Web Search node enables your workflow to automatically search the internet and extract information from web pages. It supports two distinct modes: performing live searches using the Bing Search Engine API, and scraping specific known web pages using XPath selectors. Both modes allow workflows to gather current, real-world data without manual research.
Use this node when your AI agent needs up-to-date information from the web, when you want to monitor specific websites for changes, or when you need to enrich workflow data with content from publicly available sources.
What This Node Does
The Web Search node acts as your workflow’s research layer, automatically:
- Searching the web using the Bing Search Engine API to find pages matching a query.
- Extracting specific data from known web pages using XPath selectors.
- Processing search results and formatting them for use in subsequent workflow steps.
- Handling multiple search queries and organizing results for downstream processing.
Parameters
Search Method Selection
| Field Name | Type | Default | Description |
|---|---|---|---|
| mode | Dropdown | Custom URL with XPath parser | The search and extraction method to use. |

| Mode | Description |
|---|---|
| Custom URL with XPath parser | Scrapes data from specific, known web pages using XPath selectors to extract precise content from defined page elements. |
| Bing Search Engine API | Performs live web searches using Microsoft’s Bing search engine and returns structured results including titles, descriptions, and URLs. |
Custom URL with XPath Parser
Use this mode when you know exactly which web pages to extract data from and need to pull specific elements rather than full page content.
When to Use
- Monitoring competitor product pages for price changes.
- Extracting news headlines from a specific publication’s website.
- Pulling contact details from company directory pages.
- Reading structured data from a site that does not provide an API.
Key Benefits
- Precise extraction from specific page elements using XPath expressions.
- No API costs or usage limits.
- Works with any publicly accessible web page.
- Highly targeted - pulls only the data you need.
XPath Configuration
XPath (XML Path Language) is a query language for selecting elements within an HTML or XML document. To configure the XPath parser:
- Identify the page element containing the data you need (use browser developer tools to inspect the page).
- Write or generate an XPath expression that targets that element (for example, `//div[@class="price"]/text()` to extract price text from a div with class `price`).
- Enter the target URL and XPath expression in the node configuration.
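To make the selection step concrete, here is a minimal Python sketch of evaluating a path expression against a page. It is not how the node is implemented, which this page does not document. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath: there is no `/text()` or `/@href` step, so the sketch selects elements and then reads their text or attribute in Python. The sample page and the helper names `extract_text` and `extract_attr` are hypothetical, and real-world HTML is often not well-formed XML, so it would need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for a fetched URL.
SAMPLE_PAGE = """
<html>
  <body>
    <h1>Example Widget</h1>
    <div class="price">$24.99</div>
    <a class="article-link" href="/news/1">First</a>
    <a class="article-link" href="/news/2">Second</a>
  </body>
</html>
""".strip()


def extract_text(page: str, path: str) -> list[str]:
    """Return the stripped text of every element matched by `path`."""
    root = ET.fromstring(page)
    return [(el.text or "").strip() for el in root.findall(path)]


def extract_attr(page: str, path: str, attr: str) -> list[str]:
    """Return the value of `attr` for every element matched by `path`."""
    root = ET.fromstring(page)
    return [el.get(attr, "") for el in root.findall(path)]


# Rough equivalents of //div[@class="price"]/text() and
# //a[@class="article-link"]/@href in ElementTree's XPath subset.
prices = extract_text(SAMPLE_PAGE, ".//div[@class='price']")
links = extract_attr(SAMPLE_PAGE, ".//a[@class='article-link']", "href")
print(prices, links)
```

Running an expression locally against a saved copy of the page in this way can help confirm that it matches the intended element before it is entered in the node.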
Common XPath examples:
- `//h1/text()` - Extract the main page heading
- `//div[@id="product-price"]/text()` - Extract text from a specific ID
- `//table//tr/td[2]/text()` - Extract the second column of every table row
- `//a[@class="article-link"]/@href` - Extract href values from links with a given class
Bing Search Engine API
Use this mode when you need to search the entire web rather than scrape known pages. This is the right choice for research, content discovery, brand monitoring, and competitive intelligence gathering.
When to Use
- Finding companies in specific industries for lead generation.
- Researching trending topics or keywords in a given domain.
- Monitoring brand mentions across the web.
- Discovering new competitors or market opportunities.
- Gathering broad intelligence on any subject where the sources are not known in advance.
Key Benefits
- Access to Bing’s search index covering billions of web pages.
- Structured results with titles, descriptions, and URLs for each result.
- Supports advanced search filtering and query customization.
- Enterprise-grade infrastructure for reliable search at scale.
API Configuration
To use the Bing Search Engine API mode, you need a Bing Search API key from Microsoft Azure Cognitive Services. Configure the node with:
- Your Bing API key in the authentication field.
- The search query (supports dynamic variables from upstream nodes, for example `{{searchTerm}}`).
- Optional result count and filtering parameters.
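For orientation, the three settings above map naturally onto an HTTP request. The sketch below builds one with the Python standard library but does not send it. It assumes the public Bing Web Search v7 endpoint, the `Ocp-Apim-Subscription-Key` header, and the `q` and `count` query parameters. Whether the node issues exactly this request is an assumption, and `build_bing_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public Bing Web Search v7 endpoint. Assumption: the node may use a
# different endpoint or client internally.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(api_key: str, query: str, count: int = 10) -> Request:
    """Build (but do not send) a search request for `query`."""
    params = urlencode({"q": query, "count": count})
    return Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


# "YOUR_API_KEY" is a placeholder, not a working credential.
request = build_bing_request("YOUR_API_KEY", "industrial robotics vendors", count=5)
print(request.full_url)
```

Keeping the result count small, as in `count=5` here, limits both response size and the work downstream nodes have to do.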
Output
The node outputs the search results or extracted content as structured data for downstream nodes.
Output Format (Bing API example):

    {
      "web_search_result": [
        {
          "title": "Result page title",
          "description": "Brief description or snippet from the page",
          "url": "https://example.com/page"
        }
      ]
    }

Output Format (XPath example):

    {
      "web_search_result": "$24.99"
    }

Real-World Use Cases
Market Research Automation
A marketing agency monitors competitor pricing across multiple e-commerce sites daily to adjust client pricing strategies.
Configuration:
- Mode: Custom URL with XPath parser
- Target URLs: competitor product pages
- XPath: expression targeting the price element on each page
Outcome: The workflow visits each product page, extracts the current price, and passes the data to a comparison and reporting node.
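The comparison step needs numbers, while the XPath mode returns strings such as `"$24.99"`. A minimal sketch of that conversion follows, assuming US-style formatting (comma as thousands separator, period as decimal point). The helper names `parse_price` and `price_change` are hypothetical and not part of Builder.

```python
import re
from decimal import Decimal


def parse_price(raw: str) -> Decimal:
    """Convert an extracted string such as "$24.99" to a Decimal.

    Assumes US-style number formatting; other locales need different rules.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no numeric price found in {raw!r}")
    return Decimal(match.group().replace(",", ""))


def price_change(previous: str, current: str) -> Decimal:
    """Return current minus previous; negative means the price dropped."""
    return parse_price(current) - parse_price(previous)


print(price_change("$24.99", "$22.49"))
```

`Decimal` is used instead of `float` so that currency arithmetic does not pick up binary rounding error.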
Lead Generation Through Web Research
A B2B sales team automatically finds and qualifies potential customers by searching for companies that use specific technologies.
Configuration:
- Mode: Bing Search Engine API
- Search Query: `{{targetTechnology}} vendor site:linkedin.com`
- Result Count: 10 per query
Outcome: The workflow returns a list of companies matching the search criteria, which are then passed to an LLM node for qualification scoring.
Content Research and Brand Monitoring
A content marketing team tracks brand mentions and industry trend coverage across the web.
Configuration:
- Mode: Bing Search Engine API
- Search Query: `"{{brandName}}" -site:owneddomain.com`
- Scheduled to run daily
Outcome: New brand mentions and industry articles are collected and passed to an email or Slack notification node for team review.
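Reporting only new mentions on a daily schedule implies remembering what earlier runs already returned. The sketch below shows one way to do that by keying on the result `url`. It assumes results follow the Bing-mode output shape documented on this page, and that the set of seen URLs is persisted between runs by some storage step that is not shown. `new_mentions` is a hypothetical helper.

```python
def new_mentions(results: list[dict], seen_urls: set[str]) -> list[dict]:
    """Return results whose URL has not been seen, recording them as seen."""
    fresh = []
    for item in results:
        url = item["url"]
        if url not in seen_urls:
            seen_urls.add(url)
            fresh.append(item)
    return fresh


# Hypothetical state carried over from yesterday's run.
seen = {"https://example.com/old-article"}
todays_results = [
    {"title": "Old", "description": "Already reported", "url": "https://example.com/old-article"},
    {"title": "New", "description": "First seen today", "url": "https://example.com/new-article"},
]
print([item["title"] for item in new_mentions(todays_results, seen)])
```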
Real Estate Market Data Collection
A real estate investment team pulls property listing data from multiple listing sites.
Configuration:
- Mode: Custom URL with XPath parser
- Target URLs: specific property listing pages
- XPath: expressions targeting price, square footage, and address elements
Outcome: Structured property data is extracted and stored in a database node for comparative analysis.
Step-by-Step Configuration
Setting Up the Node
- Drag the Web Search node from the node panel onto your workflow canvas.
- Connect it to the node that provides search terms or URLs.
- Click the node to open the settings panel.
Configuring Custom URL with XPath Parser
- In the Type dropdown, select Custom URL with XPath parser.
- Enter the target URL in the URL field. Use `{{variableName}}` to make URLs dynamic.
- Enter your XPath expression to target the specific page element you want to extract.
- Test with a sample URL to verify the XPath returns the expected content.
- Set a descriptive Result Property Name and save.
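The `{{variableName}}` placeholder in the URL field can be thought of as simple template substitution. The sketch below illustrates the idea only. The `{{...}}` syntax comes from this page, but the percent-encoding of values and the error on a missing variable are assumptions about reasonable behavior, not a description of what Builder does. `render_url` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_url(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} in `template` with its percent-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError("no value supplied for placeholder " + name)
        return quote(str(variables[name]), safe="")

    return PLACEHOLDER.sub(substitute, template)


url = render_url(
    "https://example.com/products/{{productSlug}}",
    {"productSlug": "blue widget"},
)
print(url)
```

When testing with a sample URL, it is worth checking a value that contains spaces or punctuation, since those are the cases where a dynamic URL is most likely to break.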
Configuring Bing Search Engine API
- In the Type dropdown, select Bing Search Engine API.
- Enter your Bing API key in the authentication field.
- Enter the search query in the query field. Use `{{searchTerm}}` to pass dynamic queries from upstream nodes.
- Set the desired number of results per query.
- Configure any additional filtering options shown in the settings panel.
- Test with a sample query, review the results, and save.
Integration with Other Nodes
Recommended Upstream Nodes
- Form inputs or triggers - providing search terms or target URLs.
- Database nodes - supplying lists of competitor URLs or monitored keywords.
- LLM nodes - generating search queries based on prior analysis.
Recommended Downstream Nodes
- LangChain - for chunking and preparing extracted content for AI analysis.
- OpenAI GPT - for summarizing or classifying search results.
- CRUD Integration - for storing findings in Monday.com, Airtable, or similar platforms.
- Send Email / Send SMS - for alerting teams to relevant search findings.
- HTTP Client - for making follow-up API calls to URLs found in search results.
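As an illustration of what a downstream summarization or classification step might receive, the sketch below turns the Bing-mode output documented in the Output section into a compact, numbered text block suitable for an LLM prompt. The `web_search_result` key and its `title`, `description`, and `url` fields come from this page. The `build_context` helper and the chosen text layout are hypothetical.

```python
import json


def build_context(node_output: dict, max_results: int = 3) -> str:
    """Format the top search results as numbered snippets for a prompt."""
    blocks = []
    results = node_output["web_search_result"][:max_results]
    for index, item in enumerate(results, start=1):
        blocks.append(
            f"[{index}] {item['title']} ({item['url']})\n{item['description']}"
        )
    return "\n\n".join(blocks)


sample_output = json.loads("""
{
  "web_search_result": [
    {
      "title": "Result page title",
      "description": "Brief description or snippet from the page",
      "url": "https://example.com/page"
    }
  ]
}
""")
print(build_context(sample_output))
```

Numbering the snippets and keeping the URL next to each title makes it easier for a model, or a human reviewer, to cite which result a statement came from.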
Troubleshooting
| Issue | Likely Cause | Resolution |
|---|---|---|
| XPath returns empty results | XPath expression is incorrect or the page structure has changed | Use browser developer tools to re-inspect the page and update the XPath expression. |
| Bing API returns no results | API key is invalid or the query has no matching results | Verify the API key in the Azure portal and run the same query directly in Bing to confirm that results exist. |
| Page content changes between runs | The target site has updated its HTML structure | Re-inspect the page and update the XPath selector to match the new structure. |
| Rate limiting errors from Bing | Too many API calls in a short window | Reduce the frequency of workflow runs or add a delay between batched search queries. |
| Search results are irrelevant | Query is too broad | Refine the search query with more specific terms, site filters, or date restrictions. |
| Dynamic page content not extracted | Site renders content via JavaScript | For JavaScript-heavy pages, consider using the URL Scraper node with the Chrome Browser engine instead. |
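For the rate limiting row above, the idea of adding a delay between batched queries can be sketched as a retry loop with exponential backoff. This is generic Python, not a Builder feature. `RateLimited` is a hypothetical exception standing in for whatever error a search call raises when it is throttled, and the delay values are illustrative.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the search API throttles a request."""


def with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call`, waiting 1x, 2x, 4x... base_delay after each throttled try."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake call that is throttled twice, then succeeds.
# Delays are recorded instead of slept so the example runs instantly.
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return ["result"]


recorded_delays = []
outcome = with_backoff(flaky_search, sleep=recorded_delays.append)
print(outcome, recorded_delays)
```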
Best Practices
Search Method Selection
- Use Custom URL with XPath parser when you have specific, known pages to monitor and need precise data from defined elements.
- Use Bing Search Engine API when you need broad web coverage and the relevant sources are not known in advance.
- The Custom URL method has no API costs or rate limits, making it more economical for high-volume monitoring of known pages.
Data Quality
- Test XPath expressions against live pages before deploying to production. Page structures change, and expressions that work today may break when a site is redesigned.
- Validate search results with spot checks on a regular basis.
- Set up error handling (for example, using a conditional node) to detect when expected content is missing.
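The missing-content check in the last point can be expressed as a small predicate. The sketch below assumes the node output uses the `web_search_result` property shown in the Output section, holding either a string (XPath mode) or a list (Bing mode). `has_expected_content` is a hypothetical helper, and the same logic would normally be expressed in a conditional node rather than in code.

```python
def has_expected_content(node_output: dict, key: str = "web_search_result") -> bool:
    """Return True when the output holds a non-empty string or list under `key`."""
    value = node_output.get(key)
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple)):
        return len(value) > 0
    return True


print(has_expected_content({"web_search_result": "$24.99"}))
print(has_expected_content({"web_search_result": "   "}))
```

Treating a whitespace-only string as missing catches the common case where an XPath expression still matches an element but the element no longer contains the data.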
Performance Optimization
- Limit the number of Bing API results to what is actually needed to reduce processing time and API costs.
- Use specific search queries with additional filters to reduce irrelevant results that require downstream filtering.
- Schedule search-intensive workflows during off-peak hours.
Compliance and Ethics
- Respect the terms of service of websites you scrape with the XPath method.
- Only extract publicly available information.
- Implement reasonable delays between requests to avoid placing excessive load on target servers.
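One machine-readable signal that can inform these checks is a site's `robots.txt` file. The sketch below uses Python's standard `urllib.robotparser` to test whether a path is allowed and to read a declared crawl delay. It parses an inline sample instead of fetching a live file, the user agent string `example-workflow-bot` is hypothetical, and `robots.txt` is only one input: it does not replace reading a site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS)
agent = "example-workflow-bot"
allowed = rules.can_fetch(agent, "https://example.com/products/1")
blocked = rules.can_fetch(agent, "https://example.com/private/data")
delay = rules.crawl_delay(agent)  # Seconds to wait between requests, if declared.
print(allowed, blocked, delay)
```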
Related Nodes
- URL Scraper - for full page content extraction when XPath is too narrow or when PDF processing is needed.
- HTTP Client - for calling a structured API endpoint at a URL discovered through search results.
- LangChain - for preparing extracted web content for AI model consumption.
- OpenAI GPT - for analyzing, summarizing, or classifying content retrieved by this node.