LangChain Integration - Synthreo Builder
LangChain node for Builder - integrate LangChain agents, chains, and tools into your workflow to leverage advanced orchestration, memory management, and retrieval-augmented generation.
Purpose
The LangChain node connects workflows to multiple data sources and prepares documents for AI analysis. It can load content from external systems, process raw text, and split large documents into chunks optimized for AI models.
Think of it as a document preparation assistant that transforms unstructured data into AI-ready input. Before passing content to an LLM node, the LangChain node ensures that documents are broken into appropriately sized pieces with source metadata attached, which improves both the accuracy and traceability of AI-generated responses.
Inputs
- Text Input (String, Optional): When using Input String, the node accepts text data from a previous node.
- Data Loader Input (Config, Optional): When using Data Loaders, the node fetches content directly from external systems configured in the node settings.
Outputs
The node generates two standard properties for downstream nodes:
- page_content - Extracted or chunked text ready for AI processing.
- metadata - Source information such as file name, creation date, and document type.
If the operation is set to “Split into chunks”, multiple output records are produced - one per chunk. Each record carries its own page_content and metadata fields.
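The shape of the chunked output can be sketched as plain records. The field names page_content and metadata come from the docs above; the exact metadata keys vary by data loader, so the ones shown here are illustrative assumptions.

```python
# Hypothetical records emitted after "Split into chunks" - one record per chunk,
# each carrying its own page_content and metadata (keys are illustrative).
records = [
    {
        "page_content": "First chunk of the contract text...",
        "metadata": {"source": "contract_01.txt", "type": "text"},
    },
    {
        "page_content": "Second chunk, overlapping the first...",
        "metadata": {"source": "contract_01.txt", "type": "text"},
    },
]
```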
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| Data Source | Dropdown | Yes | Data Loaders | Select the source of data: Data Loaders (external integrations) or Input String (text from a previous workflow node). |
| Input Property Name | String | No | - | The property name from the previous node that holds the text input. Only used when Input String is selected. |
| Data Loader | Dropdown | No | - | Choose a specific integration to load documents from, such as Airtable, Confluence, or CSV. |
| Operation | Dropdown | No | Split into chunks | Processing mode: Single value returns the full document as one record; Split into chunks divides the document into multiple smaller records. |
| chunkSize | Number | No | 1000 | Maximum number of characters per chunk. Valid range: 1 to 10,000. Active only when chunking is enabled. |
| chunkOverlap | Number | No | 200 | Number of characters shared between consecutive chunks. Valid range: 0 to 1,000. Overlap helps preserve context across chunk boundaries. |
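The interaction of chunkSize and chunkOverlap can be illustrated with a minimal character-based splitter. This is a sketch of the semantics described in the table, not Builder's actual implementation; the function name and signature are assumptions.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Character-based splitter mirroring the chunkSize/chunkOverlap semantics (sketch)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunkOverlap must be non-negative and smaller than chunkSize")
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one,
    # so consecutive chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults (1000/200), a 2,500-character document yields four chunks, the last one shorter than the rest.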
Data Source Options
Input String
Use this option when the text to process is already available as a property on the current workflow data row - for example, the body of an email, a user message, or the output of a previous transformation node.
Set Input Property Name to the exact property key that contains the text.
Data Loaders
Use this option to pull documents directly from an external system. Available loader integrations include:
- Airtable
- Confluence
- CSV files
- Directory (local file system folder)
- And additional connectors depending on your deployment
When a Data Loader is selected, the node fetches content at runtime and passes it through the configured operation before producing output.
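The Directory loader's behavior - one record per file, with source metadata attached - can be sketched in a few lines of stdlib Python. The record layout and the restriction to .txt files are assumptions for the sketch, not Builder's actual loader logic.

```python
from pathlib import Path

def load_directory(folder: str) -> list[dict]:
    """Sketch of a Directory loader: one record per file, source name kept in metadata."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):  # .txt only, for the sketch
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": path.name},
        })
    return records
```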
Operation Modes
Section titled “Operation Modes”Single Value
Returns the entire document as one output record. Use this when the document is short enough to fit within an LLM’s context window, or when you need to pass the full text to a summarization node without dividing it.
Split into Chunks
Divides the document into multiple records based on chunkSize and chunkOverlap. Each chunk becomes a separate downstream data row. Use this for long documents that exceed token limits, or when building retrieval-augmented generation (RAG) pipelines where chunks are stored in a vector database.
Chunk Size Recommendations
| Use Case | Recommended chunkSize | Recommended chunkOverlap |
|---|---|---|
| Q&A systems | 500 - 800 | 150 - 200 |
| Summarization | 2000 - 3000 | 100 |
| Document analysis | 1000 - 1500 | 300 - 400 |
| RAG / vector storage | 512 - 1024 | 128 - 256 |
Align chunk size with the token limit of your target LLM. As a rough estimate, 1,000 characters of English text corresponds to about 250 tokens.
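The character-to-token estimate above (roughly 4 characters per token for English) can be expressed as a one-line helper, useful when checking whether a chunkSize fits a model's context window. The helper name is an assumption.

```python
def approx_tokens(char_count: int) -> int:
    # Rough heuristic from the table above: ~250 tokens per 1,000 English
    # characters, i.e. about 4 characters per token. Actual tokenizer counts vary.
    return char_count // 4
```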
Example Usage
Scenario 1 - Processing a customer review for sentiment analysis:
- Set Data Source to Input String.
- Set Input Property Name to review_text.
- Set Operation to Single value.
- Pass the output page_content to an LLM node configured for sentiment classification.
Scenario 2 - Loading contracts from a directory for clause extraction:
- Set Data Source to Data Loaders.
- Select the Directory loader and point it to the folder containing contract files.
- Set Operation to Split into chunks with chunkSize 1000 and chunkOverlap 200.
- Pass each chunk’s page_content and metadata to a vector storage node for indexing.
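The chunking step in Scenario 2 - splitting each loaded document while copying its source metadata onto every chunk - can be sketched as follows. The record layout and function name are assumptions, not Builder's API.

```python
def chunk_record(record: dict, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[dict]:
    """Split one loaded record into chunk records, copying source metadata to each chunk."""
    step = chunk_size - chunk_overlap
    text = record["page_content"]
    return [
        # Each chunk keeps its own copy of the metadata, so downstream indexing
        # can attribute every chunk back to the originating file.
        {"page_content": text[i:i + chunk_size], "metadata": dict(record["metadata"])}
        for i in range(0, len(text), step)
    ]
```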
Scenario 3 - Summarizing a Confluence page:
- Set Data Source to Data Loaders.
- Select the Confluence loader and configure the page ID.
- Set Operation to Single value if the page is short, or Split into chunks for long pages.
- Feed the output directly into an LLM summarization prompt.
Troubleshooting
| Issue | Likely Cause | Resolution |
|---|---|---|
| Output contains empty page_content | Input property name does not match the upstream field | Verify the exact property name from the previous node’s output. |
| Too many chunks produced | chunkSize is set very low | Increase chunkSize to a value appropriate for your document length and model token limit. |
| Data Loader returns no records | External system connection is not configured or credentials are invalid | Check the loader settings and confirm the external integration is authorized. |
| Context lost between chunks | chunkOverlap is set to 0 | Set chunkOverlap to at least 100-200 characters to preserve sentence context at chunk boundaries. |
| Downstream LLM node exceeds token limit | chunkSize is too large | Reduce chunkSize so that each chunk fits within the model’s context window. |
Best Practices
- Chunking for Q&A Systems: Use 500 to 800 characters per chunk with an overlap of 150 to 200 to ensure questions that span sentence boundaries are answered correctly.
- Chunking for Summarization: Use 2,000 to 3,000 characters per chunk with lower overlap (around 100) to give the model substantial context without repetition.
- Performance: Process documents in batches and keep chunk sizes aligned with your AI model’s token limits to avoid truncation errors.
- Output Handling: Always reference page_content and metadata in downstream nodes. The metadata field is especially useful for attributing AI responses back to the original source document.
- Single Value vs. Chunking: Use Single Value only when the full document is guaranteed to fit within the model’s context window. When in doubt, chunk the document and reassemble results if needed.
Test Cases
- Given: Input string "Customer review: Product arrived late.", Operation = Split into chunks, chunkSize = 20.
  - Expected: Output contains multiple records with page_content values of at most 20 characters, with overlap between consecutive chunks.
- Given: Directory loader pointing to 5 contract files, Operation = Split into chunks.
  - Expected: Multiple records produced with page_content containing contract text and metadata showing the originating file name for each chunk.
- Given: Input string with 500 characters, Operation = Single value.
  - Expected: One output record with the full 500 characters in page_content.
Related Nodes
- OpenAI GPT - passes page_content chunks to an LLM for analysis or generation.
- HTTP Client - can fetch raw text from external URLs before passing it to the LangChain node.
- Custom Script - useful for pre-processing or cleaning document text before chunking.