LangChain Integration - Synthreo Builder
LangChain node for Builder - integrate LangChain agents, chains, and tools into your workflow to leverage advanced orchestration, memory management, and retrieval-augmented generation.
Purpose
The LangChain node connects workflows to multiple data sources and prepares documents for AI analysis. It can load content from external systems, process raw text, and split large documents into chunks optimized for AI models.
Think of it as a document preparation assistant that transforms unstructured data into AI-ready input. Before passing content to an LLM node, the LangChain node ensures that documents are broken into appropriately sized pieces with source metadata attached, which improves both the accuracy and traceability of AI-generated responses.
Inputs
- Text Input (String, Optional): When using Input String, the node accepts text data from a previous node.
- Data Loader Input (Config, Optional): When using Data Loaders, the node fetches content directly from external systems configured in the node settings.
Outputs
The node generates two standard properties for downstream nodes:
- page_content - Extracted or chunked text ready for AI processing.
- metadata - Source information such as file name, creation date, and document type.
If the operation is set to “Split into chunks”, multiple output records are produced - one per chunk. Each record carries its own page_content and metadata fields.
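The shape of the chunked output can be sketched as plain records. The field names page_content and metadata come from the docs above; the exact metadata keys vary by data loader, so the ones shown here are illustrative assumptions.

```python
# Hypothetical records emitted after "Split into chunks" - one record per chunk,
# each carrying its own page_content and metadata (keys are illustrative).
records = [
    {
        "page_content": "First chunk of the contract text...",
        "metadata": {"source": "contract_01.txt", "type": "text"},
    },
    {
        "page_content": "Second chunk, overlapping the first...",
        "metadata": {"source": "contract_01.txt", "type": "text"},
    },
]
```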
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| Data Source | Dropdown | Yes | Data Loaders | Select the source of data: Data Loaders (external integrations) or Input String (text from a previous workflow node). |
| Input Property Name | String | No | - | The property name from the previous node that holds the text input. Only used when Input String is selected. |
| Data Loader | Dropdown | No | - | Choose a specific integration to load documents from, such as Airtable, Confluence, or CSV. |
| Operation | Dropdown | No | Split into chunks | Processing mode: Single value returns the full document as one record; Split into chunks divides the document into multiple smaller records. |
| chunkSize | Number | No | 1000 | Maximum number of characters per chunk. Valid range: 1 to 10,000. Active only when chunking is enabled. |
| chunkOverlap | Number | No | 200 | Number of characters shared between consecutive chunks. Valid range: 0 to 1,000. Overlap helps preserve context across chunk boundaries. |
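The interaction of chunkSize and chunkOverlap can be illustrated with a minimal character-based splitter. This is a sketch of the semantics described in the table, not Builder's actual implementation; the function name and signature are assumptions.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Character-based splitter mirroring the chunkSize/chunkOverlap semantics (sketch)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunkOverlap must be non-negative and smaller than chunkSize")
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one,
    # so consecutive chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults (1000/200), a 2,500-character document yields four chunks, the last one shorter than the rest.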
Data Source Options
Input String
Use this option when the text to process is already available as a property on the current workflow data row - for example, the body of an email, a user message, or the output of a previous transformation node.
Set Input Property Name to the exact property key that contains the text.
Data Loaders
Use this option to pull documents directly from an external system. Available loader integrations include:
- Airtable
- Confluence
- CSV files
- Directory (local file system folder)
- And additional connectors depending on your deployment
When a Data Loader is selected, the node fetches content at runtime and passes it through the configured operation before producing output.
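The Directory loader's behavior - one record per file, with source metadata attached - can be sketched in a few lines of stdlib Python. The record layout and the restriction to .txt files are assumptions for the sketch, not Builder's actual loader logic.

```python
from pathlib import Path

def load_directory(folder: str) -> list[dict]:
    """Sketch of a Directory loader: one record per file, source name kept in metadata."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):  # .txt only, for the sketch
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": path.name},
        })
    return records
```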
Operation Modes
Section titled “Operation Modes”Single Value
Returns the entire document as one output record. Use this when the document is short enough to fit within an LLM’s context window, or when you need to pass the full text to a summarization node without dividing it.
Split into Chunks
Divides the document into multiple records based on chunkSize and chunkOverlap. Each chunk becomes a separate downstream data row. Use this for long documents that exceed token limits, or when building retrieval-augmented generation (RAG) pipelines where chunks are stored in a vector database.
Chunk Size Recommendations
| Use Case | Recommended chunkSize | Recommended chunkOverlap |
|---|---|---|
| Q&A systems | 500 - 800 | 150 - 200 |
| Summarization | 2000 - 3000 | 100 |
| Document analysis | 1000 - 1500 | 300 - 400 |
| RAG / vector storage | 512 - 1024 | 128 - 256 |
Align chunk size with the token limit of your target LLM. As a rough estimate, 1,000 characters of English text corresponds to about 250 tokens.
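The character-to-token estimate above (roughly 4 characters per token for English) can be expressed as a one-line helper, useful when checking whether a chunkSize fits a model's context window. The helper name is an assumption.

```python
def approx_tokens(char_count: int) -> int:
    # Rough heuristic from the table above: ~250 tokens per 1,000 English
    # characters, i.e. about 4 characters per token. Actual tokenizer counts vary.
    return char_count // 4
```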
Example Usage
Scenario 1 - Processing a customer review for sentiment analysis:
- Set Data Source to Input String.
- Set Input Property Name to review_text.
- Set Operation to Single value.
- Pass the output page_content to an LLM node configured for sentiment classification.
Scenario 2 - Loading contracts from a directory for clause extraction:
- Set Data Source to Data Loaders.
- Select the Directory loader and point it to the folder containing contract files.
- Set Operation to Split into chunks with chunkSize 1000 and chunkOverlap 200.
- Pass each chunk’s page_content and metadata to a vector storage node for indexing.
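The chunking step in Scenario 2 - splitting each loaded document while copying its source metadata onto every chunk - can be sketched as follows. The record layout and function name are assumptions, not Builder's API.

```python
def chunk_record(record: dict, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[dict]:
    """Split one loaded record into chunk records, copying source metadata to each chunk."""
    step = chunk_size - chunk_overlap
    text = record["page_content"]
    return [
        # Each chunk keeps its own copy of the metadata, so downstream indexing
        # can attribute every chunk back to the originating file.
        {"page_content": text[i:i + chunk_size], "metadata": dict(record["metadata"])}
        for i in range(0, len(text), step)
    ]
```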
Scenario 3 - Summarizing a Confluence page:
- Set Data Source to Data Loaders.
- Select the Confluence loader and configure the page ID.
- Set Operation to Single value if the page is short, or Split into chunks for long pages.
- Feed the output directly into an LLM summarization prompt.
Troubleshooting
| Issue | Likely Cause | Resolution |
|---|---|---|
| Output contains empty page_content | Input property name does not match the upstream field | Verify the exact property name from the previous node’s output. |
| Too many chunks produced | chunkSize is set very low | Increase chunkSize to a value appropriate for your document length and model token limit. |
| Data Loader returns no records | External system connection is not configured or credentials are invalid | Check the loader settings and confirm the external integration is authorized. |
| Context lost between chunks | chunkOverlap is set to 0 | Set chunkOverlap to at least 100-200 characters to preserve sentence context at chunk boundaries. |
| Downstream LLM node exceeds token limit | chunkSize is too large | Reduce chunkSize so that each chunk fits within the model’s context window. |
Best Practices
- Chunking for Q&A Systems: Use 500 to 800 characters per chunk with an overlap of 150 to 200 to ensure questions that span sentence boundaries are answered correctly.
- Chunking for Summarization: Use 2,000 to 3,000 characters per chunk with lower overlap (around 100) to give the model substantial context without repetition.
- Performance: Process documents in batches and keep chunk sizes aligned with your AI model’s token limits to avoid truncation errors.
- Output Handling: Always reference page_content and metadata in downstream nodes. The metadata field is especially useful for attributing AI responses back to the original source document.
- Single Value vs. Chunking: Use Single Value only when the full document is guaranteed to fit within the model’s context window. When in doubt, chunk the document and reassemble results if needed.
Test Cases
- Given: Input string "Customer review: Product arrived late.", Operation = Split into chunks, chunkSize = 20.
  - Expected: Output contains multiple records with page_content values of at most 20 characters, with overlap between consecutive chunks.
- Given: Directory loader pointing to 5 contract files, Operation = Split into chunks.
  - Expected: Multiple records produced with page_content containing contract text and metadata showing the originating file name for each chunk.
- Given: Input string with 500 characters, Operation = Single value.
  - Expected: One output record with the full 500 characters in page_content.
Related Nodes
- OpenAI GPT - passes page_content chunks to an LLM for analysis or generation.
- HTTP Client - can fetch raw text from external URLs before passing it to the LangChain node.
- Custom Script - useful for pre-processing or cleaning document text before chunking.