LangChain Integration - Synthreo Builder

LangChain node for Builder - integrate LangChain agents, chains, and tools into your workflow to leverage advanced orchestration, memory management, and retrieval-augmented generation.

The LangChain node connects workflows to multiple data sources and prepares documents for AI analysis. It can load content from external systems, process raw text, and split large documents into chunks optimized for AI models.

Think of it as a document preparation assistant that transforms unstructured data into AI-ready input. Before passing content to an LLM node, the LangChain node ensures that documents are broken into appropriately sized pieces with source metadata attached, which improves both the accuracy and traceability of AI-generated responses.


The node accepts input in one of two ways:

  • Text Input (String, Optional): When using Input String, the node accepts text data from a previous node.
  • Data Loader Input (Config, Optional): When using Data Loaders, the node fetches content directly from external systems configured in the node settings.

The node generates two standard properties for downstream nodes:

  • page_content - Extracted or chunked text ready for AI processing.
  • metadata - Source information such as file name, creation date, and document type.

If the operation is set to “Split into chunks”, multiple output records are produced - one per chunk. Each record carries its own page_content and metadata fields.
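The two output shapes can be sketched as plain records. This is an illustrative sketch, not Builder's actual runtime types; the metadata keys shown (`source`, `type`, `chunk`) are hypothetical and depend on the loader used:

```python
# "Single value" yields exactly one record; "Split into chunks" yields one record per chunk.
single_value_output = [
    {
        "page_content": "Full document text...",
        "metadata": {"source": "contract_001.pdf", "type": "pdf"},
    }
]

chunked_output = [
    {"page_content": "First chunk of text...",  "metadata": {"source": "contract_001.pdf", "chunk": 0}},
    {"page_content": "Second chunk of text...", "metadata": {"source": "contract_001.pdf", "chunk": 1}},
]

# Every record exposes the same two keys, so downstream nodes can treat
# both operation modes uniformly.
assert all({"page_content", "metadata"} <= set(r) for r in single_value_output + chunked_output)
```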


| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Data Source | Dropdown | Yes | Data Loaders | Select the source of data: Data Loaders (external integrations) or Input String (text from a previous workflow node). |
| Input Property Name | String | No | - | The property name from the previous node that holds the text input. Only used when Input String is selected. |
| Data Loader | Dropdown | No | - | Choose a specific integration to load documents from, such as Airtable, Confluence, or CSV. |
| Operation | Dropdown | No | Split into chunks | Processing mode: Single value returns the full document as one record; Split into chunks divides the document into multiple smaller records. |
| chunkSize | Number | No | 1000 | Maximum number of characters per chunk. Valid range: 1 to 10,000. Active only when chunking is enabled. |
| chunkOverlap | Number | No | 200 | Number of characters shared between consecutive chunks. Valid range: 0 to 1,000. Overlap helps preserve context across chunk boundaries. |
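The interaction between chunkSize and chunkOverlap can be illustrated with a naive character-window splitter. This is a sketch only - the node's actual splitter may respect sentence and separator boundaries rather than cutting at exact character offsets:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive character-window splitter illustrating chunkSize/chunkOverlap.

    Consecutive chunks start chunk_size - chunk_overlap characters apart,
    so each chunk repeats the last chunk_overlap characters of the previous one.
    """
    if not 1 <= chunk_size <= 10_000:
        raise ValueError("chunkSize must be between 1 and 10,000")
    if not 0 <= chunk_overlap <= 1_000 or chunk_overlap >= chunk_size:
        raise ValueError("chunkOverlap must be in [0, 1,000] and smaller than chunkSize")

    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With the defaults, a 2,500-character document produces chunks starting
# at offsets 0, 800, 1600, and 2400 - four records, the last one shorter.
chunks = split_into_chunks("a" * 2500, chunk_size=1000, chunk_overlap=200)
```

Note that chunkOverlap must stay smaller than chunkSize, otherwise the window never advances.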

Input String: Use this option when the text to process is already available as a property on the current workflow data row - for example, the body of an email, a user message, or the output of a previous transformation node.

Set Input Property Name to the exact property key that contains the text.

Data Loaders: Use this option to pull documents directly from an external system. Available loader integrations include:

  • Airtable
  • Confluence
  • CSV files
  • Directory (local file system folder)
  • And additional connectors depending on your deployment

When a Data Loader is selected, the node fetches content at runtime and passes it through the configured operation before producing output.


Single value: Returns the entire document as one output record. Use this when the document is short enough to fit within an LLM’s context window, or when you need to pass the full text to a summarization node without dividing it.

Split into chunks: Divides the document into multiple records based on chunkSize and chunkOverlap. Each chunk becomes a separate downstream data row. Use this for long documents that exceed token limits, or when building retrieval-augmented generation (RAG) pipelines where chunks are stored in a vector database.


| Use Case | Recommended chunkSize | Recommended chunkOverlap |
| --- | --- | --- |
| Q&A systems | 500 - 800 | 150 - 200 |
| Summarization | 2000 - 3000 | 100 |
| Document analysis | 1000 - 1500 | 300 - 400 |
| RAG / vector storage | 512 - 1024 | 128 - 256 |

Align chunk size with the token limit of your target LLM. A rough estimate is that 1,000 characters is approximately 250 tokens for English text.
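That rule of thumb (roughly 4 characters per token for English) can be turned into a quick budgeting helper. This is an estimate only - exact counts depend on the model's tokenizer:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text (~4 characters per token)."""
    return round(len(text) / chars_per_token)

# 1,000 characters ≈ 250 tokens, matching the rule of thumb above.
estimate_tokens("x" * 1000)  # → 250
```

A chunkSize of 1000 with the default overlap therefore consumes roughly 250 tokens of the model's context window per chunk.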


Scenario 1 - Processing a customer review for sentiment analysis:

  • Set Data Source to Input String.
  • Set Input Property Name to review_text.
  • Set Operation to Single value.
  • Pass the output page_content to an LLM node configured for sentiment classification.

Scenario 2 - Loading contracts from a directory for clause extraction:

  • Set Data Source to Data Loaders.
  • Select the Directory loader and point it to the folder containing contract files.
  • Set Operation to Split into chunks with chunkSize 1000 and chunkOverlap 200.
  • Pass each chunk’s page_content and metadata to a vector storage node for indexing.
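Scenario 2 can be sketched with the standard library standing in for the Directory loader. The file names and the `source` metadata key are illustrative assumptions, not Builder's actual output schema:

```python
import tempfile
from pathlib import Path

def load_directory_as_chunks(folder: Path, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Mimic the Directory loader with 'Split into chunks': one record per chunk,
    each carrying the originating file name in its metadata."""
    step = chunk_size - chunk_overlap
    records = []
    for path in sorted(folder.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for i in range(0, len(text), step):
            records.append({
                "page_content": text[i:i + chunk_size],
                "metadata": {"source": path.name},
            })
    return records

# Demo with two small "contract" files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "contract_a.txt").write_text("Clause 1. " * 200)  # 2,000 characters
    (folder / "contract_b.txt").write_text("Clause 2. " * 50)   # 500 characters
    records = load_directory_as_chunks(folder)
```

The larger file yields three overlapping chunks, the smaller one a single chunk, and every record's metadata identifies which file it came from - exactly what a vector storage node needs for attribution.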

Scenario 3 - Summarizing a Confluence page:

  • Set Data Source to Data Loaders.
  • Select the Confluence loader and configure the page ID.
  • Set Operation to Single value if the page is short, or Split into chunks for long pages.
  • Feed the output directly into an LLM summarization prompt.

| Issue | Likely Cause | Resolution |
| --- | --- | --- |
| Output contains empty page_content | Input property name does not match the upstream field | Verify the exact property name from the previous node’s output. |
| Too many chunks produced | chunkSize is set very low | Increase chunkSize to a value appropriate for your document length and model token limit. |
| Data Loader returns no records | External system connection is not configured or credentials are invalid | Check the loader settings and confirm the external integration is authorized. |
| Context lost between chunks | chunkOverlap is set to 0 | Set chunkOverlap to at least 100-200 characters to preserve sentence context at chunk boundaries. |
| Downstream LLM node exceeds token limit | chunkSize is too large | Reduce chunkSize so that each chunk fits within the model’s context window. |

  • Chunking for Q&A Systems: Use 500 to 800 characters per chunk with an overlap of 150 to 200 to ensure questions that span sentence boundaries are answered correctly.
  • Chunking for Summarization: Use 2,000 to 3,000 characters per chunk with lower overlap (around 100) to give the model substantial context without repetition.
  • Performance: Process documents in batches and keep chunk sizes aligned with your AI model’s token limits to avoid truncation errors.
  • Output Handling: Always reference page_content and metadata in downstream nodes. The metadata field is especially useful for attributing AI responses back to the original source document.
  • Single Value vs. Chunking: Use Single value only when the full document is guaranteed to fit within the model’s context window. When in doubt, chunk the document and reassemble results if needed.

  • Given: Input string "Customer review: Product arrived late.", Operation = Split into chunks, chunkSize = 20.
    • Expected: Output contains multiple records with page_content values of at most 20 characters, with overlap between consecutive chunks.
  • Given: Directory loader pointing to 5 contract files, Operation = Split into chunks.
    • Expected: Multiple records produced with page_content containing contract text and metadata showing the originating file name for each chunk.
  • Given: Input string with 500 characters, Operation = Single value.
    • Expected: One output record with the full 500 characters in page_content.
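The first expectation above can be checked with a simple character-window split. The scenario does not state a chunkOverlap, so a small value of 5 is assumed here (the default of 200 would be invalid against a chunkSize of 20):

```python
text = "Customer review: Product arrived late."  # 38 characters

chunk_size, chunk_overlap = 20, 5  # overlap assumed; must stay below chunkSize
step = chunk_size - chunk_overlap
chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Three chunks of at most 20 characters each, starting 15 characters apart,
# so consecutive chunks share their boundary 5 characters.
assert all(len(c) <= 20 for c in chunks)
assert chunks[1][:5] == chunks[0][-5:]
```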

  • OpenAI GPT - passes page_content chunks to an LLM for analysis or generation.
  • HTTP Client - can fetch raw text from external URLs before passing it to the LangChain node.
  • Custom Script - useful for pre-processing or cleaning document text before chunking.