
LangChain

🎯 Purpose

The LangChain node connects workflows to multiple data sources and prepares documents for AI analysis.
It can load content from external systems, process raw text, and split large documents into chunks optimized for AI models.

Think of it as a document preparation assistant that transforms unstructured data into AI-ready input.

📥 Inputs

  • Text Input (String, Optional): When using Input String, the node accepts text data from a previous node.
  • Data Loader Input (Config, Optional): When using Data Loaders, the node fetches content directly from external systems.

📤 Outputs

Each output record carries two standard properties for downstream nodes:

  • page_content → Extracted or chunked text ready for AI processing.
  • metadata → Source information (file name, creation date, document type, etc.).

If the operation is set to "Split into chunks", multiple records are output (one per chunk).
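To make the one-record-per-chunk behavior concrete, here is a minimal sketch (not the node's internal code; the function name and record shape are illustrative) of how a single document becomes multiple records under "Split into chunks":

```python
def split_into_chunks(text, metadata, chunk_size=1000, chunk_overlap=200):
    """Illustrative sliding-window splitter: one record per chunk,
    each carrying page_content plus a copy of the source metadata."""
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    records = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        records.append({"page_content": chunk, "metadata": dict(metadata)})
        if start + chunk_size >= len(text):
            break  # the document is fully covered
    return records

# A 2,500-character document with the defaults yields 3 overlapping records.
records = split_into_chunks("A" * 2500, {"file_name": "report.txt"})
print(len(records))
```

Downstream nodes then read `page_content` and `metadata` from each record individually.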

βš™οΈ Parameters​

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Data Source | Dropdown | ✅ Yes | Data Loaders | Select source of data: Data Loaders (external sources) or Input String (workflow text). |
| Input Property Name | String | No | — | Property name of the text input from the previous node (only used if Input String is selected). |
| Data Loader | Dropdown | No | — | Choose a specific integration (e.g., Airtable, Confluence, CSV, etc.). |
| Operation | Dropdown | No | Split into chunks | Processing mode: Single value (full document) or Split into chunks. |
| chunkSize | Number | No | 1000 | Size of each chunk (1–10,000 characters). Active only if chunking is enabled. |
| chunkOverlap | Number | No | 200 | Overlap between chunks (0–1,000 characters). Prevents loss of context. |
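For orientation, a chunking configuration might look like the following sketch. The dict form and key names are illustrative only; in the workflow editor these are set as form fields:

```python
# Hypothetical parameter set mirroring the table above.
params = {
    "dataSource": "Input String",
    "inputPropertyName": "text",      # name of the property emitted by the previous node (assumed)
    "operation": "Split into chunks",
    "chunkSize": 1000,                # 1–10,000 characters
    "chunkOverlap": 200,              # 0–1,000 characters
}

# Overlap must stay smaller than the chunk size, or chunking cannot advance.
assert 0 <= params["chunkOverlap"] < params["chunkSize"] <= 10_000
```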

📘 Best Practices

  • Chunking:
    • Q&A Systems: 500–800 size, overlap 150–200.
    • Summarization: 2000–3000 size, overlap 100.
    • Analysis: 1000–1500 size, overlap 300–400.
  • Performance: Process documents in batches. Keep chunk sizes aligned with your AI model's token limits.
  • Output Handling: Always reference page_content and metadata in downstream nodes.
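When aligning chunk sizes with token limits, it helps to estimate how many chunks a document will produce. Assuming a simple sliding-window splitter (the function below is a sketch, not the node's implementation), the count follows from the chunk size and overlap:

```python
import math

def estimated_chunk_count(n_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many chunks a sliding-window splitter produces.

    Each new chunk advances by (chunk_size - chunk_overlap) characters,
    so roughly ceil((n_chars - chunk_overlap) / step) chunks are needed.
    """
    step = chunk_size - chunk_overlap
    return max(1, math.ceil((n_chars - chunk_overlap) / step))

# A 10,000-character document with a Q&A-style preset (size 800, overlap 200):
print(estimated_chunk_count(10_000, 800, 200))  # → 17
```

Larger overlaps preserve more context between chunks but increase the total chunk count, and with it the number of downstream AI calls.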

🧪 Test Cases

  • Given: Input string "Customer review: Product arrived late.", Operation = Split into chunks, chunkSize = 20.
    → Expected: Output chunks of max 20 chars with overlap.
  • Given: DirectoryLoader pointing to 5 contracts.
    → Expected: 5 documents processed into page_content chunks with metadata showing file names.
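The first test case can be checked mechanically. Assuming a plain character-window splitter and an overlap of 5 (the test case leaves the overlap unspecified, so that value is an assumption; the node's actual strategy may be separator-aware), a short script verifies the 20-character bound:

```python
def split_chars(text: str, size: int, overlap: int) -> list[str]:
    """Simple character-window splitter, used only to illustrate the test case."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

chunks = split_chars("Customer review: Product arrived late.", size=20, overlap=5)
assert all(len(c) <= 20 for c in chunks)   # max-size bound holds
assert chunks[0][15:] == chunks[1][:5]     # consecutive chunks overlap by 5 chars
```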