OCR

Use Cases:

Legal Documents: Extract contract text for clause analysis.
Invoices: Pull text from scanned invoices for accounting automation.
Research Reports: Break down lengthy reports into chunks for AI analysis.

Purpose

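The OCR node converts files (PDFs, images, Word documents, etc.) into readable text. Files that already contain a text layer, such as digital PDFs, are read by direct text extraction; scanned or handwritten content is processed with OCR (Optical Character Recognition).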

This enables workflows to analyze, store, and process content from documents without manual transcription.
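
You can think of the split between text extraction and OCR as a per-page routing decision. The sketch below is an assumption, not the node's implementation. The `Page` type, the per-page granularity and the `run_ocr` callback are hypothetical stand-ins. `limit_ocr` mirrors the Limit OCR (`limitOcr`) toggle described in the parameters table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Page:
    # Hypothetical page model: embedded_text is the text layer a digital file
    # already carries; it is empty for scanned or handwritten pages.
    embedded_text: str
    image: Optional[bytes] = None


def extract_page_text(
    page: Page, limit_ocr: bool, run_ocr: Callable[[bytes], str]
) -> str:
    """Prefer the embedded text layer; fall back to OCR only when allowed."""
    if page.embedded_text.strip():
        # Digital page: direct text extraction, no OCR cost.
        return page.embedded_text
    if limit_ocr or page.image is None:
        # Limit OCR is on, or there is no image to recognize: skip the page.
        return ""
    # Scanned or handwritten page: hand the image to the OCR engine.
    return run_ocr(page.image)


def extract_document(
    pages: Iterable[Page], limit_ocr: bool, run_ocr: Callable[[bytes], str]
) -> str:
    """Join the text of every page that produced output."""
    texts = (extract_page_text(p, limit_ocr, run_ocr) for p in pages)
    return "\n".join(t for t in texts if t)
```

Passing `limit_ocr=True` keeps the text of digital pages and silently drops scanned ones. The `/invoices/` example at the end of this page describes the same behavior.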

Output Format (example):

    {
      "text_result": "Extracted document content here..."
    }
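
Where `text_result` ends up depends on the Output Mode (`outTransformId`) and Result Property Name (`outColumnName`) settings. The following is a minimal sketch of that shaping step. It assumes the two modes behave as their labels suggest. The constants `APPEND` and `RESULT_ONLY` and their string values are hypothetical, not the node's actual dropdown identifiers.

```python
from typing import Any, Dict

# Hypothetical identifiers for the two Output Mode choices.
APPEND = "original_plus_result"
RESULT_ONLY = "result_only"


def shape_output(
    item: Dict[str, Any],
    text: str,
    out_transform_id: str = APPEND,
    out_column_name: str = "text_result",
) -> Dict[str, Any]:
    """Build the output record from the incoming item and the extracted text."""
    if out_transform_id == APPEND:
        # Keep every original field and add the extracted text under the
        # configured property name (a new dict, so the input is untouched).
        return {**item, out_column_name: text}
    if out_transform_id == RESULT_ONLY:
        # Drop the original fields and return only the extracted text.
        return {out_column_name: text}
    raise ValueError(f"unknown output mode: {out_transform_id!r}")
```

In append mode, this sketch overwrites any existing property with the same name. This page does not document whether the node does the same.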

Parameters:

Name	Type	Required	Default	Description
File Path (`getterTemplate`)	Smart text	Optional	Empty	Path or pattern that locates a single file. Supports dynamic values.
Folder Path (`filesFolderPath`)	Smart text	Optional	Empty	Base folder path for processing multiple files.
Remove File After Processing (`removeFileAfterProcessing`)	Toggle	Optional	Off	Deletes the original file after extraction. Use with caution.
Limit OCR (`limitOcr`)	Toggle	Optional	Off	Restricts OCR usage to reduce costs. When on, only text already embedded in the file is extracted and OCR is skipped. When off, OCR runs on scanned or handwritten documents.
Produce Chunks From PDF (`produceChunksFromPdf`)	Toggle	Optional	Off	Splits large PDFs into smaller text chunks.
Output Mode (`outTransformId`)	Dropdown	Required	Original + appended result	Choose between appending the extracted text to the original item or returning only the extracted text.
Result Property Name (`outColumnName`)	Text	Required	`text_result`	Name of the property that holds the extracted text.
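
The table does not say how File Path and Folder Path interact. The sketch below is purely an assumption. It treats Folder Path as the base directory and File Path as a relative glob pattern that selects one file. `resolve_inputs` and `process_files` are hypothetical helpers. The deletion step mirrors Remove File After Processing (`removeFileAfterProcessing`).

```python
from pathlib import Path
from typing import Callable, Dict, List


def resolve_inputs(getter_template: str, files_folder_path: str) -> List[Path]:
    """Hypothetical resolution of the File Path and Folder Path settings."""
    base = Path(files_folder_path) if files_folder_path else Path(".")
    if getter_template:
        # Assumption: File Path is a glob pattern relative to the base folder.
        # Keep only the first match, since the setting locates a single file.
        matches = sorted(p for p in base.glob(getter_template) if p.is_file())
        return matches[:1]
    if files_folder_path:
        # No file pattern: take every file directly inside the folder.
        return sorted(p for p in base.iterdir() if p.is_file())
    return []


def process_files(
    paths: List[Path], extract: Callable[[Path], str], remove_after: bool
) -> Dict[str, str]:
    """Extract text from each file, optionally deleting it afterwards."""
    results: Dict[str, str] = {}
    for path in paths:
        results[path.name] = extract(path)
        if remove_after:
            # Remove File After Processing: the unlink is reached only if
            # extract() returned without raising.
            path.unlink()
    return results
```

For a destructive option, deleting only after `extract` returns is the conservative choice. A failed extraction leaves the source file in place.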

Examples:

Given: File = contract.pdf, Limit OCR off → Expected: { "text_result": "Contract terms and conditions..." }
Given: Folder = /invoices/, Limit OCR on → Expected: Text is extracted only from digital PDFs; scanned invoices are not processed with OCR.
Given: PDF longer than 50 pages, Produce Chunks From PDF on → Expected: Multiple text chunks, e.g. { "chunk_1": "...", "chunk_2": "..." }.
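
This page does not specify the chunking rule beyond the `chunk_1`, `chunk_2` keys. The following is a minimal sketch that assumes chunks are built from groups of consecutive pages. `chunk_pages` and its `pages_per_chunk` parameter are hypothetical. The 50-page figure in the example above is an input size, not a documented chunk size.

```python
from typing import Dict, List


def chunk_pages(page_texts: List[str], pages_per_chunk: int) -> Dict[str, str]:
    """Group consecutive page texts into numbered chunks (chunk_1, chunk_2, ...)."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    chunks: Dict[str, str] = {}
    for start in range(0, len(page_texts), pages_per_chunk):
        group = page_texts[start:start + pages_per_chunk]
        # Number chunks from 1 to match the chunk_1, chunk_2 keys.
        chunks[f"chunk_{start // pages_per_chunk + 1}"] = "\n".join(group)
    return chunks
```

Grouping by whole pages keeps page boundaries intact. This sketch's chunk sizes are uneven when pages vary in length. A token- or character-based split would be the alternative if downstream AI steps have strict input limits.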