FileToText

🎯 Purpose

The FileToText node converts documents (PDFs, images, office files) into machine-readable text using OCR (Optical Character Recognition) and parsing techniques. This allows workflows to analyze, search, and process unstructured documents at scale.

📥 Inputs

File Path / Folder Path (String, Optional): Path to the file(s) to be processed. Can be static or dynamically supplied from previous nodes.

📤 Outputs

Extracted Text: The text content extracted from the file(s).
Structured Output: Depending on the Option parameter, may also include the original input data, metadata, and original file information.

⚙️ Parameters

Name	Type	Required	Default	Description
File Path	String	No	(empty)	Specific file path to process. Supports dynamic input.
Folder Path	String	No	(empty)	Base folder path for batch file processing.
Remove File After Processing	Boolean (Toggle)	No	Off	Deletes the original file after extraction.
Limit OCR	Boolean (Toggle)	No	Off	Limits OCR depth for faster but less detailed processing.
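Option	Dropdown	No	Original with appended result column	Defines the output format: Original with appended result column (keeps the original data and adds the extracted text) or Return result column only (returns only the extracted text).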
Result Property Name	String	No	text_result	Name of the property that stores extracted text.
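To make the two Option values concrete, here is a minimal Python sketch that models the output for a single file. It assumes each input item is a dictionary of properties and that "Return result column only" yields a dictionary holding just the result property; the function name shape_output is hypothetical and is not the node's internal API.

```python
from typing import Any, Dict

# Option labels as they appear in the node's dropdown.
APPEND = "Original with appended result column"
RESULT_ONLY = "Return result column only"


def shape_output(
    original: Dict[str, Any],
    extracted_text: str,
    option: str = APPEND,
    result_property: str = "text_result",
) -> Dict[str, Any]:
    """Hypothetical model of how Option and Result Property Name shape one output item."""
    if option == APPEND:
        # Keep every original property and add the extracted text alongside it.
        return {**original, result_property: extracted_text}
    if option == RESULT_ONLY:
        # Assumption: only the result property is kept.
        return {result_property: extracted_text}
    raise ValueError(f"Unknown option: {option!r}")


item = {"file_path": "/uploads/invoices/inv-001.pdf"}
print(shape_output(item, "Invoice #001"))
# → {'file_path': '/uploads/invoices/inv-001.pdf', 'text_result': 'Invoice #001'}
print(shape_output(item, "Invoice #001", RESULT_ONLY, "invoice_text"))
# → {'invoice_text': 'Invoice #001'}
```

Building a new dictionary instead of mutating the input keeps the original item unchanged, which matters if the same item feeds other nodes.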

📂 Supported File Types

The node supports a wide range of document and image formats:

Documents: PDF, DOCX, TXT, RTF
Spreadsheets: XLSX, CSV (basic parsing into text)
Images: PNG, JPG/JPEG, TIFF, BMP
Scanned Files: Multi-page PDFs and image-based PDFs (via OCR)

⚠️ Note: OCR quality may vary depending on scan resolution, file quality, and language.
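The list above implies two extraction paths: formats with a text layer can be parsed directly, while images and scanned PDFs need OCR. The Python sketch below shows one way such routing could work by file extension. The function name choose_strategy, the strategy labels, and the PDF rule (parse when a text layer exists, otherwise OCR) are assumptions for illustration; the node's actual detection logic is not documented here.

```python
from pathlib import Path

# Extension groups taken from the supported-file-types list.
PARSEABLE = {".docx", ".txt", ".rtf", ".xlsx", ".csv"}
# ".tif" is included as the common short extension for TIFF.
IMAGES = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp"}


def choose_strategy(path: str, has_text_layer: bool = True) -> str:
    """Hypothetical routing: return 'parse', 'ocr', or 'unsupported'."""
    ext = Path(path).suffix.lower()
    if ext in PARSEABLE:
        return "parse"
    if ext in IMAGES:
        return "ocr"
    if ext == ".pdf":
        # Assumption: text-based PDFs are parsed; image-based (scanned) PDFs go to OCR.
        return "parse" if has_text_layer else "ocr"
    return "unsupported"


for name in ["report.PDF", "scan.pdf", "receipt.jpg", "data.csv", "notes.odt"]:
    print(name, choose_strategy(name, has_text_layer=(name != "scan.pdf")))
```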

💡 Example Usage

Invoice Automation

Setup:
- Folder Path = /uploads/invoices/
- Remove File After Processing = Off
- Limit OCR = On
- Result Property Name = invoice_text
Result: The text of each invoice is stored in invoice_text for automated data entry.
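The invoice setup amounts to a batch loop over a folder. The Python sketch below models that loop with the standard library: it visits each file in the folder, stores the extracted text under the configured result property, and deletes the source file only when Remove File After Processing is on. process_folder and extract_text are hypothetical names; extract_text stands in for the node's parsing/OCR step, and whether the node recurses into subfolders is not documented, so this sketch does not.

```python
from pathlib import Path
from typing import Callable, Dict, List


def process_folder(
    folder: str,
    extract_text: Callable[[Path], str],
    result_property: str = "text_result",
    remove_after: bool = False,
) -> List[Dict[str, str]]:
    """Hypothetical model of batch processing over a Folder Path."""
    results = []
    # Sorted for a deterministic order; only regular files, no recursion.
    for path in sorted(p for p in Path(folder).iterdir() if p.is_file()):
        text = extract_text(path)
        results.append({"file_path": str(path), result_property: text})
        if remove_after:
            # Mirrors "Remove File After Processing": delete only after extraction succeeded.
            path.unlink()
    return results


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            Path(tmp, f"invoice-{i}.txt").write_text(f"Invoice {i}")
        # Stand-in extractor: read plain text instead of running OCR.
        rows = process_folder(tmp, lambda p: p.read_text(), "invoice_text")
        for row in rows:
            print(row["invoice_text"])
```

Deleting each file right after its own extraction means a failure partway through leaves the unprocessed files in place for a retry.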

Customer Support Documents

Setup:
- Dynamic File Path from email attachments
- produceChunksFromPdf = Off
- Option = Return result column only
Result: Text from each attachment is returned as plain text for categorization and AI triage.

📘 Best Practices

Organize files in clear folder structures for easier batch processing.
Enable Limit OCR for high-volume, simple documents.
Use chunking for large or complex PDFs that need contextual sectioning.
Always test with sample files before scaling up.
Be cautious with Remove File After Processing in compliance-heavy industries.

🧪 Test Cases

Given: invoice.jpg with Limit OCR = On →
Expected: Text is extracted faster, but with less detail than full OCR.
Given: Folder with 3 PDFs, Option = Return result column only →
Expected: An array of three plain-text results, one per file.
