Feature Extraction Node

The Feature Extraction node automatically identifies and extracts specific types of information (features) from text data, such as names, dates, locations, organizations, and monetary amounts. This powerful AI-driven node helps businesses structure unstructured text data for analysis, reporting, and automation.

What This Node Does

The Feature Extraction node analyzes text content and identifies meaningful entities like person names, company names, dates, locations, and other important information. It uses advanced natural language processing to understand context and extract relevant data points that would otherwise require manual review.

Business Value: Automatically processes large volumes of text documents, emails, or customer feedback to extract key information, saving hours of manual data entry and ensuring consistent, accurate results.

Configuration Parameters

Source Property Section

Column Name

Field Name: sourcePropName
Type: Smart text field with data suggestions
Default Value: Empty
Simple Description: The name of the column containing the text you want to analyze for feature extraction
When to Change This: Select the specific column from your data that contains the text content you want to process
Business Impact: Choosing the correct source column ensures the node analyzes the right text content and produces accurate results

Options Section

Language

Field Name: languageModel
Type: Dropdown menu with options:
- English: Optimized for English text analysis with highest accuracy
- German: Specialized for German language text processing
- French: Configured for French language content analysis
- Spanish: Tailored for Spanish text feature extraction
- Undetermined: Attempts to auto-detect language (use when language varies)
Default Value: English
Simple Description: The primary language of your text content for optimal feature extraction accuracy
When to Change This: Match this to the language of your source text data
Business Impact: Correct language selection improves extraction accuracy by up to 40% and reduces false positives

Select Features

Field Name: selectedFeatures
Type: Multi-select tag box with options:
- Cardinal: Numbers and quantities (e.g., "five", "100", "dozen")
- Date: Dates and time references (e.g., "January 15", "next week", "2024")
- Event: Named events and occasions (e.g., "Christmas", "Super Bowl", "conference")
- Fac: Facilities and buildings (e.g., "airport", "stadium", "hospital")
- Gpe: Countries, cities, states (e.g., "United States", "California", "London")
- Gpe_from: Origin locations in travel or shipping contexts
- Gpe_to: Destination locations in travel or shipping contexts
- Language: Language names (e.g., "English", "Spanish", "Mandarin")
- Law: Legal documents, laws, acts (e.g., "GDPR", "Constitution", "Patent Act")
- Loc: Geographic locations and landmarks (e.g., "Pacific Ocean", "Mount Everest")
- Money: Monetary amounts and currencies (e.g., "$100", "fifty dollars", "€25")
- Norp: Nationalities, religious groups, political groups (e.g., "American", "Buddhist", "Republican")
- Ordinal: Ordinal numbers (e.g., "first", "second", "21st")
- Org: Organizations and companies (e.g., "Microsoft", "UN", "Harvard University")
- Percent: Percentage values (e.g., "25%", "fifty percent")
- Person: People's names (e.g., "John Smith", "Dr. Johnson")
- Product: Products, brands, and services (e.g., "iPhone", "Coca-Cola")
- Quantity: Measurements and quantities (e.g., "5 miles", "two hours", "10 kg")
- Time: Time expressions (e.g., "3 PM", "morning", "midnight")
- WORK_OF_ART: Creative works (e.g., "Mona Lisa", "Star Wars", "Beethoven's 9th")
Default Value: None selected
Simple Description: Choose which types of information you want to extract from your text
When to Change This: Select only the features relevant to your business needs to avoid information overload
Business Impact: Focused feature selection improves processing speed and reduces noise in your extracted data
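
To make the effect of feature selection concrete, here is a minimal Python sketch. It is not the node's implementation: the list of (text, label) pairs and the keep_selected helper are hypothetical, and the node's real output format is not documented on this page. It only shows how restricting the selected labels narrows what ends up in the results.

```python
from collections import defaultdict

# Hypothetical raw entities for one text: (matched text, feature label) pairs.
# This shape is an assumption made for illustration only.
raw_entities = [
    ("Microsoft", "Org"),
    ("$100", "Money"),
    ("John Smith", "Person"),
    ("first", "Ordinal"),
    ("25%", "Percent"),
]


def keep_selected(entities, selected_features):
    """Group entities by label, keeping only the labels that were selected."""
    wanted = set(selected_features)
    grouped = defaultdict(list)
    for text, label in entities:
        if label in wanted:
            grouped[label].append(text)
    return dict(grouped)


# Mirrors choosing three tags in the "Select Features" box.
selected = ["Person", "Org", "Money"]
result = keep_selected(raw_entities, selected)
print(result)
```

With only Person, Org, and Money selected, the Ordinal and Percent matches are dropped, which is the "reduced noise" described above.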

Output Section

Option

Field Name: outTransformId
Type: Dropdown menu with options:
- Original with appended result column: Keeps all original data and adds extracted features in a new column
- Return result column only: Returns only the extracted features, removing original text data
Default Value: Empty (must be selected)
Simple Description: How you want the extracted features to be formatted in your output data
When to Change This: Choose "Original with appended result column" to keep source data for reference, or "Return result column only" for clean feature-focused output
Business Impact: Proper output formatting ensures your downstream processes receive data in the expected structure

Column Name

Field Name: outColumnName
Type: Text field
Default Value: Empty
Simple Description: The name for the new column that will contain your extracted features
When to Change This: Use descriptive names like "extracted_entities" or "customer_mentions" for easy identification
Business Impact: Clear column naming improves data organization and makes results easier to understand for your team
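
The difference between the two output options can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's actual behavior: rows are modeled as plain dictionaries, the shape_output helper is hypothetical, and the structure of the extracted value in the result column is invented for the example.

```python
APPEND = "Original with appended result column"
RESULT_ONLY = "Return result column only"


def shape_output(rows, extracted, option, out_column):
    """Combine source rows with per-row extraction results.

    rows: list of dicts, one per input row.
    extracted: list of results aligned with rows by position.
    """
    if len(rows) != len(extracted):
        raise ValueError("rows and extracted must have the same length")
    if option == APPEND:
        # Build new dicts so the caller's rows are not modified.
        return [{**row, out_column: found} for row, found in zip(rows, extracted)]
    if option == RESULT_ONLY:
        return [{out_column: found} for found in extracted]
    raise ValueError(f"unknown output option: {option!r}")


rows = [{"id": 1, "review_text": "John Smith paid $100 to Microsoft."}]
# Hypothetical extraction result for the single row above.
extracted = [{"Person": ["John Smith"], "Money": ["$100"], "Org": ["Microsoft"]}]

appended = shape_output(rows, extracted, APPEND, "extracted_entities")
result_only = shape_output(rows, extracted, RESULT_ONLY, "extracted_entities")
print(list(appended[0]))     # column names when original data is kept
print(list(result_only[0]))  # column names when only the result is returned
```

In the first case downstream nodes still see id and review_text next to the new column; in the second they see only the column named in "Column Name".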

Real-World Use Cases

Customer Feedback Analysis

Business Situation: A retail company receives thousands of customer reviews and wants to automatically identify the products, competitors, prices, and percentage ratings mentioned in them.

What You'll Configure:

Set "Column Name" to "review_text" (your review content column)
Choose "English" from the Language dropdown
Select features: Person, Org, Product, Money, Percent
Choose "Original with appended result column" for output option
Name the output column "extracted_entities"

What Happens: The node processes each review and identifies customer names, competitor mentions, product references, prices, and percentage ratings, creating a structured dataset for analysis.

Business Value: Reduces manual review analysis time by 85% and provides consistent entity identification across all customer feedback.
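
For readers who keep node settings in version control or review them before a run, the customer feedback setup can be written down as a small Python structure. This is a sketch only: the field names come from the Configuration Parameters section, but storing the dropdown labels as values, treating every field as required, and the validate_config helper are all assumptions, not a documented export format.

```python
ALLOWED_LANGUAGES = {"English", "German", "French", "Spanish", "Undetermined"}
ALLOWED_FEATURES = {
    "Cardinal", "Date", "Event", "Fac", "Gpe", "Gpe_from", "Gpe_to",
    "Language", "Law", "Loc", "Money", "Norp", "Ordinal", "Org",
    "Percent", "Person", "Product", "Quantity", "Time", "WORK_OF_ART",
}
ALLOWED_OUTPUT_OPTIONS = {
    "Original with appended result column",
    "Return result column only",
}


def validate_config(config):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    if not config.get("sourcePropName"):
        problems.append("sourcePropName is empty")
    if config.get("languageModel") not in ALLOWED_LANGUAGES:
        problems.append("languageModel is not a listed language")
    features = config.get("selectedFeatures") or []
    if not features:
        problems.append("selectedFeatures is empty")
    unknown = sorted(set(features) - ALLOWED_FEATURES)
    if unknown:
        problems.append(f"unknown features: {unknown}")
    if config.get("outTransformId") not in ALLOWED_OUTPUT_OPTIONS:
        problems.append("outTransformId must be selected")
    if not config.get("outColumnName"):
        problems.append("outColumnName is empty")
    return problems


# The customer feedback use case, expressed with the documented field names.
customer_feedback = {
    "sourcePropName": "review_text",
    "languageModel": "English",
    "selectedFeatures": ["Person", "Org", "Product", "Money", "Percent"],
    "outTransformId": "Original with appended result column",
    "outColumnName": "extracted_entities",
}

print(validate_config(customer_feedback))
```

A check like this catches the two fields that default to empty (the output option and the feature list) before the workflow is run.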

Legal Document Processing

Business Situation: A law firm needs to extract key information from contracts including parties, dates, monetary amounts, and legal references.

What You'll Configure:

Set "Column Name" to "contract_text"
Select "English" as the language
Choose features: Person, Org, Date, Money, Law, Gpe, Loc
Select "Original with appended result column" to maintain document integrity
Name output column "contract_entities"

What Happens: Each contract is analyzed to identify all parties involved, important dates, financial terms, legal citations, and jurisdictions mentioned.

Business Value: Accelerates contract review process by 60% and ensures no critical information is overlooked during legal analysis.

News Article Monitoring

Business Situation: A PR agency wants to monitor news articles for client mentions, competitor references, and industry events.

What You'll Configure:

Set "Column Name" to "article_content"
Choose "English" for language
Select features: Person, Org, Event, Date, Loc, Money
Use "Return result column only" for focused monitoring data
Name output column "media_mentions"

What Happens: News articles are processed to extract all company names, executive mentions, industry events, dates, locations, and financial figures.

Business Value: Provides comprehensive media monitoring with 95% accuracy, enabling faster response to industry developments and client coverage.

Step-by-Step Configuration

Adding the Node

1. Drag the Feature Extraction node from the AI Processing section in the left panel
2. Drop it onto your workflow canvas
3. Connect it to your data source node using the arrow connector

Configuring Source Data

1. Click on the Feature Extraction node to open the configuration panel
2. In the "Source Property" section, click the "Column Name" field
3. Select or type the name of the column containing your text data; the smart text box suggests available columns from your connected data

Setting Language and Features

1. Expand the "Options" section in the configuration panel
2. Click the "Language" dropdown and select the primary language of your text
3. In the "Select Features" field, click to open the multi-select box
4. Check the boxes for each type of information you want to extract
5. Click "OK" to confirm your feature selections

Configuring Output Format

1. Expand the "Output" section
2. Click the "Option" dropdown and choose your preferred output format:
- Select "Original with appended result column" to keep source data
- Select "Return result column only" for extracted features only
3. In the "Column Name" field, enter a descriptive name for your results column
4. Click "Save Configuration" to apply your settings

Testing Your Configuration

1. Click the "Test Configuration" button in the node panel
2. Enter sample text in the test input field
3. Review the extracted features in the preview panel
4. Adjust your feature selections if needed
5. Save your final configuration

Industry Applications

Healthcare Organizations

Common Challenge: Medical records contain unstructured notes that need analysis for patient care coordination and billing accuracy.

How This Node Helps: Automatically extracts patient names, medications, dates, quantities such as dosages, and healthcare organizations from clinical notes and discharge summaries.

Configuration Recommendations:

Use "English" language setting for most medical records
Select features: Person, Date, Org, Quantity, Product (for medications)
Choose "Original with appended result column" to maintain medical record integrity
Name output column "clinical_entities"

Results: Healthcare providers reduce documentation review time by 70% and improve billing accuracy through consistent entity extraction.

Financial Services

Common Challenge: Processing loan applications, insurance claims, and financial reports requires extracting specific financial and personal information from documents.

How This Node Helps: Identifies applicant names, financial amounts, dates, organizations, and locations from financial documents for automated processing.

Configuration Recommendations:

Select "English" for most financial documents
Choose features: Person, Money, Date, Org, Percent, Loc
Use "Original with appended result column" for audit trail requirements
Name output column "financial_entities"

Results: Financial institutions process applications 50% faster while maintaining compliance and reducing manual data entry errors.

E-commerce Platforms

Common Challenge: Product reviews, customer service tickets, and marketplace listings contain valuable information that needs structured analysis.

How This Node Helps: Extracts product names, brand mentions, prices, customer names, and quality indicators from unstructured e-commerce text data.

Configuration Recommendations:

Use "English" or "Undetermined" for international marketplaces
Select features: Product, Person, Money, Org, Percent, Ordinal
Choose "Return result column only" for clean analytics data
Name output column "commerce_entities"

Results: E-commerce businesses gain 40% better insights into customer sentiment and product performance through automated text analysis.

Best Practices

Feature Selection Strategy

Start Small: Begin with 3-5 essential features and expand based on results
Business Relevance: Only select features that directly support your business objectives
Data Quality: More features don't always mean better results - focus on accuracy over quantity

Language Configuration

Consistency: Ensure your language setting matches your data's primary language
Mixed Content: Use "Undetermined" only when your text contains multiple languages
Regional Variations: English setting works well for US, UK, and other English variants
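
If your dataset already carries a language tag per row, the rule above can be applied mechanically. The Python sketch below is illustrative only: the per-row tags and the choose_language_setting helper are hypothetical, and falling back to "Undetermined" for a single language that is not in the dropdown is an assumption rather than documented guidance.

```python
# Languages offered in the "Language" dropdown, apart from "Undetermined".
SUPPORTED = {"English", "German", "French", "Spanish"}


def choose_language_setting(row_languages):
    """Pick a Language dropdown value from per-row language tags."""
    # Ignore missing or empty tags when counting distinct languages.
    distinct = {lang for lang in row_languages if lang}
    if len(distinct) == 1:
        (only,) = distinct
        if only in SUPPORTED:
            return only
    # Mixed, unknown, or unsupported content falls back to auto-detection.
    return "Undetermined"


print(choose_language_setting(["English", "English", "English"]))
print(choose_language_setting(["English", "German"]))
```

A dataset tagged entirely as English keeps the English setting, while one that mixes English and German rows falls back to "Undetermined".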

Output Optimization

Downstream Compatibility: Choose output format based on how you'll use the extracted data
Column Naming: Use consistent, descriptive names that your team will understand
Data Retention: Keep original data when you need to verify extraction accuracy
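
When the original text is kept, extraction accuracy can be checked against a small hand-labeled sample. The following Python sketch shows one common way to do that with precision and recall. The (text, label) pair format and the sample values are made up for illustration and do not come from the node.

```python
def score_extraction(expected, extracted):
    """Compare two collections of (text, label) pairs.

    Returns (precision, recall) as floats between 0 and 1.
    """
    expected_set = set(expected)
    extracted_set = set(extracted)
    correct = len(expected_set & extracted_set)
    # Precision: share of extracted entities that were right.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    # Recall: share of expected entities that were found.
    recall = correct / len(expected_set) if expected_set else 0.0
    return precision, recall


# Hypothetical hand-labeled entities for one review.
hand_labeled = [("John Smith", "Person"), ("Microsoft", "Org"), ("$100", "Money")]
# Hypothetical node output for the same review.
node_output = [
    ("John Smith", "Person"),
    ("Microsoft", "Org"),
    ("Smith", "Org"),
    ("100", "Cardinal"),
]

precision, recall = score_extraction(hand_labeled, node_output)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision suggests too many features are selected or the language setting is wrong; low recall suggests a needed feature is missing from the selection.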

Performance Considerations

Batch Processing: Process large datasets in smaller batches for optimal performance
Feature Limits: Selecting fewer features improves processing speed
Text Length: Very long documents may require preprocessing to focus on relevant sections
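
If you prepare data outside the workflow, batching and trimming can be done before the text reaches the node. The Python sketch below is one possible approach, not a platform feature: the batches and trim_text helpers and the sizes used are hypothetical, and cutting at a character limit is a crude stand-in for selecting the relevant sections of a document.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def trim_text(text, max_chars):
    """Cut text at the last space before max_chars if it is too long."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)
    # Fall back to a hard cut when there is no space to break on.
    return text[:cut] if cut > 0 else text[:max_chars]


rows = [{"id": i, "text": f"document {i}"} for i in range(7)]
sizes = [len(batch) for batch in batches(rows, 3)]
print(sizes)
print(trim_text("alpha beta gamma", 12))
```

Seven rows with a batch size of three yield batches of three, three, and one, and the trimmed text ends at a word boundary rather than mid-word.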

The Feature Extraction node transforms unstructured text into valuable, actionable data that drives better business decisions and automates manual processes across industries.
