
Feature Extraction Node - Synthreo Builder

Feature Extraction node for Builder - identify and extract named entities such as people, organizations, dates, locations, and monetary amounts from raw text, producing structured data for downstream analysis and automation.

The Feature Extraction node automatically identifies and extracts specific types of information (features) from text data, such as names, dates, locations, organizations, and monetary amounts. This AI-driven node helps businesses structure unstructured text data for analysis, reporting, and automation.

The Feature Extraction node analyzes text content and identifies meaningful entities like person names, company names, dates, locations, and other important information. It uses advanced natural language processing to understand context and extract relevant data points that would otherwise require manual review.

Business Value: Automatically processes large volumes of text documents, emails, or customer feedback to extract key information, saving hours of manual data entry and ensuring consistent, accurate results.

Column Name

  • Field Name: sourcePropName
  • Type: Smart text field with data suggestions
  • Default Value: Empty
  • Simple Description: The name of the column containing the text you want to analyze for feature extraction
  • When to Change This: Select the specific column from your data that contains the text content you want to process
  • Business Impact: Choosing the correct source column ensures the node analyzes the right text content and produces accurate results

Language

  • Field Name: languageModel
  • Type: Dropdown menu with options:
    • English - Optimized for English text analysis with highest accuracy
    • German - Specialized for German language text processing
    • French - Configured for French language content analysis
    • Spanish - Tailored for Spanish text feature extraction
    • Undetermined - Attempts to auto-detect language (use when language varies)
  • Default Value: English
  • Simple Description: The primary language of your text content for optimal feature extraction accuracy
  • When to Change This: Match this to the language of your source text data
  • Business Impact: Correct language selection improves extraction accuracy by up to 40% and reduces false positives

Select Features

  • Field Name: selectedFeatures
  • Type: Multi-select tag box with options:
    • Cardinal - Numbers and quantities (e.g., “five”, “100”, “dozen”)
    • Date - Dates and time references (e.g., “January 15”, “next week”, “2024”)
    • Event - Named events and occasions (e.g., “Christmas”, “Super Bowl”, “conference”)
    • Fac - Facilities and buildings (e.g., “airport”, “stadium”, “hospital”)
    • Gpe - Countries, cities, states (e.g., “United States”, “California”, “London”)
    • Gpe_from - Origin locations in travel or shipping contexts
    • Gpe_to - Destination locations in travel or shipping contexts
    • Language - Language names (e.g., “English”, “Spanish”, “Mandarin”)
    • Law - Legal documents, laws, acts (e.g., “GDPR”, “Constitution”, “Patent Act”)
    • Loc - Geographic locations and landmarks (e.g., “Pacific Ocean”, “Mount Everest”)
    • Money - Monetary amounts and currencies (e.g., “$100”, “fifty dollars”, “€25”)
    • Norp - Nationalities, religious groups, political groups (e.g., “American”, “Buddhist”, “Republican”)
    • Ordinal - Ordinal numbers (e.g., “first”, “second”, “21st”)
    • Org - Organizations and companies (e.g., “Microsoft”, “UN”, “Harvard University”)
    • Percent - Percentage values (e.g., “25%”, “fifty percent”)
    • Person - People’s names (e.g., “John Smith”, “Dr. Johnson”)
    • Product - Products, brands, and services (e.g., “iPhone”, “Coca-Cola”)
    • Quantity - Measurements and quantities (e.g., “5 miles”, “two hours”, “10 kg”)
    • Time - Time expressions (e.g., “3 PM”, “morning”, “midnight”)
    • WORK_OF_ART - Creative works (e.g., “Mona Lisa”, “Star Wars”, “Beethoven’s 9th”)
  • Default Value: None selected
  • Simple Description: Choose which types of information you want to extract from your text
  • When to Change This: Select only the features relevant to your business needs to avoid information overload
  • Business Impact: Focused feature selection improves processing speed and reduces noise in your extracted data

Option

  • Field Name: outTransformId
  • Type: Dropdown menu with options:
    • Original with appended result column - Keeps all original data and adds extracted features in a new column
    • Return result column only - Returns only the extracted features, removing original text data
  • Default Value: Empty (must be selected)
  • Simple Description: How you want the extracted features to be formatted in your output data
  • When to Change This: Choose “Original with appended” to keep source data for reference, or “Result only” for clean feature-focused output
  • Business Impact: Proper output formatting ensures your downstream processes receive data in the expected structure

Column Name

  • Field Name: outColumnName
  • Type: Text field
  • Default Value: Empty
  • Simple Description: The name for the new column that will contain your extracted features
  • When to Change This: Use descriptive names like “extracted_entities” or “customer_mentions” for easy identification
  • Business Impact: Clear column naming improves data organization and makes results easier to understand for your team

When features are extracted, the result column contains a structured object where each selected feature type becomes a key, and the value is an array of all matched text strings found in the source. For example, extracting Person, Org, and Date from a sentence might produce output like the following.

{
  "Person": ["John Smith", "Dr. Johnson"],
  "Org": ["Acme Corp", "Harvard University"],
  "Date": ["January 15", "next week"]
}

When no entities of a given type are found in the source text, the corresponding key will contain an empty array. Downstream nodes can reference these values using standard property path expressions such as extracted_entities.Person[0] to access the first person name found.
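As a concrete illustration, the structure above can be navigated in plain code (for example, inside a Custom Script node). This is a sketch assuming the result column deserializes to a plain object as shown; the variable names are illustrative.

```javascript
// Hypothetical sample of the result-column object described above.
const extracted_entities = {
  Person: ["John Smith", "Dr. Johnson"],
  Org: ["Acme Corp", "Harvard University"],
  Date: []  // no Date entities were found in this record
};

// Equivalent of the property path extracted_entities.Person[0]
const firstPerson = extracted_entities.Person[0]; // "John Smith"

// Guard against empty arrays before reading a value.
const firstDate = extracted_entities.Date.length > 0
  ? extracted_entities.Date[0]
  : null; // null here, since no dates were found
```

Because absent entity types still appear as empty arrays, the length check above is the safe pattern before indexing into any entity list.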

| Feature Label | What It Extracts | Example Values |
| --- | --- | --- |
| Cardinal | Raw numbers and counted quantities | “five”, “100”, “a dozen” |
| Date | Calendar dates and relative date expressions | “January 15”, “last Tuesday”, “2025” |
| Event | Named events and scheduled occasions | “World Cup”, “annual summit”, “Black Friday” |
| Fac | Named facilities and built structures | “JFK Airport”, “Madison Square Garden” |
| Gpe | Geo-political entities (countries, cities, states) | “France”, “Chicago”, “Ontario” |
| Gpe_from | Origin locations in directional context | “from London”, “departing Tokyo” |
| Gpe_to | Destination locations in directional context | “to Berlin”, “arriving in Sydney” |
| Language | Human language names | “Mandarin”, “Portuguese”, “Arabic” |
| Law | Named laws, acts, regulations | “GDPR”, “Sarbanes-Oxley”, “HIPAA” |
| Loc | Non-GPE geographic features and landmarks | “the Amazon River”, “Mount Fuji” |
| Money | Monetary amounts with or without currency | “$250”, “fifty euros”, “two million dollars” |
| Norp | Nationalities, ethnic groups, political affiliations | “Canadian”, “Buddhist”, “Democrat” |
| Ordinal | Position or rank expressed as ordinal numbers | “third”, “21st”, “last” |
| Org | Companies, agencies, and institutions | “IBM”, “the United Nations”, “MIT” |
| Percent | Percentage values and rates | “30%”, “half”, “three-quarters” |
| Person | Names of real or fictional people | “Marie Curie”, “CEO Jane Doe” |
| Product | Brand names, product lines, and services | “Tesla Model 3”, “Windows 11” |
| Quantity | Measurements with units | “10 kilometers”, “500 mg”, “two hours” |
| Time | Time-of-day expressions | “noon”, “3:45 PM”, “early morning” |
| WORK_OF_ART | Titles of creative works | “Pride and Prejudice”, “The Beatles” |

Business Situation: A retail company receives thousands of customer reviews and wants to automatically identify mentioned products, competitors, and sentiment-related entities.

What You’ll Configure:

  • Set “Column Name” to “review_text” (your review content column)
  • Choose “English” from the Language dropdown
  • Select features: Person, Org, Product, Money, Percent
  • Choose “Original with appended result column” for output option
  • Name the output column “extracted_entities”

What Happens: The node processes each review and identifies customer names, competitor mentions, product references, prices, and percentage ratings, creating a structured dataset for analysis.
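A downstream step (for example, a Custom Script node) could then tally those mentions across the batch. This is a hypothetical sketch: the review rows, company names, and the shape of the “extracted_entities” column are illustrative assumptions based on the configuration above.

```javascript
// Hypothetical batch of reviews already processed by the node,
// with the "extracted_entities" column configured above.
const processedReviews = [
  { review_text: "Loving my new Contoso blender",
    extracted_entities: { Org: ["Contoso"], Product: ["blender"] } },
  { review_text: "Contoso beats Fabrikam on price",
    extracted_entities: { Org: ["Contoso", "Fabrikam"], Product: [] } }
];

// Count how often each organization appears across all reviews.
const orgCounts = {};
for (const row of processedReviews) {
  for (const org of row.extracted_entities.Org) {
    orgCounts[org] = (orgCounts[org] || 0) + 1;
  }
}
// orgCounts -> { Contoso: 2, Fabrikam: 1 }
```

The same loop generalizes to Product or Money arrays, which is how the structured output turns free-text reviews into rankable competitor and product metrics.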

Business Value: Reduces manual review analysis time by 85% and provides consistent entity identification across all customer feedback.

Business Situation: A law firm needs to extract key information from contracts including parties, dates, monetary amounts, and legal references.

What You’ll Configure:

  • Set “Column Name” to “contract_text”
  • Select “English” as the language
  • Choose features: Person, Org, Date, Money, Law, Loc
  • Select “Original with appended result column” to maintain document integrity
  • Name output column “contract_entities”

What Happens: Each contract is analyzed to identify all parties involved, important dates, financial terms, legal citations, and jurisdictions mentioned.

Business Value: Accelerates contract review process by 60% and ensures no critical information is overlooked during legal analysis.

Business Situation: A PR agency wants to monitor news articles for client mentions, competitor references, and industry events.

What You’ll Configure:

  • Set “Column Name” to “article_content”
  • Choose “English” for language
  • Select features: Person, Org, Event, Date, Loc, Money
  • Use “Return result column only” for focused monitoring data
  • Name output column “media_mentions”

What Happens: News articles are processed to extract all company names, executive mentions, industry events, dates, locations, and financial figures.

Business Value: Provides comprehensive media monitoring with 95% accuracy, enabling faster response to industry developments and client coverage.

Add the node:

  1. Drag the Feature Extraction node from the AI Processing section in the left panel
  2. Drop it onto your workflow canvas
  3. Connect it to your data source node using the arrow connector

Select the source column:

  1. Click the Feature Extraction node to open the configuration panel
  2. In the “Source Property” section, click the “Column Name” field
  3. Select or type the name of the column containing your text data
  4. The smart text box will suggest available columns from your connected data

Set the language and features:

  1. Expand the “Options” section in the configuration panel
  2. Click the “Language” dropdown and select the primary language of your text
  3. In the “Select Features” field, click to open the multi-select box
  4. Check the boxes for each type of information you want to extract
  5. Click “OK” to confirm your feature selections

Configure the output:

  1. Expand the “Output” section
  2. Click the “Option” dropdown and choose your preferred output format:
    • Select “Original with appended result column” to keep source data
    • Select “Return result column only” for extracted features only
  3. In the “Column Name” field, enter a descriptive name for your results column
  4. Click “Save Configuration” to apply your settings

Test the configuration:

  1. Click the “Test Configuration” button in the node panel
  2. Enter sample text in the test input field
  3. Review the extracted features in the preview panel
  4. Adjust your feature selections if needed
  5. Save your final configuration
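For reference, the settings walked through above map onto the documented field names roughly as follows. This is a hypothetical sketch using the customer review example; the exact serialized form Builder stores is an assumption and may differ.

```javascript
// Hypothetical serialized node configuration, using the field names
// documented for this node. Values follow the retail review example.
const featureExtractionConfig = {
  sourcePropName: "review_text",                          // Column Name (source)
  languageModel: "English",                               // Language
  selectedFeatures: ["Person", "Org", "Product", "Money", "Percent"], // Select Features
  outTransformId: "Original with appended result column", // Option (output format)
  outColumnName: "extracted_entities"                     // Column Name (output)
};
```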

Common Challenge: Medical records contain unstructured notes that need analysis for patient care coordination and billing accuracy.

How This Node Helps: Automatically extracts patient names, medical conditions, medications, dates, and healthcare facilities from clinical notes and discharge summaries.

Configuration Recommendations:

  • Use “English” language setting for most medical records
  • Select features: Person, Date, Org, Quantity, Product (for medications)
  • Choose “Original with appended result column” to maintain medical record integrity
  • Name output column “clinical_entities”

Results: Healthcare providers reduce documentation review time by 70% and improve billing accuracy through consistent entity extraction.

Common Challenge: Processing loan applications, insurance claims, and financial reports requires extracting specific financial and personal information from documents.

How This Node Helps: Identifies applicant names, financial amounts, dates, organizations, and locations from financial documents for automated processing.

Configuration Recommendations:

  • Select “English” for most financial documents
  • Choose features: Person, Money, Date, Org, Percent, Loc
  • Use “Original with appended result column” for audit trail requirements
  • Name output column “financial_entities”

Results: Financial institutions process applications 50% faster while maintaining compliance and reducing manual data entry errors.

Common Challenge: Product reviews, customer service tickets, and marketplace listings contain valuable information that needs structured analysis.

How This Node Helps: Extracts product names, brand mentions, prices, customer names, and quality indicators from unstructured e-commerce text data.

Configuration Recommendations:

  • Use “English” or “Undetermined” for international marketplaces
  • Select features: Product, Person, Money, Org, Percent, Ordinal
  • Choose “Return result column only” for clean analytics data
  • Name output column “commerce_entities”

Results: E-commerce businesses gain 40% better insights into customer sentiment and product performance through automated text analysis.

  • Symptom: The output column is empty or contains only empty arrays
  • Cause: The source column may not contain text matching the selected feature types, or the wrong language model is selected
  • Solution: Verify that the source column contains the expected text, confirm the language setting matches your data, and check that the feature types you selected actually appear in sample text

  • Symptom: The node extracts information that does not match the expected feature type
  • Cause: Natural language is ambiguous: a word like “Mars” could be a person name, a product, or a location depending on context
  • Solution: Narrow your feature selection to only the types you need, and post-process the output with a filtering node if specific false positives recur

  • Symptom: The workflow runs significantly longer when processing more than a few hundred records
  • Cause: Each record requires a full NLP analysis pass, which is compute-intensive
  • Solution: Select fewer feature types to reduce the analysis scope, and run large batch jobs during off-peak hours

  • Symptom: Entities of the expected types are missed or the text is not parsed properly
  • Cause: The language model does not match the actual language of the source text
  • Solution: Set the Language field to match your data; if your dataset contains multiple languages, use the “Undetermined” setting so the model can detect the language per record
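The false-positive filtering suggested above could be sketched in a Custom Script node along these lines. The blocklist values, function name, and entity object are illustrative assumptions, not part of the node itself.

```javascript
// Hypothetical post-processing step: drop entity values known to be
// false positives before they reach downstream nodes.
const knownFalsePositives = new Set(["Mars", "Amazon"]); // example blocklist

function filterEntities(entities, blocklist) {
  const cleaned = {};
  // Keep every feature type, but remove blocklisted values from each array.
  for (const [featureType, values] of Object.entries(entities)) {
    cleaned[featureType] = values.filter(v => !blocklist.has(v));
  }
  return cleaned;
}

const raw = { Person: ["Mars", "Jane Doe"], Org: ["Amazon", "Contoso"] };
const cleaned = filterEntities(raw, knownFalsePositives);
// cleaned -> { Person: ["Jane Doe"], Org: ["Contoso"] }
```

Keeping the blocklist in one place makes consistent false positives easy to suppress without retraining or reconfiguring the extraction itself.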
Choosing features:

  • Start Small - Begin with 3 to 5 essential features and expand based on results
  • Business Relevance - Only select features that directly support your business objectives
  • Data Quality - More features do not always mean better results; focus on accuracy over quantity

Language settings:

  • Consistency - Ensure your language setting matches your data’s primary language
  • Mixed Content - Use “Undetermined” only when your text contains multiple languages
  • Regional Variations - The English setting works well for US, UK, and other English variants

Output configuration:

  • Downstream Compatibility - Choose the output format based on how you will use the extracted data
  • Column Naming - Use consistent, descriptive names that your team will understand
  • Data Retention - Keep original data when you need to verify extraction accuracy

Performance:

  • Batch Processing - Process large datasets in smaller batches for optimal performance
  • Feature Limits - Selecting fewer features improves processing speed
  • Text Length - Very long documents may require preprocessing to focus on relevant sections

Combining with other nodes:

  • Sentiment Analysis - Combine with Feature Extraction to get both entity data and emotional tone from the same text
  • Similarity - Use extracted entities as input to find similar records across your dataset
  • Custom Script - Post-process extraction results with JavaScript for custom filtering or reshaping
  • Convert From JSON - Parse the extracted entity JSON object to access individual entity arrays in downstream nodes
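Once the result column has been parsed from a JSON string (the role the Convert From JSON node plays), the entity arrays are directly accessible and can be reshaped for analytics. The sample string and variable names below are illustrative.

```javascript
// If a downstream node receives the result column as a JSON string
// rather than an object, parse it first.
const resultColumn = '{"Person": ["John Smith"], "Org": ["Acme Corp"]}';
const entities = JSON.parse(resultColumn);

const people = entities.Person; // ["John Smith"]

// Flatten the object into one row per extracted value, a convenient
// shape for tabular analytics or reporting tools.
const rows = Object.entries(entities).flatMap(([type, values]) =>
  values.map(value => ({ type, value }))
);
// rows -> [ { type: "Person", value: "John Smith" },
//           { type: "Org",    value: "Acme Corp" } ]
```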

The Feature Extraction node transforms unstructured text into valuable, actionable data that drives better business decisions and automates manual processes across industries.