How Ficstar Uses NLP and Cosine Similarity for Accurate Menu Price Matching
- Amer

As Data Analyst at Ficstar, I spend a lot of my time solving one of the toughest problems in web scraping: how to match products and menu items that are listed differently across many online sources. Think about a restaurant meal: it might be called one thing on the restaurant’s website and something slightly different on delivery apps like DoorDash or UberEats. These differences in names, sizes, or descriptions can make it really hard to compare prices accurately and understand what our competitors are doing.
Getting this product matching right is super important, but it’s one of the hardest things to do when we're pulling data from the web. Even a small naming difference can mess up our analysis and lead to bad business decisions.
At Ficstar, we use a mix of three things to get the highest possible accuracy, up to 99.9%. We use Natural Language Processing (NLP), which is like teaching a computer to understand human language, plus some smart statistics, and finally, human checks to make sure everything is right. By combining the speed of machines with the careful eye of people, we make sure every piece of data is reliable.
I’m going to walk you through the key steps of our process. These are the same steps we use to help our clients keep track of prices and stay competitive online.
The Challenge of Matching Names and Sizes
Matching products sounds easy: find two identical items from different sources and link them up. But when you do this with huge amounts of data, it gets very complex.
For example, we pull menu data from a restaurant’s official site and third-party apps like UberEats and Grubhub. The exact same burger might appear with different words:
I might see "Large McChicken Meal" on one site and "McChicken Meal – Large" on another. Sometimes the sandwich is a "Combo" in one place and "À la carte" (sold separately) in another. Even the word order, the individual tokens (words), or the punctuation can differ.
To fix these problems, we run the text through a series of automated cleanup steps, an advanced matching model, and then a human review. Our goal is to make all the differences disappear so we can be very sure when two items are the same.
Ficstar's 8-Step Process for Menu Item Matching

Step 1: Cleaning Up and Standardizing the Text
The first important step in our successful matching process is text normalization, which is essentially cleaning the text. We start by putting the product name and its size description together into one line of text. Then, we transform it in a few ways:
We change all the text to lowercase.
We remove punctuation and most special characters.
We make unit formats standard (e.g., changing "6 inch" to "6in" and "ounces" to "oz").
We break the text into consistent word patterns, or tokens.
This basic cleanup ensures that simple things like a capital letter, a comma, or a space won't stop our matching process.
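To make this concrete, here is a minimal sketch of what a normalization step like this can look like in Python. The UNIT_MAP rules and the normalize() helper are simplified, hypothetical stand-ins, not our production ruleset:

```python
import re

# Hypothetical unit rules; a production ruleset is far more extensive.
UNIT_MAP = [
    (r"\b(\d+)\s*inch(es)?\b", r"\1in"),  # "6 inch" -> "6in"
    (r"\bounces?\b", "oz"),               # "ounces" -> "oz"
]

def normalize(name: str, size: str = "") -> list[str]:
    """Combine name + size, lowercase, standardize units, strip punctuation, tokenize."""
    text = f"{name} {size}".lower()          # one line of text, all lowercase
    for pattern, repl in UNIT_MAP:
        text = re.sub(pattern, repl, text)   # make unit formats standard
    text = re.sub(r"[^\w\s]", " ", text)     # remove punctuation and special characters
    return text.split()                      # break into consistent tokens

print(normalize("Large McChicken Meal"))    # ['large', 'mcchicken', 'meal']
print(normalize("McChicken Meal – Large"))  # ['mcchicken', 'meal', 'large']
```

After this pass, the two variants contain exactly the same tokens, just in a different order, and the next step is built to handle that.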
Once the text is clean, we use a method called TF-IDF (Term Frequency–Inverse Document Frequency). This turns each product name into a numerical vector based on how often its words appear, while down-weighting words that show up everywhere. This helps the system understand which words actually matter.
For instance, a general word like "meal" might appear often, but a specific word like "combo" is more important for context. Similarly, numbers like "6," "12," or "20" often tell us the size or count, making them critical for an accurate match.
Step 2: Using TF-IDF and Cosine Similarity for Context
Instead of just looking at letter-by-letter differences (which is what simple fuzzy matching does), Ficstar uses a more powerful technique that combines TF-IDF with cosine similarity. This measures how close two product names are in a multi-dimensional space. It's like measuring the angle between two arrows to see how closely they point in the same direction.
As I like to say, "Instead of raw string distance, we’re doing semantic menu similarity." This means the model doesn't just match characters; it understands the meaning and context.
For example:
"Large McChicken Meal" and "McChicken Meal Large" will get a very high similarity score because the model knows they mean the same thing.
"6 inch Italian BMT" and "Italian BMT 6" will also match strongly.
"Combo" and "À la carte" will get a low score because their meanings are different in a menu context.
This focus on context makes our model great at handling different word orders, plurals, and abbreviations—which are very common when pulling data from many different places.
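Here is a small illustration of the idea using scikit-learn. It assumes the names were already normalized in Step 1, and it is a simplified sketch rather than our production pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

names = [
    "large mcchicken meal",           # restaurant site
    "mcchicken meal large",           # delivery app, reordered
    "mcchicken sandwich a la carte",  # a different product
]

# Turn each name into a TF-IDF vector, then measure the angle between pairs.
vectors = TfidfVectorizer().fit_transform(names)
scores = cosine_similarity(vectors)

print(scores[0, 1])  # 1.0 -- identical tokens in a different order: strong match
print(scores[0, 2])  # much lower -- only "mcchicken" is shared
```

Because cosine similarity ignores token order, the reordered pair scores a perfect 1.0, while the à la carte sandwich falls well below any sensible match threshold.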
Step 3: Giving More Weight to Important Words
A key part of Ficstar’s method is domain-specific token weighting. We don't treat all words the same. We assign extra importance, or "weight," to words that matter a lot to the business, like words about size or if it's a set meal.
We boost keywords such as:
combo, meal
large, medium, small
footlong, double, single
count indicators (e.g., 3, 6, 10, 20)
By multiplying token scores by these weights, we make sure the important attributes stand out. This helps the system tell the difference between similar-looking but non-identical products.
For instance, "McChicken Combo" and "McChicken Sandwich" might look alike to a basic model, but our weighted approach recognizes that "combo" means a full meal set and shouldn't be matched with just a single sandwich. This step significantly cuts down on wrong matches and makes our system more accurate.
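One straightforward way to implement this kind of boosting, shown here as a sketch with illustrative weights rather than the values we actually tune, is to scale the TF-IDF columns of the boosted tokens before scoring:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative boost factors; real weights are tuned per project and domain.
BOOSTS = {"combo": 3.0, "meal": 3.0, "large": 2.0, "medium": 2.0, "small": 2.0}

names = ["mcchicken combo", "mcchicken sandwich"]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(names).toarray()

# Scale the column of every boosted token so business-critical attributes
# (size, meal set) dominate the similarity calculation.
for token, weight in BOOSTS.items():
    if token in vectorizer.vocabulary_:
        vectors[:, vectorizer.vocabulary_[token]] *= weight

# With "combo" boosted, the shared word "mcchicken" no longer dominates,
# and the pair lands well below a typical auto-accept threshold.
print(cosine_similarity(vectors)[0, 1])
```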
Step 4: Using "Blocking" to Reduce Mistakes
Even with our smart NLP model, comparing every product to every other product is slow and invites unnecessary false matches. To solve this, we use blocking strategies to limit comparisons to logical groups.
Before we run the similarity model, we filter items by things like brand or category. For example, a "McChicken Meal" from McDonald’s will only be compared with other McDonald’s listings, never with a Burger King or Wendy’s item.
This brand-based blocking not only speeds up the process but also makes the overall matching more accurate by keeping irrelevant comparisons out of the running.
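As a sketch with hypothetical record fields, blocking can be as simple as grouping records by brand before any similarity scoring happens:

```python
from collections import defaultdict
from itertools import product

# Hypothetical scraped records: (brand, normalized item name, source)
items = [
    ("mcdonalds", "large mcchicken meal", "official"),
    ("mcdonalds", "mcchicken meal large", "ubereats"),
    ("burger king", "whopper meal large", "official"),
]

# Group records by brand so candidate pairs never cross brand boundaries.
blocks = defaultdict(list)
for brand, name, source in items:
    blocks[brand].append((name, source))

candidate_pairs = []
for group in blocks.values():
    official = [r for r in group if r[1] == "official"]
    third_party = [r for r in group if r[1] != "official"]
    candidate_pairs.extend(product(official, third_party))

# Only the two McDonald's listings are ever compared; the Burger King
# item never enters the running against a McChicken.
print(candidate_pairs)
```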
Step 5: Scoring and Setting Thresholds
Once potential matches are compared, the system gives each pair a cosine similarity score between 0 and 1. The higher the score, the more similar the items are.
Ficstar sets clear rules for these scores:
Matches above a high confidence threshold (usually above 0.8) are automatically accepted.
Scores in the borderline range (0.5–0.8) are flagged for a manual, human check.
Scores below the lower limit are thrown out completely.
This scoring system ensures that only the most certain matches are automated, and any tricky cases get the human attention they need.
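The routing logic itself is simple. A minimal sketch using the thresholds above:

```python
def route(score: float, accept: float = 0.8, review: float = 0.5) -> str:
    """Decide what happens to a candidate pair based on its similarity score."""
    if score >= accept:
        return "auto-accept"    # high confidence: matched automatically
    if score >= review:
        return "manual-review"  # borderline: flagged for a human check
    return "discard"            # too dissimilar: thrown out completely

for score in (0.93, 0.67, 0.21):
    print(score, "->", route(score))
# 0.93 -> auto-accept
# 0.67 -> manual-review
# 0.21 -> discard
```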
Step 6: The Human Quality Check (QA)
No matter how smart a computer model is, good data still needs human eyes. We include a manual review pipeline as the last step to ensure our data meets the highest standards.
Our human analysts step in when:
The model’s confidence score is too low.
The model finds multiple possible matches for one item.
A "don't match" flag is raised during a quality check.
"Analysts usually review about 10–15% of items," I note. "Most records are confidently matched by the model, but we always include human verification for borderline cases."
This process is structured: the model suggests matches, the borderline ones go to an analyst, and the analyst approves or rejects them. Approved pairs are added to a "gold-standard" dataset that we use to teach the model for future matching.
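In sketch form (the analyst_review() helper and gold_standard list are hypothetical names, not our internal API), the flow looks like this:

```python
gold_standard = []  # reviewed pairs, later reused to retrain the model

def process_pair(pair, score, analyst_review):
    """Sketch of the review flow: auto-accept, human check, or discard."""
    if score >= 0.8:
        gold_standard.append((pair, "match"))  # model is confident
    elif score >= 0.5:
        verdict = analyst_review(pair)         # borderline: a human decides
        gold_standard.append((pair, verdict))  # approved or rejected, both kept
    # below 0.5: discarded without ever reaching an analyst
```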
This approach, combining the efficiency of AI with the precision of human oversight, is a core principle of how we do things at Ficstar.
Step 7: Continuous Learning
Every time a human analyst approves or rejects a match, it goes back into the model as a lesson. These approved and rejected pairs are labeled data that we use to retrain the matching algorithm, making it more accurate over time.
This constant feedback loop allows our models to learn and adapt to new ways of naming products, brand-specific patterns, and changes to product lines all on their own. As a result, the system gets smarter, and we need less human help for future data pulls.
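As one generic illustration of how reviewed pairs can drive retraining (a pair-classifier sketch with made-up features and numbers, not our actual training setup):

```python
from sklearn.linear_model import LogisticRegression

# Made-up labeled pairs from analyst review. Each row is a feature vector:
# [cosine similarity, same-size flag, same-meal-type flag]; label 1 = match.
X = [[0.92, 1, 1], [0.81, 1, 1], [0.74, 1, 0], [0.55, 0, 0], [0.63, 1, 1]]
y = [1, 1, 0, 0, 1]

# Each retraining pass on freshly reviewed pairs nudges the decision
# boundary, so fewer borderline cases need a human next time.
classifier = LogisticRegression().fit(X, y)
print(classifier.predict([[0.70, 1, 1]]))
```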
Step 8: Accuracy and Real-World Results
All these layers (cleaning, smart modeling, weighting, blocking, and human review) come together to give us truly excellent results.
"Our matching model currently performs in the 90–95% range, depending on how complex the menu or naming is," I explain. "We care more about being precise than automating everything, because for our clients, clean data is the only way to get useful information."
The benefit for our clients is huge. Accurate matching allows them to:
Compare competitor prices with total confidence.
Spot gaps in product lines or assortments.
See menu or catalog updates almost instantly.
Automate pricing analysis with very few errors.
For one big food delivery client, our improved matching accuracy made their pricing analysis much more precise, which directly helped them set better promotions and make more money.
Why Accuracy is Better Than Full Automation

In the world of data, many companies try to automate everything. Ficstar chooses a different path: one that puts data quality and client trust first.
Automating every match might save a few minutes, but it risks tiny errors multiplying across huge datasets. If a single bad match messes up a price comparison or inventory check across thousands of items, the cost of that bad data quickly outweighs the time saved.
By using a hybrid approach, driven by algorithms but reviewed by humans, Ficstar ensures our data products are both scalable (can handle huge amounts of data) and reliable (can be trusted).
Lessons from the Field: The Restaurant Example
Let me give you a clear example. Let’s say we’re pulling menu data for a major fast-food chain. The same meal could be listed like this:
"Large McChicken Meal"
"McChicken Meal – Large"
"McChicken Combo"
Without our cleanup process, these three look like different items. But with Ficstar's pipeline, the token analysis, size weighting, and cosine similarity all recognize them as the same product. The final, unified output looks like this:
Unified Output: McChicken Meal – Large (Combo)
This consistency means that later analysis systems can treat it as one product, allowing for accurate price comparisons between all the delivery platforms.
The Role of QA in Getting Better
Every human review we do helps our system learn and improve. Our own performance reports show that focusing on quality assurance (QA) leads directly to better results for our clients and fewer issues over time.
For example, the number of mismatches flagged during our internal checks dropped by nearly half in 2025. This improvement came from fine-tuning our QA review process and using the "gold-standard" data from analyst feedback to continuously retrain our models.
The strength of our process is in its balance. It’s not a machine doing all the work, nor is it a person doing all the work; it’s a smart collaboration:
Automation ensures we can handle scale. The TF-IDF + cosine similarity engine handles thousands of records quickly.
Human review ensures the data is credible. Analysts check the hard-to-call cases, stopping errors before they spread.
Feedback loops ensure we keep learning. Every review makes the model better for next time.
Looking Ahead
As AI gets more advanced, we are looking at new ways to improve matching using complex language models (like BERT or RoBERTa). These models can understand even deeper connections between words.
However, I want to emphasize that our focus will always be on controlled accuracy, not just blind automation.
"AI can give us more speed and scale, but our clients rely on precision," I say. "That’s why the human layer will always be part of our process."
The future will bring smarter models, but the basic rule stays the same: the highest value comes from data that clients can truly trust.
Key Takeaways
Matching product names and sizes is a lot more than just a technical job; it’s the essential step that turns raw web data into smart business decisions. At Ficstar, hitting a 90–95% accuracy rate isn't a one-time success; it's an ongoing effort powered by machine learning, human expertise, and non-stop quality checks.
Using TF-IDF, cosine similarity, weighted tokens, smart blocking, and a structured human review process, we turn messy web data into clean, reliable insights.
For me and the team at Ficstar, this process shows our core belief: accuracy is not a nice-to-have; it’s the absolute foundation of everything we do.
Why is product and menu item matching such a challenge for data teams?
Menu items are often listed differently across platforms, for example, “Large McChicken Meal,” “McChicken Meal Large,” or “McChicken Combo.” These inconsistencies may look minor, but they create unreliable pricing analytics and make it difficult to compare products or detect competitive trends.
How does Ficstar solve this challenge?
Ficstar uses an NLP-based data matching pipeline that combines text normalization, token weighting, and semantic similarity scoring to identify equivalent items across multiple data sources. This allows systems to recognize that two differently worded products actually refer to the same menu item.
What are the core techniques used in Ficstar’s data matching process?
Ficstar’s model integrates several key techniques to ensure semantic accuracy:
TF-IDF Vectorization: Converts text into numerical representations to capture word importance and frequency.
Cosine Similarity: Measures how closely two product names are related in meaning, not just spelling.
Domain-Specific Weighting: Boosts key tokens such as combo, large, or footlong to highlight important menu attributes.
Blocking Strategies: Limits comparisons by brand or category to reduce unnecessary matches and computation time.
What role does human quality assurance (QA) play in the process?
Even with strong automation, some matches fall below confidence thresholds or return multiple candidates. In these cases, Ficstar’s analysts perform a manual review. Approved results are stored as gold-standard data, which helps retrain and improve the model. This human-in-the-loop approach ensures that every dataset reaches enterprise-grade reliability.
How accurate is Ficstar’s data matching pipeline?
Ficstar’s matching model performs in the 90–95% range on its own, depending on the complexity of menu structures and naming conventions, and the hybrid pipeline with human QA pushes final accuracy up to 99.9%. Borderline cases are refined through manual review, ensuring that no critical mismatches reach the client’s final dataset.
How does the model improve over time?
Each manual review contributes to continuous improvement. The system learns from approved and rejected matches, retraining itself to recognize similar patterns in future datasets. This feedback loop steadily reduces manual workload and increases automation accuracy.
What is the business impact of accurate product matching?
Reliable data matching allows enterprises to:
Conduct precise competitive pricing analysis
Maintain consistent menu and assortment monitoring
Improve decision-making based on clean, trusted data
Reduce reporting errors and improve time-to-insight for analytics teams