Fuzzy Text Matching
We provide a Groovy class containing several algorithms, such as Cosine Similarity, to find matching items in a list based on a user input. This class is especially useful for finding matching items in lists that are dynamically populated, which might rule out the use of custom entities.
Installation
Add the file FuzzySearch.groovy to the Resources in your solution and set its path to /script_lib.
Algorithms
We provide three fuzzy search algorithms: Cosine Similarity (based on n-grams), Edit Distance and Word Count.
Usage
You can call the FuzzySearch class from any script in Teneo Studio, for example in script nodes, in listeners, or as a script condition in transitions. The methods are called as follows:
Use Cosine Similarity
```groovy
FuzzySearch.mostSimilarByCosineSimilarity(String pattern, List candidates, double threshold, int degree)
```
The mostSimilarByCosineSimilarity methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The matching threshold, a value between 0 and 1 |
degree | N-gram degree, an integer with default value 2 |
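To illustrate what the threshold and degree arguments control, here is a minimal Groovy sketch of n-gram cosine similarity. It is not the implementation in FuzzySearch.groovy: the helper names (ngramCounts, cosineSimilarity, mostSimilarByCosineSketch) are hypothetical, and the sketch assumes lower-cased, overlapping character n-grams without padding. Each string becomes a vector of n-gram counts, the score is the cosine of the angle between the two vectors, and candidates scoring above the threshold are returned best first.

```groovy
// Hypothetical sketch, not the code in FuzzySearch.groovy.
// Assumes lower-cased, overlapping character n-grams with no padding.

// Count every n-gram of length `degree` in the string.
Map<String, Integer> ngramCounts(String s, int degree) {
    Map<String, Integer> counts = [:]
    String t = s.toLowerCase()
    for (int i = 0; i + degree <= t.length(); i++) {
        String gram = t.substring(i, i + degree)
        counts[gram] = (counts[gram] ?: 0) + 1
    }
    return counts
}

// Cosine of the angle between the two n-gram count vectors (0 = nothing shared).
double cosineSimilarity(String a, String b, int degree = 2) {
    Map<String, Integer> va = ngramCounts(a, degree)
    Map<String, Integer> vb = ngramCounts(b, degree)
    if (va.isEmpty() || vb.isEmpty()) {
        return 0.0d
    }
    double dot = va.collect { gram, n -> n * (vb[gram] ?: 0) }.sum() as double
    double normA = Math.sqrt(va.values().collect { it * it }.sum() as double)
    double normB = Math.sqrt(vb.values().collect { it * it }.sum() as double)
    return dot / (normA * normB)
}

// Keep candidates scoring above the threshold, best match first.
List<String> mostSimilarByCosineSketch(String pattern, List<String> candidates,
                                       double threshold, int degree = 2) {
    def scored = candidates.collect { [name: it, score: cosineSimilarity(pattern, it, degree)] }
    def kept = scored.findAll { it.score > threshold }
    return kept.sort { -it.score }.collect { it.name }
}

def restaurantNames = ["Happy Thai", "Delicious Seafood", "Pete's Deli"]
println mostSimilarByCosineSketch("Deli", restaurantNames, 0.40)
```

In this sketch a larger degree tends to make matching stricter, because longer n-grams are less likely to be shared by chance.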
Use Edit Distance
```groovy
FuzzySearch.mostSimilarByEditDistance(String pattern, List candidates, int threshold, Boolean allowSubstitution)
```
The mostSimilarByEditDistance methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The edit distance threshold, an integer with default value 10 |
allowSubstitution | A Boolean value: if true, Levenshtein distance is used; if false, LCS (longest common subsequence) distance is used |
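The difference between the two settings of allowSubstitution can be sketched with the classic dynamic-programming edit distance. The sketch below is hypothetical (editDistance and mostSimilarByEditDistanceSketch are not FuzzySearch names) and assumes every operation costs 1. With substitutions it computes the Levenshtein distance; without them only insertions and deletions remain, which gives the LCS distance len(a) + len(b) - 2 * LCS(a, b).

```groovy
// Hypothetical sketch, not the code in FuzzySearch.groovy.
// Unit-cost edit distance computed row by row. Insertions and deletions are
// always allowed; substitutions only when allowSubstitution is true.
//   allowSubstitution == true  -> Levenshtein distance
//   allowSubstitution == false -> LCS distance, len(a) + len(b) - 2 * LCS(a, b)
int editDistance(String a, String b, boolean allowSubstitution) {
    int[] prev = new int[b.length() + 1]
    for (int j = 0; j <= b.length(); j++) {
        prev[j] = j                       // build the first j characters of b by insertions
    }
    for (int i = 1; i <= a.length(); i++) {
        int[] curr = new int[b.length() + 1]
        curr[0] = i                       // delete the first i characters of a
        for (int j = 1; j <= b.length(); j++) {
            int best = Math.min(prev[j] + 1, curr[j - 1] + 1)   // deletion or insertion
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                best = Math.min(best, prev[j - 1])              // matching characters cost nothing
            } else if (allowSubstitution) {
                best = Math.min(best, prev[j - 1] + 1)          // substitution
            }
            curr[j] = best
        }
        prev = curr
    }
    return prev[b.length()]
}

// Keep candidates whose distance is below the threshold, closest first.
List<String> mostSimilarByEditDistanceSketch(String pattern, List<String> candidates,
                                             int threshold, boolean allowSubstitution) {
    def scored = candidates.collect { [name: it, dist: editDistance(pattern, it, allowSubstitution)] }
    def kept = scored.findAll { it.dist < threshold }
    return kept.sort { it.dist }.collect { it.name }
}

println editDistance("kitten", "sitting", true)    // Levenshtein: 3
println editDistance("kitten", "sitting", false)   // LCS distance: 5
```

Because every extra character costs one edit in this sketch, the distance is dominated by the length difference when a short input is compared with long candidates: "Deli" is 7 edits from "Pete's Deli", 9 from "Happy Thai" and 13 from "Delicious Seafood", so a threshold of 10 keeps the first two.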
Use Word Count
```groovy
FuzzySearch.mostSimilarByWordCount(String pattern, List candidates, int threshold)
```
The mostSimilarByWordCount methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The matching threshold, an integer with default value 1 |
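The Word Count behaviour can be sketched as counting the words a candidate shares with the pattern. This is a hypothetical sketch, not the FuzzySearch.groovy code: words and mostSimilarByWordCountSketch are illustrative names, words are assumed to be whitespace-separated and compared case-insensitively, threshold is read as the minimum number of shared words, and every candidate tied for the highest count is returned in input order.

```groovy
// Hypothetical sketch, not the code in FuzzySearch.groovy.
// Assumes whitespace-separated, case-insensitive words, and that `threshold`
// is the minimum number of words a candidate must share with the pattern.

// Split a string into a set of lower-cased words.
Set<String> words(String s) {
    return s.toLowerCase().split(/\s+/).findAll { !it.isEmpty() }.toSet()
}

// Return the candidates that share the most words with the pattern,
// provided they share at least `threshold` words.
List<String> mostSimilarByWordCountSketch(String pattern, List<String> candidates, int threshold = 1) {
    Set<String> patternWords = words(pattern)
    def scored = candidates.collect { [name: it, shared: words(it).intersect(patternWords).size()] }
    int best = scored.collect { it.shared }.max() ?: 0
    return scored.findAll { it.shared == best && it.shared >= threshold }.collect { it.name }
}

def restaurantNames = ["Happy Thai", "Delicious Seafood", "Pete's Deli"]
println mostSimilarByWordCountSketch("Deli", restaurantNames)
println mostSimilarByWordCountSketch("thai or seafood", restaurantNames)
```

Unlike the n-gram approach, whole words must match here, so "Deli" does not match "Delicious Seafood".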
Results
Each method returns an ordered list of matching candidates. Its contents depend on the fuzzy search algorithm you choose:
- Cosine Similarity: all candidates whose similarity score with the pattern is greater than the threshold, ordered by closest match first.
- Edit Distance: all candidates whose edit distance is lower than the threshold, ordered by closest match first.
- Word Count: the candidates that have the most words in common with the pattern.
Example
Suppose we want to let users choose a restaurant from a list of nearby restaurants using natural language. Let's say the list of nearby restaurants is retrieved from an API and stored in a variable 'restaurantNames'. To find the restaurant names in the list that match the user input using cosine similarity, we can use the following code:
```groovy
def matchingItems = FuzzySearch.mostSimilarByCosineSimilarity(_.userInputText, restaurantNames, 0.40)
```
If the value of 'restaurantNames' were ["Happy Thai", "Delicious Seafood", "Pete's Deli"] and the user input text were "Deli", the value of 'matchingItems' would be:
["Pete's Deli", "Delicious Seafood"]
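The ordering can be checked by hand under an illustrative assumption: each string is split into lower-cased, overlapping character bigrams (degree 2, no padding) and represented by its vector of bigram counts. The tokenization actually used by FuzzySearch.groovy may differ. Under this assumption "Deli" yields de, el, li; "Pete's Deli" yields 10 distinct bigrams that include those three; "Delicious Seafood" yields 16 distinct bigrams that include those three; and "Happy Thai" shares none of them.

```latex
\begin{aligned}
\cos(\mathbf{a}, \mathbf{b}) &= \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} \\
\cos(\text{Deli}, \text{Pete's Deli}) &= \frac{3}{\sqrt{3}\,\sqrt{10}} \approx 0.548 \\
\cos(\text{Deli}, \text{Delicious Seafood}) &= \frac{3}{\sqrt{3}\,\sqrt{16}} \approx 0.433 \\
\cos(\text{Deli}, \text{Happy Thai}) &= 0
\end{aligned}
```

Both nonzero scores exceed the 0.40 threshold, and the higher one comes first, which matches the list above.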
Credits
The CosineSimilarity class was written by Burt Beckwith; the source can be found in Grails core. For more details on the Cosine Similarity algorithm, see Fuzzy Matching with Cosine Similarity.
Download
This extension is also available in a demo solution that can be downloaded here.
Download the FuzzySearch.groovy file here.