Annotating Words and Sentences
This page covers best practices for working with annotations. You can read more about annotations here and see a practical example here.
Engine API
The Engine Scripting API Javadoc documentation describes the method for creating input annotations:
```java
public Annotation createInputAnnotation(String _sName,
                                        int _iSentenceIndex,
                                        Set<Integer> _zWordIndices,
                                        Map<String,Object> _mVariables)
```
This creates a user input Annotation object with the given data.
The created object is not added to the current annotations; to add it, pass the object to getInputAnnotations().add(...).
Parameter Name | Description |
---|---|
_sName | The name of the annotation |
_iSentenceIndex | The index of the sentence this annotation is assigned to (the first sentence has index 0) |
_zWordIndices | The indices of the sentence words this annotation is assigned to (the first word has index 0, the set may be empty) |
_mVariables | An arbitrary collection of key/value pairs; if null is passed then an empty map is used |
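The two rules described above (creating an annotation does not register it, and a null variables argument becomes an empty map) can be illustrated with a small self-contained Java model. This is only a sketch: AnnotationContractSketch and SketchAnnotation are hypothetical stand-ins written for this illustration, not Engine classes, and they only mimic the documented behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in that mimics the documented contract; NOT the real Engine API.
final class AnnotationContractSketch {

    /** Stand-in for the Annotation object returned by createInputAnnotation. */
    static final class SketchAnnotation {
        final String name;
        final int sentenceIndex;
        final Set<Integer> wordIndices;
        final Map<String, Object> variables;

        SketchAnnotation(String name, int sentenceIndex,
                         Set<Integer> wordIndices, Map<String, Object> variables) {
            this.name = name;
            this.sentenceIndex = sentenceIndex;
            this.wordIndices = new HashSet<>(wordIndices);
            // Documented rule: a null variables argument is treated as an empty map.
            this.variables = variables == null ? new HashMap<>() : new HashMap<>(variables);
        }
    }

    // Stand-in for the collection behind getInputAnnotations().
    private final List<SketchAnnotation> inputAnnotations = new ArrayList<>();

    /** Creates the object only; it is not added to the current annotations. */
    SketchAnnotation createInputAnnotation(String name, int sentenceIndex,
                                           Set<Integer> wordIndices, Map<String, Object> variables) {
        return new SketchAnnotation(name, sentenceIndex, wordIndices, variables);
    }

    List<SketchAnnotation> getInputAnnotations() {
        return inputAnnotations;
    }

    public static void main(String[] args) {
        AnnotationContractSketch engine = new AnnotationContractSketch();
        SketchAnnotation created = engine.createInputAnnotation(
                "EXAMPLE.SENTENCE", 0, Collections.<Integer>emptySet(), null);

        System.out.println("after create: " + engine.getInputAnnotations().size());
        engine.getInputAnnotations().add(created); // the explicit second step
        System.out.println("after add: " + engine.getInputAnnotations().size());
    }
}
```

In a real solution the same two steps are the createInputAnnotation call followed by getInputAnnotations().add(...), as in the examples on this page.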
Applying Annotations
Annotations can be applied to 0-n words within a single sentence.
Sentence Annotation
Applying an annotation to 0 words in a sentence annotates the entire sentence - but none of the words within it (see "Used Words" below).
Use an empty set ([] as HashSet) to annotate an entire sentence: _.createInputAnnotation("EXAMPLE.SENTENCE", 0, [] as HashSet, [:])
Example
Code:
```groovy
// Add one sentence-level annotation per sentence in the input.
_.getSentences().eachWithIndex { s, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is a sentence annotation"]
    _.getInputAnnotations().add(
        _.createInputAnnotation(
            "EXAMPLE.SENTENCE",
            sentenceIndex,
            [] as HashSet, // empty set: annotate the whole sentence, none of its words
            ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex])
    )
}
```
Single Word Annotation
Use a set with a single word index entry ([0] as HashSet) to annotate a single word: _.createInputAnnotation("EXAMPLE.WORD", 0, [0] as HashSet, [:])
Example
Code:
```groovy
// Add one single-word annotation for every word of every sentence.
_.getSentences().eachWithIndex { sentence, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is a word annotation"]

    sentence.getWords().eachWithIndex { word, wordIndex ->
        _.getInputAnnotations().add(
            _.createInputAnnotation("EXAMPLE.WORD", sentenceIndex, [wordIndex] as HashSet,
                ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex, "wordIndex": wordIndex])
        )
    }
}
```
Multi Word Annotation
Use a set with multiple word index entries ([0, 1, 2, 7] as HashSet) to annotate multiple words: _.createInputAnnotation("EXAMPLE.ALL_WORDS", 0, [0, 1, 2, 7] as HashSet, [:])
Example
Here we annotate all words in each sentence:
Code:
```groovy
// Add one multi-word annotation per sentence that covers all of its words.
_.getSentences().eachWithIndex { sentence, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is an all-words annotation"]

    // Collect the index of every word in the sentence: [0, 1, 2, ...]
    def allWordIndices = sentence.getWords().withIndex().collect { word, wordIndex -> wordIndex }
    _.getInputAnnotations().add(
        _.createInputAnnotation("EXAMPLE.ALL_WORDS",
            sentenceIndex,
            allWordIndices as HashSet,
            ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex, "allWordIndices": allWordIndices])
    )
}
```
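A multi-word annotation often covers a contiguous phrase rather than every word or arbitrary positions. The sketch below shows one way to build such index sets in plain Java before passing them as _zWordIndices. WordIndexSets, span and allWords are hypothetical helper names chosen for this illustration; they are not part of the Engine API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helpers for building word index sets; not part of the Engine API.
final class WordIndexSets {

    /** Returns the 0-based indices firstWord..lastWord, both inclusive. */
    static Set<Integer> span(int firstWord, int lastWord) {
        if (firstWord < 0 || lastWord < firstWord) {
            throw new IllegalArgumentException("invalid span: " + firstWord + ".." + lastWord);
        }
        return IntStream.rangeClosed(firstWord, lastWord)
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /**
     * Returns the indices of all words in a sentence with wordCount words.
     * For wordCount == 0 the result is empty, which (per the rules on this page)
     * would annotate the sentence itself rather than any of its words.
     */
    static Set<Integer> allWords(int wordCount) {
        return wordCount <= 0 ? new TreeSet<>() : span(0, wordCount - 1);
    }

    public static void main(String[] args) {
        System.out.println(span(1, 3));   // indices of the second to fourth word
        System.out.println(allWords(4));  // indices of every word in a four-word sentence
    }
}
```

A TreeSet is used only so the indices print in sentence order; any Set<Integer> satisfies the parameter type.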
Multiple Annotations per Word(s)
Looking at the screenshot of the last example in more detail, you can also see how a single word, or a collection of words, can be assigned multiple annotations:
For example, the word "annotations" here has been annotated with four different annotations:
Annotation Name | Type | Source |
---|---|---|
EN.LANG | Sentence | System (Language Detector) |
NN.POS | Single word | System (POS Tagger) |
PL.POS | Single word | System (POS Tagger) |
EXAMPLE.ALL_WORDS | Multi word | Listener in Solution |
Annotating Multiple Possibilities
In cases where different annotations could apply to different sets of words in the same input, the different combinations can each be annotated.
Example: CITY_MATCH
For example, consider a solution containing a city matching annotator, with an input containing many possible city matches. Both "new york" and "york" are possible values for mentioned cities, hence both have been annotated as such:
With a variable containing the full detail of the match:
Similarly, where a word (or combination of words) has a number of possible matches, these matches can be added to the annotation variable value and used later in matching, for example to achieve automatic or user-driven disambiguation:
Used Words
The "used words" of an annotation are the words that the annotation is connected to. They are controlled by the _zWordIndices parameter of the _.createInputAnnotation function. In the above "annotation" example (in "Multiple Annotations per Word(s)") the used words are as follows:
Annotation Name | Type | Used Words |
---|---|---|
EN.LANG | Sentence | (none - sentence annotations do not have used words) |
NN.POS | Single word | "annotations" |
PL.POS | Single word | "annotations" |
EXAMPLE.ALL_WORDS | Multi word | "annotations", "can", "be", ... (all words annotated) |
These used words then define the matching behavior when using:
- Positional operators, e.g. >> (directly followed by)
- Extended And operators, e.g. &= (same match)
The used words also define the value returned by _.getUsedWords() (and similar) from within a script attached to an annotation, for example this propagation script: %$EXAMPLE.ALL_WORDS^{myVar = _.getUsedWords()}.
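To make the link between word indices and used words concrete, the following plain Java sketch selects the words of a sentence that an index set points at. It is an illustration only: UsedWordsSketch and usedWords are hypothetical names, the engine's own implementation is not shown in this document, and returning the words in sentence order is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical illustration of how word indices map to "used words"; not Engine code.
final class UsedWordsSketch {

    /**
     * Returns the words at the given 0-based indices, in sentence order (assumption).
     * An empty index set yields no used words, as for a sentence annotation.
     */
    static List<String> usedWords(List<String> sentenceWords, Set<Integer> wordIndices) {
        return new TreeSet<>(wordIndices).stream()   // sort the indices
                .map(sentenceWords::get)             // look up each word
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("annotations", "can", "be", "applied");

        // Single-word annotation on the first word.
        System.out.println(usedWords(words, new HashSet<>(Arrays.asList(0))));
        // Multi-word annotation on all words.
        System.out.println(usedWords(words, new HashSet<>(Arrays.asList(0, 1, 2, 3))));
        // Sentence annotation: empty index set, so no used words.
        System.out.println(usedWords(words, new HashSet<Integer>()));
    }
}
```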
Conditioning
Using the CITY_MATCH examples from above, the annotations can be used in a number of ways within an NLU condition via attached scripts.
Predicate Script
Control matching based on the data associated with the matching annotation using a predicate script attached to an annotation.
For example, to match only when the city has been matched with more than 90% confidence, we can use %$CITY_MATCH:{lob.distance > 90}.
Propagation Script - Extraction
Pass data along with the annotation and extract that data to use within the flow using a propagation script attached to an annotation.
For example, to extract the country for the matched city, we can use %$CITY_MATCH^{matchedCountry = lob.city.country}.
Positional
Match when the user mentions a city in a particular context.
For example, the user wants to go to a city: to >> %$CITY_MATCH.
Disambiguation Example
As an example, disambiguate when the destination city is not clear from the annotation:
- When match distance is less than 90
- Get all possibilities...
```tlml
to >> (%$CITY_MATCH:{lob.distance < 90} == match distance is less than 90 ==
  ^{matchWords = _.getUsedWords(_.ORIGINAL)} == extract the matched words - for user prompt ==
  &= %$CITY_ALL_MATCHES^{matchCandidates = lob.matches.match} == and extract all possible cities for the same used words ==
  ):L
```
Output:
```output
I am not sure which city you mean by '${matchWords.join(' ')}'. Did you mean one of these?

${matchCandidates}
```
The following transitions can then match on:
- A city match distance == 100
- Some indexer (1/2/3, first one, second one, last)
- Clickable UI could be added for user choice
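The "indexer" option above can be sketched as a small lookup from the user's reply to one of the offered candidates. This is a plain Java illustration under stated assumptions, not how the transitions are conditioned in a solution: CandidateChooser and choose are hypothetical names, the candidates are assumed to be a simple ordered list of city names, and only the replies listed above (1/2/3, first one, second one, last) are recognized.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: maps an ordinal reply to one of the offered candidates.
final class CandidateChooser {

    /** Returns the chosen candidate, or empty if the reply is not a usable ordinal. */
    static Optional<String> choose(String reply, List<String> candidates) {
        if (reply == null || candidates.isEmpty()) {
            return Optional.empty();
        }
        int index;
        switch (reply.trim().toLowerCase(Locale.ROOT)) {
            case "1":
            case "first one":
                index = 0;
                break;
            case "2":
            case "second one":
                index = 1;
                break;
            case "3":
                index = 2;
                break;
            case "last":
                index = candidates.size() - 1;
                break;
            default:
                return Optional.empty(); // not an ordinal reply
        }
        // Guard against replies that point past the end of the list.
        return index < candidates.size() ? Optional.of(candidates.get(index)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("New York", "York");
        System.out.println(choose("second one", candidates));
        System.out.println(choose("last", candidates));
        System.out.println(choose("3", candidates)); // only two candidates were offered
    }
}
```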