Annotating Words and Sentences

This page covers best practices for working with annotations. You can read more about annotations here and see a practical example here.

Engine API

The Engine Scripting API Javadoc documentation describes the method for creating input annotations:

java

public Annotation createInputAnnotation(String _sName,
                                        int _iSentenceIndex,
                                        Set<Integer> _zWordIndices,
                                        Map<String,Object> _mVariables)

This creates a user input Annotation object with the given data.

The created object is not added to the current annotations. To add it, pass the object to getInputAnnotations().add(...).

Parameter Name	Description
`_sName`	The name of the annotation
`_iSentenceIndex`	The index of the sentence this annotation is assigned to (the first sentence has index 0)
`_zWordIndices`	The indices of the sentence words this annotation is assigned to (the first word has index 0, the set may be empty)
`_mVariables`	An arbitrary collection of key/value pairs; if null is passed then an empty map is used
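The contract described above can be illustrated with a small plain-Java model. Everything in it is hypothetical: `InputAnnotation` and `AnnotationSketch` are stand-ins written for this page, not Engine classes. The defensive copies are a design choice of the sketch. It mirrors only two documented behaviors:

* A null variable map becomes an empty map.
* A newly created annotation is not part of the current annotations until it is added to the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an Engine annotation; NOT the real Annotation class.
record InputAnnotation(String name, int sentenceIndex,
                       Set<Integer> wordIndices, Map<String, Object> variables) {}

// Hypothetical model of the create-then-add contract described in the Javadoc.
class AnnotationSketch {
    // Stand-in for the list returned by getInputAnnotations().
    final List<InputAnnotation> inputAnnotations = new ArrayList<>();

    // Builds the annotation but, like the documented method, does NOT add it to inputAnnotations.
    InputAnnotation createInputAnnotation(String name, int sentenceIndex,
                                          Set<Integer> wordIndices,
                                          Map<String, Object> variables) {
        // Documented behavior: if null is passed, an empty map is used.
        Map<String, Object> vars = variables == null ? new LinkedHashMap<String, Object>() : variables;
        // Sketch-only design choice: copy the inputs so later changes by the caller have no effect.
        // An empty word index set means the annotation covers the sentence as a whole.
        return new InputAnnotation(name, sentenceIndex,
                Collections.unmodifiableSet(new TreeSet<>(wordIndices)),
                Collections.unmodifiableMap(new LinkedHashMap<>(vars)));
    }
}
```

For example, `s.inputAnnotations.add(s.createInputAnnotation("EXAMPLE.SENTENCE", 0, Set.of(), null))` creates a sentence annotation without variables and then registers it. This follows the same create-then-add pattern as the Groovy examples on this page.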

Applying Annotations

Annotations can be applied to any number of words (0 to n) within a single sentence.

Sentence Annotation

Applying an annotation to zero words annotates the sentence as a whole, but none of the words within it (see "Used Words" below).

Use an empty set ([] as HashSet) to annotate an entire sentence: _.createInputAnnotation("EXAMPLE.SENTENCE", 0, [] as HashSet, [:])

Example

[Screenshot: sentence annotation]

Code:

groovy

// Annotate each sentence as a whole: the empty word index set means the annotation has no used words
_.getSentences().eachWithIndex { sentence, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is a sentence annotation"]
    _.getInputAnnotations().add(
        _.createInputAnnotation("EXAMPLE.SENTENCE", sentenceIndex, [] as HashSet,
            ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex])
    )
}

Single Word Annotation

Use a set with a single word index entry ([0] as HashSet) to annotate a single word: _.createInputAnnotation("EXAMPLE.WORD", 0, [0] as HashSet, [:])

Example

[Screenshot: single word annotation]

Code:

groovy

// Give every word of every sentence its own single-word annotation
_.getSentences().eachWithIndex { sentence, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is a word annotation"]

    sentence.getWords().eachWithIndex { word, wordIndex ->
        _.getInputAnnotations().add(
            _.createInputAnnotation("EXAMPLE.WORD", sentenceIndex, [wordIndex] as HashSet,
                ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex, "wordIndex": wordIndex])
        )
    }
}

Multi Word Annotation

Use a set with multiple word index entries ([0, 1, 2, 7] as HashSet) to annotate multiple words: _.createInputAnnotation("EXAMPLE.ALL_WORDS", 0, [0, 1, 2, 7] as HashSet, [:])

Example

Here we annotate all words in each sentence:

[Screenshot: multi word annotation]

Code:

groovy

// Cover all words of each sentence with one multi-word annotation
_.getSentences().eachWithIndex { sentence, sentenceIndex ->
    def some_variable_value = ["annotationType": "This is an all-words annotation"]

    // Indices 0..n-1 of the words in this sentence
    def allWordIndices = sentence.getWords().withIndex().collect { word, wordIndex -> wordIndex }
    _.getInputAnnotations().add(
        _.createInputAnnotation("EXAMPLE.ALL_WORDS",
            sentenceIndex,
            allWordIndices as HashSet,
            ["some_variable_name": some_variable_value, "sentenceIndex": sentenceIndex, "allWordIndices": allWordIndices])
    )
}

Multiple Annotations per Word(s)

Looking at the previous example screenshot in more detail, you can also see how a single word, or a collection of words, can be assigned multiple annotations:

[Screenshot: multiple annotations per word]

For example, the word "annotations" here has been given four different annotations:

Annotation Name	Type	Source
`EN.LANG`	Sentence	System (Language Detector)
`NN.POS`	Single word	System (POS Tagger)
`PL.POS`	Single word	System (POS Tagger)
`EXAMPLE.ALL_WORDS`	Multi word	Listener in Solution
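The Type column follows from the size of the word index set passed to createInputAnnotation:

* Empty for a sentence annotation.
* One index for a single word.
* Several indices for multiple words.

The helper below spells out that rule. It uses hypothetical names, for illustration only. A multi-word set does not need to be contiguous, as the [0, 1, 2, 7] example earlier on this page shows.

```java
import java.util.Set;

// Hypothetical classification used only for illustration; not an Engine type.
enum AnnotationKind { SENTENCE, SINGLE_WORD, MULTI_WORD }

class AnnotationKinds {
    // Derives the annotation type from the word indices it was created with.
    static AnnotationKind kindOf(Set<Integer> wordIndices) {
        // An empty set annotates the whole sentence (no used words).
        if (wordIndices.isEmpty()) return AnnotationKind.SENTENCE;
        // One index covers a single word; more indices (contiguous or not) cover several words.
        return wordIndices.size() == 1 ? AnnotationKind.SINGLE_WORD : AnnotationKind.MULTI_WORD;
    }
}
```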

Annotating Multiple Possibilities

Where different annotations could apply to different, possibly overlapping, sets of words in the same input, each combination can be annotated separately.

Example: CITY_MATCH

For example, consider a solution containing a city matching annotator, and an input with several possible city matches. Both "new york" and "york" are possible city mentions, so both have been annotated as such:

[Screenshot: annotating multiple possibilities (city match)]

Each annotation carries a variable containing the full details of the match:

[Screenshot: city match example 2]

Similarly, where a word (or combination of words) has several possible matches, these matches can be stored in the annotation's variable value. They can then be used later in matching, for example to achieve automatic or user-driven disambiguation:

[Screenshot: city match 3]
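Overlapping annotations like "new york" and "york" can be produced by a matcher that tries every contiguous word span against a list of known city names. The sketch below is hypothetical and assumes the following:

* The gazetteer, `CityMatchSketch`, `SpanMatch`, and the exact lower-case comparison are all assumptions made for this sketch.
* A real city annotator would probably use fuzzy matching and record a distance score.

The sketch returns one result per matching span. Overlapping spans therefore each produce their own match, which could then become a separate annotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical result: the matched city text and the word indices it covers.
record SpanMatch(String city, Set<Integer> wordIndices) {}

class CityMatchSketch {
    // knownCities is assumed to contain lower-case names, e.g. "new york".
    static List<SpanMatch> findCityMatches(List<String> words, Set<String> knownCities, int maxSpan) {
        List<SpanMatch> matches = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            // Try every span of 1..maxSpan words beginning at 'start'.
            for (int len = 1; len <= maxSpan && start + len <= words.size(); len++) {
                String candidate = String.join(" ", words.subList(start, start + len)).toLowerCase();
                if (knownCities.contains(candidate)) {
                    Set<Integer> indices = new TreeSet<>();
                    for (int i = start; i < start + len; i++) indices.add(i);
                    // Overlaps are kept on purpose: "new york" and "york" both yield a match.
                    matches.add(new SpanMatch(candidate, indices));
                }
            }
        }
        return matches;
    }
}
```

With the words of "i want to go to new york" and the names {"new york", "york"}, this returns two matches. The first is "new york" on word indices 5 and 6. The second is "york" on index 6.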

Used Words

The "used words" of an annotation are the words that the annotation is connected to. They are controlled by the _zWordIndices parameter of the _.createInputAnnotation method. In the "annotations" example above (in "Multiple Annotations per Word(s)"), the used words are as follows:

Annotation Name	Type	Used Words
`EN.LANG`	Sentence	(none - sentence annotations do not have used words)
`NN.POS`	Single word	"annotations"
`PL.POS`	Single word	"annotations"
`EXAMPLE.ALL_WORDS`	Multi word	"annotations", "can", "be", ... (all words annotated)

These used words then define the matching behavior when using:

Positional operators, e.g.: >> (directly followed by)
Extended And operators, e.g.: &= (same match).

The used words also define the value returned by _.getUsedWords() (and similar) from within a predicate script attached to an annotation: %$EXAMPLE.ALL_WORDS^{myVar = _.getUsedWords()}.
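The sketch below models each annotation's used words as a set of word indices, to show how used words can drive these operators. This is a simplified, hypothetical model, not the Engine's implementation. It rests on these assumptions:

* `>>` is read as "the right operand starts at the word immediately after the last used word of the left operand".
* `&=` is read as "both operands use exactly the same words".
* `usedWords` imitates _.getUsedWords(), assuming that method returns the annotated words in sentence order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of used-word semantics; not the Engine's implementation.
class UsedWordsModel {
    // The words an annotation is connected to, in sentence order.
    static List<String> usedWords(List<String> sentenceWords, Set<Integer> wordIndices) {
        List<String> result = new ArrayList<>();
        for (int i : new TreeSet<>(wordIndices)) result.add(sentenceWords.get(i));
        return result;
    }

    // ">>" (directly followed by): right's first used word comes right after left's last one.
    static boolean directlyFollowedBy(Set<Integer> left, Set<Integer> right) {
        // In this model, operands without used words (e.g. sentence annotations) never satisfy ">>".
        if (left.isEmpty() || right.isEmpty()) return false;
        return Collections.max(left) + 1 == Collections.min(right);
    }

    // "&=" (same match): both operands were matched on exactly the same words.
    static boolean sameMatch(Set<Integer> left, Set<Integer> right) {
        return left.equals(right);
    }
}
```

Under this model, in "i want to go to new york" the word "to" (index 4) is directly followed by the "new york" annotation (indices 5 and 6). It is not directly followed by the "york" annotation (index 6), because "new" sits in between.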

Conditioning

Using the CITY_MATCH examples from above:

[Screenshot: conditioning city match]

The annotations can be used in a number of ways within an NLU condition via attached scripts.

Predicate Script

Use a predicate script attached to an annotation to control matching based on the data associated with the matching annotation.

For example, to match only when the city has been matched with more than 90% confidence, we can use %$CITY_MATCH:{lob.distance > 90}.
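A predicate script works as a boolean filter over the matched annotation's data. The annotation only counts as a match when the script evaluates to true. The Java sketch below imitates that filtering step. `CityMatchAnnotation` and its `distance` field are hypothetical stand-ins for the CITY_MATCH annotation and `lob.distance`.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a CITY_MATCH annotation and its lob.distance variable.
record CityMatchAnnotation(String city, int distance) {}

class PredicateSketch {
    // Keeps only the annotations for which the predicate holds,
    // similar in spirit to %$CITY_MATCH:{lob.distance > 90}.
    static List<CityMatchAnnotation> matching(List<CityMatchAnnotation> annotations,
                                              Predicate<CityMatchAnnotation> predicate) {
        return annotations.stream().filter(predicate).collect(Collectors.toList());
    }
}
```

Calling `matching(annotations, a -> a.distance() > 90)` drops every candidate at or below the threshold.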

Propagation Script - Extraction

Use a propagation script attached to an annotation to extract the data passed along with that annotation, so that it can be used within the flow.

For example, to extract the country for the matched city, we can use %$CITY_MATCH^{matchedCountry = lob.city.country}.

This assumes a flow or global variable called matchedCountry has been defined.
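A propagation script runs once the condition has matched and copies data out of the annotation, here into matchedCountry. The sketch below is a plain-Java model, not Engine code:

* Flow variables are modeled as a plain map.
* `City`, `CityMatch`, and `propagate` are hypothetical stand-ins for the `lob.city.country` structure.
* Treating an undefined variable as an error is an assumption of the sketch. It reflects the requirement that matchedCountry is defined beforehand.

```java
import java.util.Map;

// Hypothetical stand-ins for the data carried in a CITY_MATCH annotation variable.
record City(String name, String country) {}
record CityMatch(City city, int distance) {}

class PropagationSketch {
    // Models %$CITY_MATCH^{matchedCountry = lob.city.country}.
    static void propagate(CityMatch lob, Map<String, Object> flowVariables) {
        // Sketch assumption: assigning to an undefined variable is treated as an error.
        if (!flowVariables.containsKey("matchedCountry")) {
            throw new IllegalStateException("matchedCountry has not been defined");
        }
        flowVariables.put("matchedCountry", lob.city().country());
    }
}
```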

Positional

Match when the user mentions a city in a particular context.

For example, to match when the user wants to go to a city: to >> %$CITY_MATCH.

Disambiguation Example

As an example, the following condition disambiguates when the destination city is not clear from the annotation:

When the match distance is less than 90, get all possibilities:

tlml

to >> (%$CITY_MATCH:{lob.distance < 90}                          == match distance is less than 90 ==
        ^{matchWords = _.getUsedWords(_.ORIGINAL)}               == extract the matched words for the user prompt ==
    &= %$CITY_ALL_MATCHES^{matchCandidates = lob.matches.match}  == and extract all possible cities for the same used words ==
    ):L

This assumes flow or global variables called matchCandidates and matchWords have been defined.
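The condition above can be read in two steps:

1. Keep a CITY_MATCH annotation whose distance is below 90.
2. Find a CITY_ALL_MATCHES annotation with exactly the same used words (the &= operator) and take its candidates.

The sketch below models that pairing in plain Java. It is hypothetical and simplified:

* The `SimpleAnnotation` record, the "distance" and "matches" variable keys, and the list of candidate strings are assumptions. The real `lob.matches.match` holds richer data.
* The `to >>` positional check is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in: annotation name, used word indices, and its variables.
record SimpleAnnotation(String name, Set<Integer> wordIndices, Map<String, Object> vars) {}

class DisambiguationSketch {
    // Returns the candidates to offer the user when a city match is uncertain.
    // Assumes CITY_MATCH carries an Integer "distance" and CITY_ALL_MATCHES a List<String> "matches".
    @SuppressWarnings("unchecked")
    static Optional<List<String>> candidatesFor(List<SimpleAnnotation> annotations) {
        for (SimpleAnnotation match : annotations) {
            // Models %$CITY_MATCH:{lob.distance < 90}
            if (!match.name().equals("CITY_MATCH")) continue;
            if ((Integer) match.vars().get("distance") >= 90) continue;
            for (SimpleAnnotation all : annotations) {
                // Models &= %$CITY_ALL_MATCHES: same used words as the uncertain match
                if (all.name().equals("CITY_ALL_MATCHES")
                        && all.wordIndices().equals(match.wordIndices())) {
                    return Optional.of((List<String>) all.vars().get("matches"));
                }
            }
        }
        return Optional.empty();
    }
}
```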

Output:

output

I am not sure which city you mean by '${matchWords.join(' ')}'. Did you mean one of these?

${matchCandidates}

[Screenshot: disambiguation example output]

The following transitions can then match on:

A city match with distance == 100
An index or ordinal (1/2/3, first one, second one, last)
A choice made through a clickable UI, if one is added for the user
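The index transition could resolve the user's reply against matchCandidates along the lines below. This is a hypothetical sketch: `IndexerSketch`, `resolveChoice`, and the accepted phrasings are assumptions for illustration. A real solution would more likely express these phrasings as NLU conditions on the transitions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: maps replies like "2", "second one" or "last" to a candidate.
class IndexerSketch {
    private static final List<String> ORDINALS =
            List.of("first", "second", "third", "fourth", "fifth");

    static Optional<String> resolveChoice(String reply, List<String> candidates) {
        String r = reply.trim().toLowerCase(Locale.ROOT);
        int index = -1;
        if (r.startsWith("last")) {
            index = candidates.size() - 1;
        } else if (r.matches("\\d{1,3}")) {
            index = Integer.parseInt(r) - 1;   // "1" refers to the first candidate
        } else {
            for (int i = 0; i < ORDINALS.size(); i++) {
                // "second one" starts with "second", so it maps to index 1
                if (r.startsWith(ORDINALS.get(i))) { index = i; break; }
            }
        }
        // Unrecognized or out-of-range replies resolve to nothing.
        return index >= 0 && index < candidates.size()
                ? Optional.of(candidates.get(index)) : Optional.empty();
    }
}
```

An empty result can route the dialogue back to the disambiguation prompt instead of guessing.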