Annotating Inputs

Annotations add an extra data layer to the user input: additional information that can be attached to a word, a sentence or the entire input text. This layer enriches the user input with further details, which can then be used to enhance the standard engine matching process.

Data layer

An annotation represents an additional data layer associated with words or sentences in the user input, e.g., data generated by the Named Entity Recognizer or a Part-of-Speech tagger. An annotation has a name, is linked to a sentence through that sentence's index in the list of sentences, and holds a set of indices of the word or words in that sentence that it labels. In other words, an annotation is assigned to one sentence of the user input and then to one or more words in that sentence; a single word or sentence may carry one or several annotations.

Optionally, annotations also store a map of variables (key/value pairs) that enrich the annotated objects with further details. These variables can pass data about the match on to scripts, supporting better natural language understanding and enhancing the standard engine matching process.

One can think of annotations as dynamic language objects that are generated at runtime. Normal, static Language Objects always exist and match only when their TLML syntax is fulfilled; annotations, in contrast, act as their own syntax and match only when they exist.

Types of annotations

The Teneo Engine provides basic annotations by default, such as those marking the start of a dialogue or the restart of a timed-out session. The Teneo Input Processors generate annotations if, for example, the user input is empty or contains certain characters (such as an exclamation point or a question mark) or combinations of characters. Depending on the active language configuration, the Input Processors may also produce annotations for numbers, Part-of-Speech (POS) and morphological information, etc.

In addition, custom annotations can be created as required for a particular project; they can be generated from within a custom Input Processor, a Pre-matching script or a Global Pre-listener script in the Teneo solution.

Why annotations?

Annotations are generated by Input Processors and/or solution scripting, and their power comes precisely from how they are created: the data they provide goes beyond what standard rule-based TLML matching against words, Entities and Language Objects can achieve.

The annotations can be generated based on context, sentence structure, user specific configurations or more complex machine learning models, etc., and can then be used in the matching process. As a result, annotations contain brand-new data which makes it possible to create syntaxes which more precisely identify sentence elements that are necessary for interpreting the input correctly.

Annotations as Java classes

The annotations are represented as Java classes that store information about the word or words they label, with the following fields:

Name
Index of the sentence to which the annotation is added; the first sentence in the input has index 0
Index (or indices) of the word or words in the sentence to which the annotation is added; the first word in each sentence has index 0; an annotation can be applied to multiple words
Annotation variables as key-value pairs; the key is a string while the value can be a string, a number or an object (the variables are optional and depend on the information the annotation needs to store)

The annotation key is made from the first three fields; see Annotation methods further below for more details.
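To make the relationship between the four fields and the key concrete, here is a minimal Java sketch (Java rather than Groovy so that it runs on its own). AnnotationSketch and its key() method are hypothetical illustrations, not the Teneo Engine's Annotation class; the sketch assumes the key combines only the name, the sentence index and the word indices, so the variables never affect it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an annotation; NOT the Teneo Engine's Annotation class.
final class AnnotationSketch {
    final String name;                    // e.g. "NUMBER"
    final int sentenceIndex;              // the first sentence has index 0
    final Set<Integer> wordIndices;       // the first word in each sentence has index 0
    final Map<String, Object> variables;  // optional key/value pairs

    AnnotationSketch(String name, int sentenceIndex, Set<Integer> wordIndices,
                     Map<String, Object> variables) {
        this.name = name;
        this.sentenceIndex = sentenceIndex;
        // Sorted copy, so the key does not depend on the order the indices were given in
        this.wordIndices = new TreeSet<>(wordIndices);
        this.variables = variables == null ? new HashMap<>() : new HashMap<>(variables);
    }

    // The key is built from the first three fields only; the variables are excluded
    String key() {
        return name + "|" + sentenceIndex + "|" + wordIndices;
    }
}
```

Under this model, two annotations that differ only in their variables share the same key, which is why adding one can replace the other.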

Collections of annotations

The Teneo Engine, by default and independent of solution language, provides basic annotations related to whole dialogues (and not individual inputs) while the System Annotation Input Processor (IP) provides annotations related to, for example, empty inputs, if question marks or quotes were present in the user input or if nonsense text or binary characters were detected.

In addition to the above, and in this case dependent on the solution language, more annotations are generated by other Teneo Input Processors, for example, annotations related to intent classification, basic number recognition, language recognition or even annotations related to Part-of-Speech tagging and morphological analysis.

Finally, custom annotations can be created for a particular project by generating them directly in the solution, in a Pre-matching script or a Global Pre-listener.

The following sub-sections introduce the various collections of annotations available with Teneo as well as custom annotations.

System and Standard annotations

The Teneo Platform bundles two default collections of annotations in all language configurations: System annotations added by the Teneo Engine and Standard annotations added by the System Annotation Input Processor.

System annotations

Two special annotations are set by the Teneo Engine itself; they relate not to individual inputs but to whole dialogues, and depend on the session state.

Annotation	Description
_INIT	Indicates that a session has started, i.e., the first input in any new dialogue
_TIMEOUT	Indicates that a previously timed out session/dialogue has restarted

Standard annotations

The Standard annotations are set by the System Annotation Input Processor and, regardless of the language configuration, the following annotations are set:

Annotation	Description
_QUESTION	A question mark (?) appears in the input
_EXCLAMATION	An exclamation point (!) appears in the input
_DBLQUOTE	A quotation mark (") appears in the input
_QUOTE	Single quotation marks (‘) appear in the input
_BRACKETPAIR	A pair of brackets ( ), [ ] or { } appears in the input
_NONSENSE	The input contains nonsense text (such as ‘asdf’, ‘wgwwgwg’, ‘xxxxxx’)
_EMPTY	The input contains no text
_BINARY	The input consists of only 0s and 1s
_QT3	Triple question marks (???) appear in the input
_EM3	Triple exclamation marks (!!!) appear in the input

Annotations from other Input Processors

In addition to the annotations mentioned above, and depending on the solution language configuration, further Teneo Input Processors (IPs) are available that generate additional annotations.

Number annotations

Standard, Korean and Turkish Input Processor chains

The Standard Input Processor chain, as well as the chains for Korean and Turkish, includes the Basic Number Recognizer Input Processor. This Input Processor identifies all Arabic numbers of the type 123 and 3.14 in the user input, annotates each of them and attaches a variable to the annotation that holds the number found. Although the Input Processor is language independent, each language has its own configuration file that defines the decimal point character and the thousands separator character (which is ignored).

Annotation	Variable	Description
NUMBER	numericValue	Arabic numbers of the type 123 and 3.14, the associated variable numericValue stores the detected number
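The recognition step can be illustrated with a small Java sketch. It is not the Basic Number Recognizer's actual implementation: the class NumberRecognizerSketch, its whitespace tokenization and its regular expression are assumptions made for illustration. It shows how a per-language decimal point character and thousands separator could be applied when turning tokens into the value stored in numericValue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of configurable number recognition; not the Teneo Input Processor.
final class NumberRecognizerSketch {
    private final char decimalPoint;
    private final char thousandsSeparator;
    private final Pattern numberPattern;

    NumberRecognizerSketch(char decimalPoint, char thousandsSeparator) {
        this.decimalPoint = decimalPoint;
        this.thousandsSeparator = thousandsSeparator;
        String d = Pattern.quote(String.valueOf(decimalPoint));
        String t = Pattern.quote(String.valueOf(thousandsSeparator));
        // Digits, optionally grouped in threes by the thousands separator,
        // optionally followed by the decimal point and more digits
        this.numberPattern = Pattern.compile("\\d+(?:" + t + "\\d{3})*(?:" + d + "\\d+)?");
    }

    // Returns the numeric value of every whitespace-separated token that is an Arabic number
    List<Double> recognize(String input) {
        List<Double> values = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            Matcher m = numberPattern.matcher(token);
            if (m.matches()) {
                // Drop the thousands separator first, then normalize the decimal point
                String normalized = token
                        .replace(String.valueOf(thousandsSeparator), "")
                        .replace(decimalPoint, '.');
                values.add(Double.parseDouble(normalized));
            }
        }
        return values;
    }
}
```

For example, new NumberRecognizerSketch('.', ',') reads 1,250 as 1250.0, whereas new NumberRecognizerSketch(',', '.') reads 1.250,5 as 1250.5. Tokens with attached punctuation (such as 3.14?) are deliberately not handled by this simplified sketch.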

Chinese Input Processor chain

The Chinese Numbers Input Processor is more advanced when generating number annotations, annotating all numbers as well as numerical expressions. It first normalizes tokens containing numeric values into Hindu-Arabic numerals, then creates an annotation with a variable containing the normalized number and generates a second annotation named after the normalized number value (e.g., 3.14 is annotated as %$3.14 whereas 3,14 is annotated as %$314). The Chinese Numbers IP also annotates inexact numbers, i.e., numbers containing the characters 几, 数, 余 or 多.

Annotation	Variable	Description
NUMBER	numericValue	All numbers as well as numerical expressions; the associated variable stores the normalized number
INEXACT		Inexact numbers are annotated with INEXACT; i.e., numbers containing characters 几 or 数 or 余 or 多

To read more, please see the Chinese Numbers Input Processor.

Japanese Input Processor chain

The Japanese Number Recognizer Input Processor is capable of recognizing various types of number expressions, which are then annotated with a number annotation and associated with a variable which holds the numeric value of the found number.

Annotation	Variable	Description
NUMBER	numericValue	Covers various types of number expressions, the associated variable stores the normalized number

To read more about the Input Processor, visit the section Japanese Input Processors chain.

Predict annotations

The Predict Input Processor annotates user inputs with the classes defined in a Teneo Studio solution, using a machine learning model that is generated when classes are available in the solution. Models can be generated either with Teneo Learn or CLU; note that as of Teneo 7.3, deferred intent classification is applied.
Whenever the Predict Input Processor receives an input, the Input Processor calculates a confidence score for each of the classes based on the model, creating annotations for the most confident class and for each other class that matches the following criteria:

the confidence is above the minimum confidence (defaults to 0.01)
the confidence is higher than 0.5 times the confidence value of the top class.

Teneo Predict will create a maximum of 5 annotations, regardless of how many classes match the criteria.

Annotation	Variable	Variable	Variable	Description
<CLASS_NAME>.TOP_INTENT	classifier	confidence		Annotation created for the class with the highest confidence score
<CLASS_NAME>.INTENT	classifier	confidence	Order	Annotation given to each selected class with a maximum of five top classes
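The selection criteria can be summarized in a short Java sketch. This is an illustration under stated assumptions, not Teneo Predict's code: PredictSelectionSketch and select are hypothetical names, the input is assumed to be a map from class name to confidence, and the cap of five is assumed to include the top class. The top class is always kept, and every other class must exceed both the minimum confidence (0.01 by default) and 0.5 times the top confidence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the documented selection rules; not Teneo Predict's implementation.
final class PredictSelectionSketch {
    static final double MIN_CONFIDENCE = 0.01;  // documented default minimum confidence
    static final double RELATIVE_FACTOR = 0.5;  // must be higher than 0.5 x top confidence
    static final int MAX_ANNOTATIONS = 5;       // documented upper bound

    // Returns the selected class names, most confident first
    static List<String> select(Map<String, Double> scores) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> selected = new ArrayList<>();
        if (ranked.isEmpty()) {
            return selected;
        }
        double top = ranked.get(0).getValue();
        selected.add(ranked.get(0).getKey()); // the most confident class is always annotated
        for (int i = 1; i < ranked.size() && selected.size() < MAX_ANNOTATIONS; i++) {
            double confidence = ranked.get(i).getValue();
            if (confidence > MIN_CONFIDENCE && confidence > RELATIVE_FACTOR * top) {
                selected.add(ranked.get(i).getKey());
            }
        }
        return selected;
    }
}
```

With confidences BILLING 0.60, REFUND 0.35 and ORDER 0.04, for instance, the relative threshold is 0.30, so only BILLING and REFUND are selected.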

Read more about Intent Classification in Teneo or visit the Teneo Predict section to read more about the Input Processor.

Part-of-Speech and Morphological annotations

Depending on the language configuration, the Teneo Input Processors may also set annotations carrying Part-of-Speech and morphological information.

The POS tagger / Morphological Analyzer creates annotations for each word in the user input with names such as NN.POS, VB.POS, PAST.POS, PRESENT.POS, etc., which help to distinguish whether, for example, a word is a noun or a verb. The Analyzer also provides annotations indicating whether a noun is singular or plural, or whether a verb is in the present or past tense, in the 3rd person, in the imperative, etc.

The sets of annotations related to Part-of-Speech and morphology are language specific; for more information and availability in a specific language, please refer to the POS Tagger and Morphological Analyzer.

Language annotations

The Teneo Input Processor chains include a Language Detector Input Processor which uses a machine learned model to generate an annotation for the predicted language of the user input alongside a confidence score of the prediction. The value of the confidence score reflects the probability of a tag being correct and ranges from 0 (lowest probability) to 1 (highest probability).

Annotation	Variable	Description
<language label>.LANG	confidence	The Language Detector generates an annotation for the predicted language of the user input, and the associated variable contains the confidence score (reflecting the probability of the tag being correct)

Read more about the Language Detector in the Standard Input Processor chain, or visit the NLP Capabilities section and select the relevant Input Processor chain from the menu.

Named Entity annotations

For several languages, the Teneo Input Processors also create annotations for entities detected in user inputs, such as locations, organizations, products, etc. Annotations are also created for entities that might carry Personally Identifiable Information (PII), such as names, addresses, unique identifiers, e-mail addresses, etc.

Read more about the Named Entity Recognizer.

Custom annotations

Users can create their own custom annotations, which makes it possible to enrich user inputs with an extra layer of information tailored to a specific project: for example, labeling postal codes or phone numbers based on given patterns, using named-entity recognizers to find products, places or names, or tagging user inputs based on a machine learning classifier, all depending on the needs of the given project.
The custom annotations can be created in the following places:

Pre-matching scripts
Global Pre(-matching) Listeners

Both Pre-matching scripts and Global Pre Listeners allow custom annotations to be created directly in the solution in Teneo Studio by calling the annotation-related Teneo Engine API methods, which make the annotator capabilities easy and comprehensive to use. An introduction to these methods is given in the Annotation methods section below.

The above image displays an example of a custom annotation created in the Global Pre-matching script of a solution; the script generates the annotation START and attaches it to the first word of the input sentence (if the input is not empty).

As mentioned above, it is also possible to script annotations from Global Pre(-matching) Listeners. Below is an example implementation of the FAVORITE_COLOR annotation, which is attached to the mentioned word if, first, it matches the TLML syntax of the Language Object COLORS.LIST; second, it appears as the first word in the input; and third, the Language Object SENTIMENT_POSITIVE.INDICATOR is also matched (indicating that the user input talks positively about the mentioned color).

Once created, this annotation can be used for further syntax matching elsewhere in the solution, for example in a Flow as visualized below, where the TLML Syntax Match of the Flow trigger uses the annotation for the trigger matching.

The annotation can, of course, also be tested in Tryout, where the user can see the answer of the Flow (to the left) and the created annotation (to the right) in the Annotations view under the Input section of the Tryout window (read more about annotations in Tryout further below).

As Global Pre Listeners are executed sequentially in a defined order, an action performed by one Listener (such as removing an annotation) may affect the syntax matching of the next one!
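The effect of the execution order can be illustrated with a Java sketch in which hypothetical listeners run one after another over a shared set of annotation names. ListenerOrderSketch, its two listeners and the FLAGGED annotation are invented for illustration; real Global Pre Listeners are defined in Teneo Studio and match TLML syntax rather than calling Java code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of sequential listeners sharing one set of annotation names.
final class ListenerOrderSketch {
    // Runs each listener in list order against the same mutable annotation set
    static Set<String> run(Set<String> initial, List<Consumer<Set<String>>> listeners) {
        Set<String> annotations = new HashSet<>(initial);
        for (Consumer<Set<String>> listener : listeners) {
            listener.accept(annotations);
        }
        return annotations;
    }

    // Removes the _NONSENSE annotation
    static final Consumer<Set<String>> REMOVE_NONSENSE = a -> a.remove("_NONSENSE");

    // Adds the hypothetical FLAGGED annotation only if _NONSENSE is still present
    static final Consumer<Set<String>> FLAG_NONSENSE = a -> {
        if (a.contains("_NONSENSE")) {
            a.add("FLAGGED");
        }
    };
}
```

Running FLAG_NONSENSE before REMOVE_NONSENSE leaves FLAGGED on the input, while the reverse order leaves no annotations at all.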

Annotation methods

This section introduces the method for creating a new annotation and gives a brief overview of how to update or remove annotations; for more details, please see the Teneo Engine Scripting API:

engineAccess: provides access to the state and functionalities of the Teneo Engine, including input annotations, for example, createInputAnnotation
Annotation class methods: the methods available for the class Annotation
AnnotationsI interface methods: a collection containing Annotation objects, accessible through .getInputAnnotations(), e.g., _.getInputAnnotations().add(annotation)

Create input annotation

The following method creates a new annotation instance for the given data:

```groovy
Annotation createInputAnnotation(String _sName,
                                 int _iSentenceIndex,
                                 Set<Integer> _zWordIndices,
                                 Map<String,Object> _mVariables)
```

The annotation parameters are:

_sName: the name of the annotation which must follow the same naming conventions as Language Objects (i.e., names must be uppercased, no whitespace or other reserved characters are allowed)
_iSentenceIndex: the index in the user input's List<SentenceI> to which this annotation belongs (the first sentence has index 0)
_zWordIndices: the indices in the SentenceI's List<WordData> to which this annotation belongs (the first word has index 0)
_mVariables: an arbitrary collection of key/value pairs; pass null if no variables are required.

The _mVariables parameter may be null when no variables are needed; for example, the annotation can be created with _.createInputAnnotation(_sName, _iSentenceIndex, _zWordIndices, null).

The method may throw the following exceptions:

NullPointerException: if the name is null or the word indices set is null
IllegalArgumentException: if the name is empty, the sentence index is negative or not less than the number of sentences, the word indices set contains a negative index or an index not less than the word count of the selected sentence, or the variables map contains a null key.
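The documented preconditions can be mirrored in a hypothetical validator, written here in Java as a self-contained sketch. It is not the engine's implementation: AnnotationArgumentsSketch, validate and the modeling of the user input as a list of word counts per sentence are assumptions. An empty word-index set is accepted, consistent with the [] as Set used in the examples that follow.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical re-statement of the documented checks; not the Teneo Engine's code.
final class AnnotationArgumentsSketch {
    // wordCounts.get(i) is the number of words in sentence i (assumed input model)
    static void validate(String name, int sentenceIndex, Set<Integer> wordIndices,
                         Map<String, Object> variables, List<Integer> wordCounts) {
        // NullPointerException cases
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(wordIndices, "wordIndices");
        // IllegalArgumentException cases
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name is empty");
        }
        if (sentenceIndex < 0 || sentenceIndex >= wordCounts.size()) {
            throw new IllegalArgumentException("sentence index out of range: " + sentenceIndex);
        }
        int wordCount = wordCounts.get(sentenceIndex);
        for (int index : wordIndices) {
            if (index < 0 || index >= wordCount) {
                throw new IllegalArgumentException("word index out of range: " + index);
            }
        }
        if (variables != null) {
            for (String key : variables.keySet()) {
                if (key == null) {
                    throw new IllegalArgumentException("variables map contains a null key");
                }
            }
        }
    }
}
```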

Manage annotations

Annotations can be added, updated or removed in Pre-matching scripts and by scripting Global Pre Listeners.

For example, given the object testAnnotation, created as:

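```groovy
// Create an annotation named TEST on sentence 0, attached to no specific words, with no variables
def testAnnotation = _.createInputAnnotation("TEST", 0, [] as Set, null)
```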

testAnnotation can be added:

```groovy
// Add the annotation to the input's annotation collection
_.getInputAnnotations().add(testAnnotation)
```

Or updated the same way (note that add overwrites an annotation that already exists with the same key, i.e., the same name, sentence index and word indices):

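```groovy
// Overwrites an existing TEST annotation on sentence 0 (no word indices) with one carrying the variable new = "yes"
_.getInputAnnotations().add(_.createInputAnnotation("TEST", 0, [] as Set, ["new": "yes"]))
```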

Or removed:

```groovy
_.getInputAnnotations().remove(testAnnotation)
```

Other methods allow, for example, all annotations to be deleted:

```groovy
// Remove every annotation from the current input
_.getInputAnnotations().clear()
```
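The add, remove and clear semantics can be modeled in a few lines of Java. AnnotationStoreSketch is a hypothetical model, not the AnnotationsI implementation: it stores each annotation's variables under a key made of the name, the sentence index and the word indices, so adding an annotation with an existing key replaces the earlier entry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of an annotation collection keyed by (name, sentence index, word indices).
final class AnnotationStoreSketch {
    private final Map<List<Object>, Map<String, Object>> entries = new LinkedHashMap<>();

    private static List<Object> key(String name, int sentenceIndex, Set<Integer> wordIndices) {
        return List.of(name, sentenceIndex, new TreeSet<>(wordIndices));
    }

    // Adding an annotation whose key already exists replaces the earlier entry
    void add(String name, int sentenceIndex, Set<Integer> wordIndices, Map<String, Object> variables) {
        entries.put(key(name, sentenceIndex, wordIndices),
                variables == null ? new HashMap<>() : new HashMap<>(variables));
    }

    void remove(String name, int sentenceIndex, Set<Integer> wordIndices) {
        entries.remove(key(name, sentenceIndex, wordIndices));
    }

    void clear() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }

    // Returns the stored variables, or null if no such annotation exists
    Map<String, Object> variablesOf(String name, int sentenceIndex, Set<Integer> wordIndices) {
        return entries.get(key(name, sentenceIndex, wordIndices));
    }
}
```

In this model, adding TEST on sentence 0 twice leaves a single entry whose variables come from the second call; only a different name, sentence index or word-index set creates a new entry.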

Annotations in TLML syntax

Annotations can be used anywhere in a solution where a Language Object or Entity can be used in Teneo Linguistic Modeling Language (TLML) syntax; annotation names are prefixed with a $ (dollar) sign in addition to the % (percent) sign, i.e., %$. For an annotation to be fulfilled, an annotation with the name given in the syntax must exist on the sentence itself or on one or more of its words.

As an example, the syntax in the below image matches if the input sentence contains any word annotated as a noun, directly followed by any word annotated as a verb, directly followed by any word annotated as a pronoun.

However, when working with annotations, it often makes sense to use them together with the Extended And operators and their negated equivalents, as all of these operators are based on used words. With the help of these operators, it is possible to write TLML syntax that combines traditional Language Objects and Entities with attributes from the annotation layer on the same used word.

Extended And operators	Negated And operators
&= Same Match operator	!&= Not Same Match operator
&> Bigger Match operator	!&> Not Bigger Match operator
&< Smaller Match operator	!&< Not Smaller Match operator
&~ Overlap Match operator	!&~ Not Overlap Match operator
&^ Different Match operator	!&^ Not Different Match operator

Read more about these operators in the Teneo Linguistic Modeling Language Manual.

Annotation variables

Annotation variables are accessible within a syntax in the same way as either NLU variables or Language Object variables.

Since annotation variables can be of any type, their values need to be converted accordingly when saved to variables of other types, as shown in the image below.

Annotation variables can also be accessed via scripting; the example script below removes every annotation whose name ends with .POS and whose variable confidence has a value of 0.5 or lower.
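The filtering condition, including the conversion of a variable value stored as Object, can be sketched in Java. PosAnnotationSketch and PosFilterSketch are hypothetical stand-ins rather than Teneo Engine classes; in a solution script, a comparable condition would be applied to the collection returned by _.getInputAnnotations(). Keeping annotations that have no numeric confidence variable is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: an annotation name plus its variables
final class PosAnnotationSketch {
    final String name;
    final Map<String, Object> variables;

    PosAnnotationSketch(String name, Map<String, Object> variables) {
        this.name = name;
        this.variables = variables;
    }
}

final class PosFilterSketch {
    // Drops annotations whose name ends with ".POS" and whose confidence is 0.5 or lower
    static List<PosAnnotationSketch> removeLowConfidencePos(List<PosAnnotationSketch> annotations) {
        List<PosAnnotationSketch> kept = new ArrayList<>(annotations);
        kept.removeIf(a -> {
            if (!a.name.endsWith(".POS")) {
                return false;
            }
            // Variable values are stored as Object, so convert before comparing
            Object value = a.variables.get("confidence");
            return value instanceof Number && ((Number) value).doubleValue() <= 0.5;
        });
        return kept;
    }
}
```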

Tryout

In Teneo Studio Desktop, annotations can be tracked in the Input section of the advanced Tryout window, where the Annotations view summarizes the annotations managed during input processing; hovering over an annotation provides more information about it, such as a more detailed description or its variables and values. The view also highlights whether an annotation was updated or deleted.

The Input Processor Results view (in the Input section of the Tryout window) displays more information regarding the annotations created by Input Processors, including whether they were added, deleted or modified, their values, etc.

The information in the Tryout related to annotations is also included in the text / CSV exports available by right-clicking and selecting Open As Text / CSV or Copy as Text / CSV.