Teneo Developers

NLU enhancement

Introduction

The NLU Enhancement Package is a set of features that improve and simplify the Natural Language Understanding (NLU) part of the Teneo Platform. The package addresses the need to extract information from user inputs within the Platform and within TLML syntax matching, and it also makes it possible to use more complex objects as variable values.

The package is a set of features designed to:

  • make it easier to extract information from inputs,
  • provide users with more control over what parts of the inputs are extracted and in which way,
  • reduce the need for listeners and scripting (by supporting these needs inline in syntaxes), which means:
    • more performant solutions at run-time, and
    • smaller solutions.

The NLU Enhancement Package covers two concepts:

  1. NLU Variables: WHAT is propagated, and
  2. Propagation scripts: HOW it is propagated.

NLU Variables

NLU Variables are a type of variable defined at the Language Object and Entity level. They serve the same purpose as Language Object variables, but are more powerful and flexible: they give users more fine-grained control over data extraction from within the linguistic rules, allowing users to:

  • reliably recognize multiple objects of the same type within the same input, and
  • ensure the correct data is extracted from sentences.

The default value of an NLU variable is a script (as opposed to a Language Object variable, where the default value is a string).

Each Language Object (and Entity) can have a list of NLU Variables in addition to its Language Object variables: the list holds the NLU Variables defined for the Language Object together with a default value for each variable. If the default value is empty, the script will evaluate to null (just like flow or session variables).

In the Teneo Studio interface, variables (both NLU Variables and Language Object variables) are available in the frontstage of Language Objects, making them visible next to the Syntax editor.

Language Object Variables and NLU Variable

In Entities, the NLU Variables are added to the entries table in the column(s) following the Entry column. Each entry in an Entity can have zero, one or several NLU Variables attached, and the user can easily toggle between script and string values. If an entry of the Entity contains an Entity or Language Object reference, the NLU Variables of the referenced object are automatically set to propagate the variable value when a matching variable is present.

Entity Variables

Propagation scripts

NLU Variables can be accessed from Propagation scripts, which are written inline in TLML syntaxes. Propagation scripts address the need to extract information from user inputs within the Platform and within the syntax matching. They make it possible to have complex objects as values and enable modularization and visibility, reducing the need for listeners and script nodes. NLU Variables can be defined where they are needed, and the changes that take effect when a specific syntax is matched become more visible, since the matching logic and the variable assignment happen in the same place.

Propagation scripts allow users to attach an action to a specific section of a syntax, so that the action is executed only when that specific part of the TLML syntax is involved in the final match. The syntax essentially holds a piece of Groovy script.

Attach a Propagation script

The Propagation script is attached via a caret symbol (^) to the right of a Language Object or a bracketed expression, followed by { } curly brackets containing the Groovy script.

Example:

tlml

(%MY_TEST.REC / %TESTING.PHR)^{globalVarTest="Test"}

Propagation scripts are executed only after the TLML syntax they are attached to has been evaluated to true by the Teneo Engine, and only the part(s) of the syntax that are actually matched are used for initialization of NLU Variables and execution of propagation scripts.
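As a rough illustration of this rule, the sketch below attaches a separate propagation script to each branch of an alternation in a top level syntax. The Language Objects %YES.PHR and %NO.PHR and the variable sAnswer are hypothetical names chosen for this example; they are not defined anywhere in this document. The /* ... */ remarks are ordinary Groovy comments placed inside the script bodies and can be removed freely. Assuming the behavior described above, only the script attached to the branch that actually takes part in the match would run:

```tlml
(%YES.PHR^{ /* hypothetical variable; set only if this branch matched */ sAnswer = "yes" } / %NO.PHR^{ /* set only if this branch matched */ sAnswer = "no" })
```

For an input that matches only %NO.PHR, sAnswer would be expected to end up as "no", and the script attached to %YES.PHR would not be executed.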

Top level syntax and Language Object syntax

Propagation scripts are normal scripts, but the script context is dynamic and depends on where the script is defined, in top level syntax or in Language Object syntax.

In top level syntax (meaning triggers, transitions and flow listeners) the propagation scripts have access to the global variables and the flow variables (except in global listeners) and can, for example, propagate up the used words or the NLU Variable value of Language Objects used in the TLML syntax and save it to a variable. This means that only top level syntaxes can assign values to flow and/or global variables.

In Language Object syntax the propagation scripts have access to NLU Variables declared in that precise Language Object, but they have no access to global and flow variables. Remember that values need to be propagated to the top level syntax if they are to be used in, for example, flows.
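The division of labor between the two contexts can be sketched with a pair of syntaxes. All variable names here are hypothetical: %DESTINATION.PHR is an assumed Language Object with an NLU Variable named city, sCity is an assumed NLU Variable of %CITIES.LIST, and sDestination is an assumed flow variable. The first line would be the syntax of %DESTINATION.PHR itself, and the second line a top level syntax, for example in a trigger:

```tlml
%TO.FW.LEX >> %CITIES.LIST^{ /* Language Object syntax: sets this object's own NLU Variable */ city = lob.sCity }

%DESTINATION.PHR^{ /* top level syntax: copies the value into a flow variable */ sDestination = lob.city }
```

Under these assumptions the value travels in two steps: from %CITIES.LIST into the NLU Variable city, and from there into the flow variable, which only the top level syntax is allowed to assign.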

In contrast to Language Object variables, where the user needs an input consuming transition to be able to assign a variable, propagation scripts can also be used in non-input consuming transitions to assign variables.

Reserved word: lob

In TLML syntax (both top level and Language Object syntaxes), a propagation script can be attached to a Language Object, an Entity, or a bracketed syntax. When the propagation script is attached to a Language Object, the script has access to the NLU Variables of that Language Object via the reserved word lob.

An NLU Variable can therefore not be named lob, as it would conflict with the reserved word. This is checked by the Teneo Engine when the solution is loaded, showing an error if an NLU Variable with this name is defined.

The below example takes the value of the NLU Variable NLU_Name (declared in the Language Object TEST.REC) and propagates its value up to the global variable sGlobalName:

tlml

%TEST.REC^{sGlobalName=lob.NLU_Name}

Access rules for Propagation scripts

The access rules for propagation scripts can be found in the TLML Reference Manual.

Optional match option

The Optional match option makes it possible to mark parts of a TLML syntax as optional, in order to extract data that is of interest when it is provided but is not required for the input to be recognized.

The Optional match option is written as :O and can be attached to bracketed syntax expressions:

tlml

%THIS_WAS.MUL >> (%REALLY.ADV.SYN):O >> %GOOD.ADJV.SYN

tlml

(%TRAVEL.VB.SYN &^ (%TO.FW.LEX >> %CITIES.LIST):O &^ (%FROM.FW.LEX >> %CITIES.LIST):O)

Note that the Optional match option has an implied longest match in its implementation. This ensures that when the Optional match option is applied to a TLML syntax, the match performed by the Engine always returns the longest possible match. This implicit, operator-level longest match does not change the deferred longest match behavior that is also implemented in the Teneo Platform.
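Combining the Optional match option with propagation scripts is one way to capture such optional data. The following sketch extends the travel syntax above. It assumes that sCity is an NLU Variable of %CITIES.LIST and that sDestination and sOrigin are flow or global variables; none of these names are defined in this document. The /* ... */ remarks are plain Groovy comments inside the script bodies.

```tlml
(%TRAVEL.VB.SYN &^ (%TO.FW.LEX >> %CITIES.LIST^{ /* runs only if a destination was matched */ sDestination = lob.sCity }):O &^ (%FROM.FW.LEX >> %CITIES.LIST^{ /* runs only if an origin was matched */ sOrigin = lob.sCity }):O)
```

For an input that mentions only a destination, sOrigin would be left untouched, since propagation scripts run only for the parts of the syntax that are actually matched.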

Read more about the Optional match option in the TLML Reference Manual.

Engine scripting API

The Teneo Engine scripting API makes it possible to capture the used words both as an array containing all the words and as a string. Users are advised to use native methods for checking the size and length of lists and arrays.

The method _.getUsedWords returns the sentence words used to fulfil a syntax as an array. Here, used means matches of word syntaxes with words in the sentence. The method can be called with the following arguments:

| Method | Value |
| --- | --- |
| _.getUsedWords(_.ORIGINAL) | Returns the words spelled exactly as given in the input text, without any further word processing. |
| _.getUsedWords(_.SIMPLIFIED) | Returns the words of the input text. |
| _.getUsedWords(_.FINAL) | Returns the words of the input text after all preprocessing steps (e.g. simplification, auto-correction, compound splitting, etc.). |

When input processors split an original sentence word into multiple final words and the given EngineAccess.WordListType is ORIGINAL or SIMPLIFIED, the original/simplified word is stored at the index of the first split word, provided that word is used. For used words at the indices of the following splits, the list contains null.

When called from an NLU Variable value or a propagation script, all methods related to used words return the words used in the match of the part of the TLML syntax that the script is attached to.
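For instance, a propagation script might store the words matched by a single Language Object as one string. In this sketch sCityWords is a hypothetical flow or global variable. The calls join and the no-argument findAll are standard Groovy collection methods; findAll() is used here, as an assumption about what is wanted, to drop the null placeholders that can appear for split words when _.ORIGINAL is requested:

```tlml
%CITIES.LIST^{ /* words matched by this Language Object only, spelled as in the input */ sCityWords = _.getUsedWords(_.ORIGINAL).findAll().join(" ") }
```

Because the script is attached to %CITIES.LIST, the list would be expected to contain only the words that matched that Language Object, not every used word of the surrounding syntax.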

The below table covers the methods deprecated when the NLU Enhancement package was introduced in the Teneo 4.1 Platform release.

Deprecated methods will log a warning message into the Engine log to inform about the deprecation and the replacement method call to use, encouraging users to migrate scripting code to use the new methods.

| Deprecated method | New method | Notes |
| --- | --- | --- |
| String[] getUsedRawWords() | List<String> getUsedWords() | The new method behaves identically to the deprecated one when called with 0 arguments. It can also be called with _.ORIGINAL, _.SIMPLIFIED or _.FINAL. |
| | List<String> getUsedWords(EngineAccess.WordListType) | Allows retrieval of simplified and final words. The WordListType argument already exists for the methods getSentenceWords(EngineAccess.WordListType) and getUserInputWords(EngineAccess.WordListType); the new method gives the same degree of control when retrieving used words. |
| String[] getNotUsedRawWords() | List<String> getNotUsedWords() | The new method behaves identically to the deprecated one when called with 0 arguments. |
| | List<String> getNotUsedWords(EngineAccess.WordListType) | |
| Int getUsedRawWordCount() | | This method is deprecated. Identical behavior can be achieved by retrieving the used words with List<String> getUsedWords() and checking the length of the list. |
| Int getNotUsedRawWordCount() | | This method is deprecated. Identical behavior can be achieved by retrieving the not-used words with List<String> getNotUsedWords() and checking the length of the list. |
| getSentenceWordCount() | | Identical behavior can be achieved by retrieving the sentence words and checking the length of the array. |
| getSentenceWordCount(EngineAccess.WordListType) | | Identical behavior can be achieved by retrieving the sentence words and checking the length of the array. |
| getUserInputWordCount() | | Identical behavior can be achieved by retrieving the sentence words and checking the length of the array. |
| getUserInputWordCount(EngineAccess.WordListType) | | Identical behavior can be achieved by retrieving the sentence words and checking the length of the array. |
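A migration away from the deprecated count methods could look like the sketch below, which replaces a call to getUsedRawWordCount() with the size of the list returned by getUsedWords(). The variable iUsedWordCount is a hypothetical flow or global variable, and size() is the standard list method available in Groovy:

```tlml
%CITIES.LIST^{ /* replaces the deprecated getUsedRawWordCount() */ iUsedWordCount = _.getUsedWords().size() }
```

The same pattern should apply to the other deprecated count methods: retrieve the corresponding list or array and check its size natively.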

Entities and NLU Variables