Teneo Developers

NLU Generation

Natural Language Understanding (NLU) Generation is a feature of the Teneo Platform that allows users to automatically draft suggestions of Teneo Linguistic Modeling Language (TLML) syntax in triggers and transitions when implementing TLML Syntax Matches.

The drafted suggestion is based on the positive and negative examples of User Intent available in the trigger or transition, and on Language Objects (LOBs) and Entities. It also takes into account information from Part-of-Speech (POS) taggers and Named Entity Recognizers (NERs), when available.

Based on both positive and negative User Intents, the NLU Generator:

  • chooses the best TLML syntax at the very end of the process, based on a wide range of criteria, minimizing the risk that the optimal alternative gets discarded early in the process only because it doesn't seem like the best one in a very local context.
  • uses the Different Match (&^) operator, which enables better User Intent coverage without having to ignore Language Objects that stretch across words they do not match. Thanks to this Engine operator, the NLU Generator does not have to choose between more reliable Language Objects (reliable because they match more words in an example) and good example coverage (where all relevant words are used in the TLML syntax, without any long matches stretching over them); it can have both.
  • generates many alternative syntaxes for each User Intent and waits until the very end before selecting the final syntax. It then judges the intended scope of words and phrases in the User Intents, given by the resulting Language Object and Entity selection, to zero in on the best result.

Positive User Intents

To take full advantage of the NLU Generator, it is recommended to add more than one positive User Intent, but no more than 15, and to keep each User Intent shorter than 35 words/tokens.

Providing more than one positive User Intent allows the NLU Generator to look for synonyms and better phrases and thereby suggest better TLML syntaxes. At the same time, tests have shown that more than 15 positive User Intents, or User Intents longer than 35 words/tokens, heavily affect performance without improving the syntax quality much.
Please note, though, that a trigger or transition can of course contain more than 15 positive User Intents when this is useful for other Teneo functionalities, such as Auto-test, Suggest Ordering, or manual rendering of the TLML syntax.
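
The recommendations above can be checked before running the generator. The following is a minimal sketch, not part of Teneo: the function name `check_positive_examples` is hypothetical, and whitespace splitting only approximates how Teneo counts words/tokens.

```python
# Thresholds come from the recommendations above; everything else is illustrative.
MIN_EXAMPLES = 2   # "more than one positive User Intent"
MAX_EXAMPLES = 15  # "no more than 15"
MAX_TOKENS = 35    # each example should be shorter than this


def check_positive_examples(examples):
    """Return a list of warnings; an empty list means all recommendations are met."""
    warnings = []
    if len(examples) < MIN_EXAMPLES:
        warnings.append("add more than one positive User Intent")
    if len(examples) > MAX_EXAMPLES:
        warnings.append(
            f"{len(examples)} examples exceed the recommended maximum of {MAX_EXAMPLES}"
        )
    for text in examples:
        # Assumption: whitespace tokenization stands in for Teneo's tokenizer.
        n_tokens = len(text.split())
        if n_tokens >= MAX_TOKENS:
            warnings.append(
                f"example has {n_tokens} tokens (recommended: fewer than {MAX_TOKENS})"
            )
    return warnings
```

A trigger that deliberately holds more than 15 examples for Auto-test purposes would simply produce a warning here, which can be ignored.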

The NLU Generator generates a large set of syntax suggestions for each of the provided positive User Intents and will, at the very end of the process, select the syntax that covers all meaningful words (non-stopwords) in the User Intents by choosing the longest (covering most words, usually phrase level), the most common (shared by most User Intents), and the most exact (as narrow as possible) Language Objects and Entities.
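
As a rough illustration of the three selection criteria (longest, most common, most exact), the toy ranking below compares candidate objects by a tuple of scores. This is only a sketch under stated assumptions: the `Candidate` fields, the numeric scores, and the priority order of the criteria are invented for the example and do not describe the NLU Generator's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str           # hypothetical Language Object or Entity name
    words_covered: int  # words of the example it spans ("longest")
    shared_by: int      # positive User Intents it matches ("most common")
    breadth: int        # rough size of what it can match; smaller is "more exact"


def pick_best(candidates):
    """Prefer longer, then more widely shared, then narrower candidates.

    The ordering of the three criteria is an assumption made for this sketch.
    """
    return max(candidates, key=lambda c: (c.words_covered, c.shared_by, -c.breadth))


candidates = [
    Candidate("A.NN.LEX", words_covered=1, shared_by=3, breadth=5),
    Candidate("B.PHR", words_covered=3, shared_by=2, breadth=10),
    Candidate("C.PHR", words_covered=3, shared_by=2, breadth=4),
]
best = pick_best(candidates)
```

Here `C.PHR` wins: it covers as many words and examples as `B.PHR` but is narrower.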

Negative User Intents

The NLU Generator also makes use of the negative User Intents provided in a trigger or transition. When negative User Intents are provided, the NLU Generator will either discard syntaxes that match them (if there are alternative syntaxes that don't) or expand the syntax with negations.
Negative User Intents only influence the syntax generation if they match the syntax generated for the positive User Intents. If a negative User Intent doesn't match the syntax, everything is as it should be (the Auto-test would not fail for that trigger, for example) and the syntax remains the same as it was before the negative User Intent was added.
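
The discard-or-expand behavior can be sketched as follows. This is a simplified model, not the generator's implementation: `resolve_negatives` and `toy_matches` are hypothetical names, and the toy matcher (all words of the "syntax" must occur in the input) merely stands in for real TLML matching in the Teneo Engine.

```python
def toy_matches(syntax, text):
    """Stand-in matcher: every word of the toy syntax must occur in the text."""
    words = text.lower().split()
    return all(w in words for w in syntax.split())


def resolve_negatives(ranked_candidates, negatives, matches):
    """Return (syntax, offending_negatives).

    ranked_candidates: candidate syntaxes, best first.
    If some candidate matches no negative example, it is returned with an
    empty list. Otherwise the best candidate is kept, together with the
    negative examples that a negation would still have to exclude.
    """
    offending = {}
    for syntax in ranked_candidates:
        offending[syntax] = [n for n in negatives if matches(syntax, n)]
        if not offending[syntax]:
            return syntax, []
    best = ranked_candidates[0]
    return best, offending[best]
```

A negative example that matches none of the candidates leaves the result unchanged, which mirrors the point made above.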

Selected LOBs and Entities

When drafting TLML syntax, the NLU Generator's algorithm uses Language Objects / Entities from lexical resources assigned to the solution, such as the Teneo NLU Ontology and Semantic Network's Lexical Resources, and any project-specific Language Objects / Entities located in the solution if these follow the naming conventions of the Teneo NLU Ontology and Semantic Networks.

The Teneo Lexical Resources (TLRs) contain different types of hierarchically structured Language Objects. The Teneo Engine has no way of discriminating between those types, i.e. for the Teneo Engine they are just "Language Objects". The NLU Generator, however, is designed to depend on this structure.

The NLU Generator's algorithm relies on all the pieces of information contained in the Language Objects' names to select the most appropriate Language Object in each context. Only Language Objects of the types LEX, MIX, MUL, SYN and PHR are used in the generated TLML syntax.
Entities, on the other hand, are always preferred by the NLU Generator over any Language Object except PHR, which is preferred over any Entity in the selection of objects for the drafted TLML syntax.

Any other type of Language Object (not mentioned above) will not be used in the automatically drafted syntax.

If no fitting Language Object or Entity is found, the NLU Generator will use the bare word itself in the syntax.
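
The preference rules in this section (PHR first, then Entities, then the remaining allowed types, then the bare word) can be summarized in a small sketch. It assumes that the type is the last dot-separated part of the name, as in TELEPHONE.NN.SYN; the function names and the example names used in the note below are hypothetical, and the order among LEX, MIX, MUL and SYN is not specified here, so the sketch simply takes the first usable one.

```python
ALLOWED_LOB_TYPES = {"LEX", "MIX", "MUL", "SYN", "PHR"}


def lob_type(name):
    """Return the last dot-separated segment of a name, e.g. 'SYN'."""
    return name.rsplit(".", 1)[-1].upper()


def choose_object(word, lob_names, entity_names):
    """Pick what to put in the drafted syntax for one word or phrase."""
    usable = [n for n in lob_names if lob_type(n) in ALLOWED_LOB_TYPES]
    phrases = [n for n in usable if lob_type(n) == "PHR"]
    if phrases:
        return phrases[0]       # PHR is preferred over any Entity
    if entity_names:
        return entity_names[0]  # Entities beat every other Language Object
    if usable:
        return usable[0]        # remaining allowed types (order is an assumption)
    return word                 # no fitting object: fall back to the bare word
```

For instance, `choose_object("paris", ["PARIS.NN.LEX"], ["CITY.ENTITY"])` returns the Entity, while a Language Object of an unlisted type is ignored and the bare word is used.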

Further descriptions of each of the different types of Language Objects and Entities are available in this section.

Project-specific LOBs

When creating project-specific Language Objects, it is advised to follow the naming conventions and also to add a project prefix to the Language Object's name, for example, MYPROJECT_HEADPHONES.NN.SYN.
This is to make the project-specific Language Objects easily distinguishable from other objects available in the Teneo Lexical Resources.
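
A short sketch of how such a prefix makes project-specific objects easy to separate from the rest. The helper names are hypothetical, and the underscore separator is taken from the MYPROJECT_HEADPHONES.NN.SYN example above.

```python
def project_object_name(prefix, base_name):
    """Build a project-specific name such as MYPROJECT_HEADPHONES.NN.SYN."""
    return f"{prefix.upper()}_{base_name.upper()}"


def split_by_origin(names, prefix):
    """Partition names into (project-specific, other) by their prefix."""
    marker = prefix.upper() + "_"
    project = [n for n in names if n.startswith(marker)]
    other = [n for n in names if not n.startswith(marker)]
    return project, other
```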

Overriding TLR LOBs

Sometimes in projects, a project-specific Language Object is preferred over a Language Object coming from a lexical resource. It is not possible to force the NLU Generator to use a project-specific Language Object, but when the steps below are followed, the NLU Generator normally selects the project-specific object over the one in a TLR:

  • Create a local Language Object, for example, MYPROJECT_TELEPHONE.NN.SYN
  • Include the object from the lexical resource in this local object, e.g. include TELEPHONE.NN.SYN in MYPROJECT_TELEPHONE.NN.SYN
  • Add project-specific variations for the word/phrases in the local Language Object which are not represented in the object of the lexical resource
  • Create a few positive User Intents that use the project-specific words/phrases that are in the local object, alongside the User Intents that use words/phrases known to the object of the lexical resource.

Overriding TLR LOBs by using the same name

If a project-specific Language Object has exactly the same name as a Language Object in a referenced lexical resource (e.g. the Teneo Lexical Resource), the NLU Generator, just like Teneo Studio in general, uses the local, project-specific version.
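
This same-name shadowing behaves like a layered lookup. The sketch below models it with Python's `collections.ChainMap`; the dictionaries and their string values are placeholders for illustration and say nothing about how Teneo Studio stores Language Objects internally.

```python
from collections import ChainMap


def effective_objects(local_objects, tlr_objects):
    """Combine local and TLR objects so that local names take precedence.

    ChainMap searches its mappings left to right, so a local entry
    shadows a TLR entry with the same name.
    """
    return ChainMap(local_objects, tlr_objects)


tlr = {"TELEPHONE.NN.SYN": "TLR definition", "HELLO.PHR": "TLR greeting"}
local = {"TELEPHONE.NN.SYN": "project definition"}
objects = effective_objects(local, tlr)
```

Looking up `TELEPHONE.NN.SYN` yields the project definition, while `HELLO.PHR` still resolves to the TLR version.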

Part-of-Speech Taggers

The NLU Generator makes use of Part-of-Speech (POS) tags in several languages. This means that the NLU Generator is capable of recognizing relevant Entities and Language Objects as well as retrieving and storing POS information.
Furthermore, it means that the NLU Generator can choose the correct Language Object in situations of disambiguation, for example, choosing a VB.LEX Language Object over a NN.LEX Language Object or the other way around.
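
The disambiguation step can be pictured as matching the POS tag of a word against the POS segment in the Language Object name. This is a minimal sketch under two assumptions: names follow the WORD.POS.TYPE pattern (as in VB.LEX and NN.LEX above), and the tagger's labels are identical to the labels used in the names. The function name and the BOOK examples in the note below are hypothetical.

```python
def pick_by_pos(candidates, pos_tag):
    """Return the candidate whose POS segment equals pos_tag.

    Falls back to the first candidate when none matches, and to None
    when there are no candidates at all.
    """
    for name in candidates:
        parts = name.split(".")
        # The POS label is assumed to be the second-to-last segment.
        if len(parts) >= 3 and parts[-2] == pos_tag:
            return name
    return candidates[0] if candidates else None
```

With candidates `["BOOK.NN.LEX", "BOOK.VB.LEX"]`, the tag `"VB"` (as in "book a flight") selects `BOOK.VB.LEX`, and `"NN"` selects `BOOK.NN.LEX`.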

For more information related to the Part-of-Speech Taggers and Morphological Analyzers, please see the Input Processors section.