NLU Generation
Natural Language Understanding (NLU) Generation is a functionality in the Teneo Platform that allows users to automatically draft suggestions of Teneo Linguistic Modeling Language (TLML) syntax in triggers and transitions when implementing TLML Syntax Matches.
The drafted suggestion is based on the set of positive and negative examples of User Intent available in the trigger or transition and on Language Objects (LOBs) and Entities. It also takes into account information from Part-of-Speech (POS) taggers and Named Entity Recognizers (NERs), when available.
Based on both positive and negative examples of User Intent, the NLU Generator:
- chooses the best TLML syntax at the very end of the process, based on a wide range of criteria, minimizing the risk that the optimal alternative gets discarded early in the process only because it doesn't seem like the best one in a very local context.
- uses the Different Match (&^) operator, enabling better User Intent coverage without having to ignore Language Objects that stretch across words they do not match. Thanks to this Engine operator, the NLU Generator does not have to choose between more reliable Language Objects (which match more words in an example) and good example coverage (where all relevant words are used in the TLML syntax, without any long matches stretching over them); it can do both.
- generates many alternative syntaxes for each User Intent and waits until the very end before selecting the final syntax. It then judges the intended scope of words and phrases in the examples, given by the resulting Language Object and Entity selection, zeroing in on the best result.
Positive User Intent examples
To take full advantage of the NLU Generator, it is recommended to add more than one positive example of User Intent, up to a maximum of 15, and to keep each example shorter than 35 words/tokens.
Providing more than one positive example of User Intent allows the NLU Generator to look for synonyms and better phrases and thereby suggest better TLML syntaxes. At the same time, tests have shown that more than 15 positive examples, as well as examples longer than 35 words/tokens, heavily affect performance without improving the syntax quality much.
Note, however, that a trigger or transition can contain more than 15 positive examples of User Intent when this is useful for running other Teneo functionalities, such as Auto-test, Suggest Ordering, or manual rendering of the TLML syntax.
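As a rough illustration of the recommendations above, the following Python sketch checks a set of positive examples against the suggested limits. It is not part of Teneo: the function name check_positive_examples is hypothetical, and whitespace splitting is only an assumed stand-in for the platform's own tokenization.

```python
MAX_EXAMPLES = 15  # recommended upper bound on positive examples
MAX_TOKENS = 35    # each example should stay shorter than this


def check_positive_examples(examples):
    """Return a list of warnings; an empty list means the set follows the recommendations."""
    warnings = []
    if len(examples) < 2:
        warnings.append("add more than one positive example")
    if len(examples) > MAX_EXAMPLES:
        warnings.append(
            f"{len(examples)} examples exceed the recommended maximum of {MAX_EXAMPLES}"
        )
    for text in examples:
        n_tokens = len(text.split())  # assumption: whitespace tokenization
        if n_tokens >= MAX_TOKENS:
            warnings.append(
                f"example has {n_tokens} tokens (recommended: fewer than {MAX_TOKENS})"
            )
    return warnings
```

A set that triggers warnings can still be kept in the trigger or transition for other purposes; the check only flags what may slow down syntax generation.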
The NLU Generator generates a large set of syntax suggestions for each of the provided positive examples of User Intent and will, at the very end of the process, select the syntax that covers all meaningful words (non-stopwords) in the examples by choosing the longest (covering most words, usually phrase level), the most common (shared by most examples), and the most exact (as narrow as possible) Language Objects and Entities.
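The three selection criteria (longest, most common, most exact) can be pictured as a ranking over candidate objects. The Python sketch below is only an illustration under stated assumptions: the Candidate fields, the breadth measure, and the strict lexicographic ordering of the criteria are all hypothetical, since the actual weighting used by the NLU Generator is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    words_covered: int     # how many words of an example the object spans
    examples_matched: int  # how many positive examples the object matches
    breadth: int           # hypothetical measure: number of variants the object accepts


def rank_key(candidate):
    # Longest first, then most common, then most exact (smallest breadth).
    # The strict ordering of the three criteria is an assumption.
    return (-candidate.words_covered, -candidate.examples_matched, candidate.breadth)


def pick_best(candidates):
    """Return the candidate that ranks first under rank_key."""
    return min(candidates, key=rank_key)
```

With this ordering, a two-word object beats any single-word object, and among objects of equal length and frequency the narrower one is chosen.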
Negative User Intent examples
The NLU Generator also makes use of the negative examples of User Intent provided in a trigger or transition. When negative examples of User Intent are provided, the NLU Generator either discards syntaxes that match negative examples (if there are alternative syntaxes that don't) or expands the syntax with negations.
Negative examples of User Intent only influence syntax generation if they match the syntax generated for the positive User Intent examples. If a negative example doesn't match the syntax, everything is as it should be (the Auto-test would not fail for that trigger, for example) and the syntax remains the same as it was before the negative example was added.
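The behavior described above can be sketched in Python as follows. This is a simplified model, not Teneo's implementation: a syntax is represented here as a set of required words plus a set of excluded words, and the names Syntax and apply_negatives are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntax:
    required: frozenset                # words that must all be present
    excluded: frozenset = frozenset()  # words that must not be present (negation)

    def matches(self, text):
        words = set(text.lower().split())
        return self.required <= words and not (self.excluded & words)


def apply_negatives(candidates, positives, negatives):
    """Pick a syntax that matches no negative example, or add a negation."""
    # Prefer an alternative that matches none of the negative examples.
    clean = [s for s in candidates if not any(s.matches(n) for n in negatives)]
    if clean:
        return clean[0]
    # Otherwise expand the first candidate with a negation built from words
    # that appear in the matching negatives but in no positive example.
    base = candidates[0]
    positive_words = {w for p in positives for w in p.lower().split()}
    extra = set()
    for n in negatives:
        if base.matches(n):
            extra |= set(n.lower().split()) - positive_words
    return Syntax(base.required, base.excluded | frozenset(extra))
```

If no negative example matches the first candidate, that candidate is returned unchanged, which mirrors the statement that a non-matching negative example leaves the syntax as it was.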
Selected LOBs and Entities
When drafting TLML syntax, the NLU Generator's algorithm uses Language Objects / Entities from lexical resources assigned to the solution, such as the Teneo NLU Ontology and Semantic Network's Lexical Resources, and any project-specific Language Objects / Entities located in the solution if these follow the naming conventions of the Teneo NLU Ontology and Semantic Network.
The Teneo Lexical Resources (TLRs) contain different types of hierarchically structured Language Objects. The Teneo Engine has no way of discriminating between those types, i.e. for the Teneo Engine they are just "Language Objects"; the NLU Generator, however, is designed to depend on this structure.
The NLU Generator's algorithm relies on all the pieces of information contained in the Language Objects' names to select the most appropriate Language Object in each context. Only Language Objects of the types LEX, MIX, MUL, SYN and PHR are used in the generated TLML syntax.
Entities, on the other hand, are preferred by the NLU Generator over any Language Object except PHR, which is preferred over any Entity in the selection of objects for the drafted TLML syntax.
If no fitting Language Object or Entity is found, the NLU Generator will use the bare word itself in the syntax.
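The preference order described above (PHR first, then Entities, then the remaining Language Object types, then the bare word) can be sketched in Python. The functions priority and choose are hypothetical, as are the object names in the usage note; the relative order among LEX, MIX, MUL and SYN is not stated in this section, so the sketch treats them as equal.

```python
ALLOWED_LOB_TYPES = {"LEX", "MIX", "MUL", "SYN", "PHR"}


def priority(name, is_entity=False):
    """Lower number means preferred; None means the object is not used."""
    if is_entity:
        return 1
    lob_type = name.rsplit(".", 1)[-1]  # assumption: the type is the last dot-separated part
    if lob_type not in ALLOWED_LOB_TYPES:
        return None
    return 0 if lob_type == "PHR" else 2


def choose(word, candidates):
    """candidates: list of (name, is_entity) tuples that match the word."""
    ranked = [(priority(name, is_entity), name) for name, is_entity in candidates]
    ranked = [(p, name) for p, name in ranked if p is not None]
    if not ranked:
        return word  # no fitting object: fall back to the bare word
    return min(ranked)[1]  # ties are broken alphabetically, an arbitrary choice
```

For example, given a hypothetical Entity DEVICE.ENTITY and the Language Object TELEPHONE.NN.SYN, choose returns the Entity; if a hypothetical WHERE_IS.PHR also matches, the PHR object wins.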
Further descriptions of each of the different types of Language Objects and Entities are available in this section.
Project-specific LOBs
When creating project-specific Language Objects, it is advised to follow the naming conventions and also to add a project prefix to the Language Object's name, for example, MYPROJECT_HEADPHONES.NN.SYN.
This is to make the project-specific Language Objects easily distinguishable from other objects available in the Teneo Lexical Resources.
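To illustrate how such a name can be read, the Python sketch below splits a Language Object name into its parts. It assumes, based only on the example MYPROJECT_HEADPHONES.NN.SYN, that a name consists of an optional project prefix followed by dot-separated parts ending in the object type; the function split_lob_name is hypothetical.

```python
def split_lob_name(name, project_prefix="MYPROJECT"):
    """Split a name such as MYPROJECT_HEADPHONES.NN.SYN into its parts."""
    is_project = name.startswith(project_prefix + "_")
    base = name[len(project_prefix) + 1:] if is_project else name
    parts = base.split(".")
    return {
        "project_specific": is_project,
        "word": parts[0],
        "tags": parts[1:-1],  # e.g. a POS tag such as NN
        "type": parts[-1] if len(parts) > 1 else None,
    }
```

A helper like this makes it easy to list all project-specific objects in a solution or to spot names that lack a type suffix.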
Overriding TLR LOBs
In some projects, a project-specific Language Object is preferred over a Language Object coming from a lexical resource. It is not possible to force the NLU Generator to use a project-specific Language Object, but when the steps below are followed, the NLU Generator normally selects the project-specific object over the one in a TLR:
- Create a local Language Object, for example, MYPROJECT_TELEPHONE.NN.SYN
- Include the object from the lexical resource in this local object, e.g. include TELEPHONE.NN.SYN in MYPROJECT_TELEPHONE.NN.SYN
- Add project-specific variations of the words/phrases to the local Language Object which are not represented in the object of the lexical resource
- Create a few positive User Intents that use the project-specific words/phrases that are in the local object, alongside the User Intents that use words/phrases known to the object of the lexical resource.
Overriding TLR LOBs by using the same name
If a project-specific Language Object has the exact same name as a Language Object in a referenced lexical resource (e.g. the Teneo Lexical Resource), the NLU Generator, just like Teneo Studio in general, uses the local, project-specific version.
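This shadowing behavior resembles a layered lookup in which local definitions are searched before those of the lexical resource. The Python sketch below models it with collections.ChainMap; the dictionaries and their contents are purely illustrative and do not reflect how Teneo stores Language Objects.

```python
from collections import ChainMap

# Illustrative stand-ins for the objects of a lexical resource and of a solution.
tlr_objects = {
    "TELEPHONE.NN.SYN": "definition from the lexical resource",
    "HEADPHONES.NN.SYN": "definition from the lexical resource",
}
local_objects = {
    "TELEPHONE.NN.SYN": "project-specific definition",
}

# ChainMap searches its mappings left to right, so local entries shadow TLR ones.
resolved = ChainMap(local_objects, tlr_objects)
```

Looking up TELEPHONE.NN.SYN in resolved yields the project-specific definition, while HEADPHONES.NN.SYN, which has no local counterpart, still resolves to the lexical resource.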
Part-of-Speech Taggers
The NLU Generator makes use of Part-of-Speech (POS) tags in several languages. This means that the NLU Generator is capable of recognizing relevant Entities and Language Objects as well as retrieving and storing POS information.
Furthermore, it means that the NLU Generator can choose the correct Language Object in situations of disambiguation, for example, choosing a VB.LEX Language Object over a NN.LEX Language Object or the other way around.
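The disambiguation step can be pictured as filtering candidate Language Objects by the POS tag assigned to the word. In the Python sketch below, the function pick_by_pos and the object names BOOK.NN.LEX and BOOK.VB.LEX are hypothetical, and the assumption that the POS tag appears as a dot-separated part of the name is inferred from the VB.LEX and NN.LEX examples.

```python
def pick_by_pos(candidates, pos_tag):
    """Return the first candidate whose name carries the given POS tag, else None."""
    for name in candidates:
        # Skip the first part (the word itself) and compare the remaining parts.
        if pos_tag in name.split(".")[1:]:
            return name
    return None
```

For an input where the tagger marks "book" as a verb, pick_by_pos(["BOOK.NN.LEX", "BOOK.VB.LEX"], "VB") selects the VB.LEX object; with the tag NN it selects the noun object instead.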
For more information related to the Part-of-Speech Taggers and Morphological Analyzers, please see the Input Processors section.