NLU Generator
Draft TLML Syntax suggestion
The NLU Generator is a feature of the Teneo Platform that automatically drafts TLML Syntax suggestions for triggers and conditional transitions when users implement Match TLML Syntax elements.
The drafted TLML Syntax suggestion is based on the positive and negative User Intent examples available in the trigger or transition and on Language Objects (LOBs) and Entities. It also takes into account information from Part-of-Speech (POS) taggers and named entity recognizers (NERs), when available.
A TLML Syntax suggestion is created within a Match TLML Syntax by selecting the Draft option available under Advanced Options. After updating the User Intent examples, click Draft again to get a new suggestion.
Based on both positive and negative User Intent examples, the NLU Generator:
- chooses the best TLML Syntax at the very end of the process, based on a wide range of criteria, minimizing the risk that the optimal alternative gets discarded early in the process only because it doesn't seem like the best one in a very local context.
- uses the Different Match (&^) operator, which enables better User Intent example coverage without having to ignore Language Objects that stretch across words they do not match. Thanks to this engine operator, the NLU Generator does not have to choose between more reliable Language Objects (which match more words in an example) and good example coverage (where all relevant words are used in the TLML Syntax, without any long matches stretching over them); it can have both.
- generates many alternative syntaxes for each User Intent example and waits until the very end before selecting the final syntax. At that point it judges the intended scope of words and phrases in the User Intents, as given by the resulting Language Object and Entity selection, and zeroes in on the best result.
Positive User Intents
To take full advantage of the NLU Generator, add more than one positive User Intent example, up to a maximum of 15. Tests have shown that using more than 15 positive User Intent examples degrades performance heavily without improving the syntax quality much.
Positive User Intents should also be no more than 35 words/tokens long, as performance suffers when examples are both long and numerous.
Note that a trigger or transition can still contain more than 15 positive User Intent examples, as this might be useful for running Auto-test or Suggest ordering, or for manual rendering of the TLML Syntax.
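The limits above can be checked before drafting. The following is a minimal Python sketch and not part of Teneo: the function name `check_positive_examples` is hypothetical, and splitting on whitespace is an assumed stand-in for the platform's own tokenization. Only the thresholds (more than one example, at most 15 examples, at most 35 words/tokens each) come from the text.

```python
MAX_DRAFT_EXAMPLES = 15       # from the text: more than 15 hurts performance
MAX_TOKENS_PER_EXAMPLE = 35   # from the text: longer examples slow drafting


def check_positive_examples(examples):
    """Return a list of warnings for a set of positive User Intent examples."""
    warnings = []
    if len(examples) < 2:
        warnings.append("Add more than one positive example.")
    if len(examples) > MAX_DRAFT_EXAMPLES:
        warnings.append(
            f"{len(examples)} examples given; at most {MAX_DRAFT_EXAMPLES} are recommended."
        )
    for text in examples:
        token_count = len(text.split())  # assumption: whitespace tokenization
        if token_count > MAX_TOKENS_PER_EXAMPLE:
            warnings.append(
                f"Example with {token_count} tokens exceeds {MAX_TOKENS_PER_EXAMPLE}."
            )
    return warnings
```

An empty list means the example set stays within the recommended limits.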
By using more than one positive example, the NLU Generator can suggest better TLML Syntaxes as it will also look for synonyms and better phrases.
The NLU Generator generates a large set of syntax suggestions for each of the positive User Intents and, at the very end of the process, selects the syntax that covers all meaningful words (non-stopwords) in the examples. It does so by choosing the Language Objects and Entities that are the longest (covering the most words, usually at phrase level), the most common (shared by the most examples), and the most exact (as narrow as possible).
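The three selection criteria can be illustrated with a small ranking sketch. This is not the NLU Generator's actual algorithm: the `Candidate` fields, the `breadth` measure, the example numbers, and the object name `TELEPHONE.NN.LEX` are all hypothetical, and applying the criteria in strict lexicographic order (longest, then most common, then most exact) is an assumption, since the text does not say how they are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    words_covered: int    # "longest": words of the example the object spans
    examples_shared: int  # "most common": positive examples it matches
    breadth: int          # "most exact": hypothetical count of accepted variants


def pick_best(candidates):
    """Pick the longest, then most common, then narrowest candidate."""
    return max(
        candidates,
        key=lambda c: (c.words_covered, c.examples_shared, -c.breadth),
    )


candidates = [
    Candidate("TELEPHONE.NN.SYN", words_covered=1, examples_shared=3, breadth=12),
    Candidate("TELEPHONE.NN.LEX", words_covered=1, examples_shared=3, breadth=4),
    Candidate("telephone", words_covered=1, examples_shared=1, breadth=1),
]
best = pick_best(candidates)
```

Here the two Language Objects tie on length and on shared examples, so the narrower one is selected over the broader synonym object.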
Negative User Intents
The NLU Generator also makes use of the negative examples of User Intent provided in a trigger or transition. When negative User Intents are provided, the NLU Generator either discards syntaxes that match negative examples (if there are alternative syntaxes that don't) or expands the syntax with negations.
The negative examples only influence the syntax generation if they match the syntax generated from the positive User Intents. If a negative User Intent doesn't match the syntax, everything is as it should be (the Auto-test would not fail for that trigger, for example), and the syntax remains the same as it was before the negative example was added.
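The discard-or-negate behavior described above can be sketched with a toy matcher. Everything here is an assumption made for illustration: a "syntax" is modeled as a pair of word sets (required, forbidden) instead of real TLML, and negations are built crudely from the raw words of the matching negative examples, whereas a real implementation would presumably work with Language Objects and Entities.

```python
def matches(syntax, utterance):
    """Toy matcher: all required words present, no forbidden word present."""
    required, forbidden = syntax
    words = set(utterance.lower().split())
    return required <= words and not (forbidden & words)


def apply_negatives(candidates, negatives):
    """Discard candidates that match a negative example; negate if none survive."""
    clean = [s for s in candidates if not any(matches(s, n) for n in negatives)]
    if clean:
        # Includes the case where no negative matches: candidates are unchanged.
        return clean
    expanded = []
    for required, forbidden in candidates:
        extra = set()
        for negative in negatives:
            if matches((required, forbidden), negative):
                # Forbid the words that distinguish the negative example.
                extra |= set(negative.lower().split()) - required
        expanded.append((required, forbidden | extra))
    return expanded
```

When an alternative candidate avoids the negative examples, it is kept and the others are dropped; only when every candidate matches a negative example is the syntax expanded with negations.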
Selected LOBs and Entities
Lexical resources
When drafting TLML Syntax, the NLU Generator's algorithm uses Language Objects / Entities from lexical resources assigned to the solution, such as the Teneo NLU Ontology and Semantic Networks, and any project-specific Language Objects / Entities located in the solution if these follow the naming conventions of the Teneo NLU Ontology and Semantic Networks.
The Lexical Resources of the Teneo NLU Ontology and Semantic Networks contain different types of hierarchically structured Language Objects. The Teneo Engine has no way of discriminating between those types (for the Engine, they are just "Language Objects"); the NLU Generator, however, is designed to depend on this structure.
The NLU Generator's algorithm relies on all the pieces of information contained in the Language Objects' names to select the most appropriate Language Object in each context. Only Language Objects of the types LEX, MIX, MUL, SYN and PHR are used in the generated TLML Syntax. When selecting objects for the drafted TLML Syntax, Entities are preferred over any Language Object except PHR, which is preferred over any Entity.
If no fitting Language Object or Entity is found, the NLU Generator will use the bare word itself in the syntax.
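The preference order described above (PHR first, then Entities, then the other usable Language Object types, then the bare word) can be sketched as follows. The function names, the `"ENTITY"` kind label, and the object names in the usage note are hypothetical. The text does not rank LEX, MIX, MUL and SYN against each other, so this sketch treats them as equal and breaks ties alphabetically, which is an arbitrary choice.

```python
USABLE_LOB_TYPES = {"LEX", "MIX", "MUL", "SYN", "PHR"}  # from the text


def rank(kind):
    """Lower rank is preferred; None means the kind is never used."""
    if kind == "PHR":
        return 0
    if kind == "ENTITY":
        return 1
    if kind in USABLE_LOB_TYPES:
        return 2
    return None


def choose(word, options):
    """options: list of (name, kind) pairs that could cover `word`."""
    usable = [(rank(kind), name) for name, kind in options if rank(kind) is not None]
    if not usable:
        return word  # no fitting object: fall back to the bare word
    return min(usable)[1]  # best rank; ties broken alphabetically by name
```

For example, `choose("phone", [("TELEPHONE.NN.SYN", "SYN"), ("PHONE_MODEL", "ENTITY")])` returns the Entity, while `choose("xyz", [])` returns the bare word `"xyz"`.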
Further descriptions of each of the different types of Language Objects and Entities are available in this section.
Project-specific LOBs
When creating project-specific Language Objects, it is advised to follow the naming conventions and also to add a project prefix to the Language Object's name:
MYPROJECT_HEADPHONES.NN.SYN
This is to make the project-specific Language Objects easily distinguishable from the objects available in the Teneo Lexical Resources.
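As an illustration of how such a name can be taken apart, here is a small Python sketch. It assumes that a name consists of a base followed by dot-separated tags, that the last tag is the Language Object type, and that a project prefix is joined to the base with an underscore. Only the example name above comes from the text; the function name and the returned field names are hypothetical.

```python
def parse_lob_name(name, project_prefix=None):
    """Split a Language Object name into base, POS tag, type and project flag."""
    head, *tags = name.split(".")
    lob_type = tags[-1] if tags else None       # assumed: last tag is the type
    pos = tags[0] if len(tags) > 1 else None    # assumed: first of several tags is the POS
    is_project = bool(project_prefix) and head.startswith(project_prefix + "_")
    base = head[len(project_prefix) + 1:] if is_project else head
    return {
        "base": base,
        "pos": pos,
        "type": lob_type,
        "project_specific": is_project,
    }
```

Because base names may themselves contain underscores, the prefix is only recognized when it is passed in explicitly.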
Overriding TLR LOBs
Sometimes a project-specific Language Object is preferred over a Language Object coming from a lexical resource. It is not possible to force the NLU Generator to use a project-specific Language Object, but when the steps below are followed, the NLU Generator normally selects the project-specific object over the one in the Teneo Lexical Resources (TLR).
- Create a local Language Object, for example, MYPROJECT_TELEPHONE.NN.SYN
- Include the object from the lexical resource in this local object, e.g. include TELEPHONE.NN.SYN in MYPROJECT_TELEPHONE.NN.SYN
- Add project-specific variations of the words/phrases to the local Language Object that are not represented in the object of the lexical resource
- Create a few positive examples of User Intent that use the project-specific words/phrases that are in the local object, alongside the examples that use words/phrases known to the object of the lexical resource.
Overriding TLR LOBs by using the same name
If a project-specific Language Object has exactly the same name as a Language Object in a referenced lexical resource (e.g. the Teneo Lexical Resource), the NLU Generator, just like Teneo Studio in general, uses the local, project-specific version.
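This same-name behavior resembles a layered lookup in which local definitions shadow those of the lexical resource. The sketch below models it with Python's `collections.ChainMap`; the dictionaries and their contents are placeholders, `HEADPHONES.NN.SYN` is a hypothetical resource entry, and nothing here reflects how Teneo stores Language Objects internally.

```python
from collections import ChainMap

# Placeholder definitions; real Language Objects contain TLML Syntax.
resource_lobs = {
    "TELEPHONE.NN.SYN": "definition from the lexical resource",
    "HEADPHONES.NN.SYN": "definition from the lexical resource",
}
local_lobs = {
    "TELEPHONE.NN.SYN": "project-specific definition",
}

# Lookups search local_lobs first and fall back to resource_lobs.
visible_lobs = ChainMap(local_lobs, resource_lobs)

shadowed = visible_lobs["TELEPHONE.NN.SYN"]    # resolved from the local layer
inherited = visible_lobs["HEADPHONES.NN.SYN"]  # resolved from the resource layer
```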
Part-of-Speech Taggers
The NLU Generator makes use of Part-of-Speech (POS) tags in several languages. This means that the NLU Generator is capable of recognizing relevant Entities and Language Objects as well as retrieving and storing POS information.
Furthermore, it means that the NLU Generator can choose the correct Language Object when disambiguation is needed, for example, choosing a VB.LEX Language Object over a NN.LEX Language Object or the other way around.
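A minimal sketch of this kind of disambiguation, assuming the POS tag is the second-to-last dot-separated part of the Language Object name and that the tagger emits the same tag labels. The function name and the object names `BOOK.VB.LEX` and `BOOK.NN.LEX` are hypothetical, and falling back to the first candidate when no tag matches is an arbitrary choice for the sketch.

```python
def disambiguate(candidates, pos_tag):
    """Return the candidate whose name carries the given POS tag."""
    for name in candidates:
        parts = name.split(".")
        if len(parts) >= 3 and parts[-2] == pos_tag:
            return name
    # No tag available or no match: fall back to the first candidate, if any.
    return candidates[0] if candidates else None


# "book" tagged as a verb (as in "book a flight") versus as a noun.
as_verb = disambiguate(["BOOK.NN.LEX", "BOOK.VB.LEX"], "VB")
as_noun = disambiguate(["BOOK.NN.LEX", "BOOK.VB.LEX"], "NN")
```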
For more information related to the Part-of-Speech Taggers and Morphological Analyzers, please see the Input Processors section.