Chinese Input Processors Chain
Introduction
An Input Processor (IP) pre-processes inputs so that the Teneo Engine can perform different operations on them, such as normalization and tokenization. Each language supported by the Teneo Platform has a chain of Input Processors that know how to process that particular language.
Input Processors Chain setup
The following graph displays the Input Processors chain for Chinese:
The Input Processors are listed below with a short description of each Input Processor's functionality; the following sections go into further detail.
- The Chinese Tokenizer IP first converts the user input to Simplified characters and then splits it into words and sentences.
- The Chinese Annotator IP performs a morphological analysis on the user input sentences and words and annotates them to provide morphological information in addition to what the Tokenizer provides as words and their Part-of-Speech (POS) tags.
- The Chinese Numbers IP identifies and annotates the numbers present in the user input to make it easier for the end user to write syntaxes that depend on numbers.
- The System Annotation IP sets a number of annotations, based on properties of the user input text.
- The Language Detector IP identifies the language of the input sentence and annotates it with the predicted language together with a confidence score for the prediction.
- The Predict IP classifies user input based on a machine learning model trained in Teneo Learn and annotates the user input with the predicted top intent classes and a confidence score.
Chinese Simplifier
The Chinese Simplifier is a special kind of processor that is used to normalize the user input by:
- converting full width Latin letters and Arabic digits into their half width version, and
- lowercasing the uppercased Latin letters.
This Simplifier is special because it is not run as part of the Input Processor chain, but rather by the Tokenizer when it puts the tokens into a Teneo data structure.
Additionally, the Simplifier is also run by the condition parser inside Teneo Engine, which normalizes the Language Object syntax words before adding them to the internal Engine dictionary.
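These two normalization steps can be sketched in a few lines of Python (an illustrative sketch only, not the Simplifier's actual implementation):

```python
def simplify(text: str) -> str:
    """Sketch of the Simplifier: full-width Latin letters and digits are
    mapped to their half-width versions, then Latin letters are lowercased."""
    out = []
    for ch in text:
        code = ord(ch)
        # The full-width ASCII block U+FF01..U+FF5E maps one-to-one
        # onto the half-width block U+0021..U+007E.
        if 0xFF01 <= code <= 0xFF5E:
            ch = chr(code - 0xFEE0)
        out.append(ch.lower())
    return "".join(out)
```

For example, `simplify("ＴＥＮＥＯ１２３")` yields `"teneo123"`.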
Chinese Tokenizer IP
The Chinese Tokenizer Input Processor is the first of the input processors to be run on Chinese user inputs; it essentially does two things: first, it converts traditional Mandarin Chinese characters into simplified ones, and second, it tokenizes the converted user input and generates sentences based on the tokens.
Traditional-to-simplified conversion
The conversion of traditional characters into simplified characters is done via a one-to-one character mapping. This mapping is configured via two properties of the Chinese Tokenizer IP:
- A list of characters: traditionalCharacters.file.name
- The mappings of traditional characters to simplified characters: traditionalSimplifiedMappings.file.name
After the conversion to simplified Mandarin Chinese, the user input is segmented into words and sentences.
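As an illustration, the one-to-one mapping can be applied as a simple character-by-character substitution (the mapping entries below are examples only; the real IP loads the full tables from the files configured above):

```python
# Illustrative one-to-one traditional -> simplified mapping; the real
# mapping is loaded from the configured mapping file.
TRAD_TO_SIMP = {
    "買": "买",
    "東": "东",
    "車": "车",
    "語": "语",
}

def to_simplified(text: str) -> str:
    # Characters without a mapping entry (already-simplified characters,
    # punctuation, Latin letters) pass through unchanged.
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
```

For example, `to_simplified("買東西")` yields `"买东西"`.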
Chinese tokenization
The Chinese Tokenizer splits the user input words via a statistical model and a user dictionary. The words specified in the user dictionary are guaranteed to be segmented as such by the Tokenizer.
The user dictionary has a static component, which is specified as a configuration file via the property dictionary, and a dynamic component, which is collected from the language objects defined in a user solution that have syntaxes of type DICTEXT_word_POStag.
The Chinese Tokenizer exposes the Part-of-Speech (POS) tags it generates to the user as annotations by mapping them according to a configuration file that maps Panda-generated POS tags to annotations, e.g. NN=NN.POS.
The Tokenizer also uses a configuration property called nonWordTokens to specify which characters should not be output as tokens, e.g. punctuation, brackets, etc.
Name | Type | Required | Default |
---|---|---|---|
nonWordTokens | string | no | ""“” 『』'「」()[]{}()〔〕[]{}〈〉《》!!??…,,、。.;;::. |
The last step in the tokenization process is the splitting of the user input tokens into sentences. For this, the Tokenizer uses another configuration property called sentenceDelimiters to know which characters mark sentence boundaries.
Name | Type | Required | Default |
---|---|---|---|
sentenceDelimiters | string | no | 。!?…!?.. |
The Chinese Tokenizer does not split decimal numbers around the decimal markers but rather concatenates the split tokens into one; this makes it easier to identify and annotate decimal numbers later in the processing chain.
Note that numbers with a factor other than 万 or 亿 after the decimal point are not valid numbers and are therefore split instead of concatenated. This is a change from the behavior in versions prior to Teneo 6.
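The decimal handling can be sketched as follows (hypothetical code, not the Tokenizer's implementation): a numeric token split around a decimal marker is rejoined when the fractional part is a digit sequence, optionally carrying a 万 or 亿 factor.

```python
def rejoin_decimals(tokens):
    """Rejoin tokens split around a decimal marker,
    e.g. ['5', '.', '5万'] -> ['5.5万'] (illustrative sketch)."""
    out = []
    i = 0
    while i < len(tokens):
        # A digit token, a decimal marker, then a token made of digits
        # (optionally ending in 万 or 亿) are merged back into one token.
        if (i + 2 < len(tokens)
                and tokens[i].isdigit()
                and tokens[i + 1] in ".．"
                and tokens[i + 2].rstrip("万亿").isdigit()):
            out.append(tokens[i] + "." + tokens[i + 2])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Note that a fractional part consisting of a factor character alone (e.g. just 万) is not rejoined, matching the rule described above.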
Chinese Annotator IP
The Chinese Annotator Input Processor includes a range of analyzers which treat specific morphological phenomena of Chinese. In general, three operations can be performed by the morphological analyzers:
- Annotation (addition of one or more morphological annotations)
- Change of the base form property
- Concatenation of multiple tokens.
The morphological analyzers are applied in a fixed order; the table below shows the current sequence of analyzers, along with the operations that are performed by them. In the following sections more details are provided for each individual analyzer.
Analyzer | Example | Annotation | Base form change | Concatenation | |
---|---|---|---|---|---|
1. | VNotV Analyzer | 是-不-是 | Yes | Yes | Yes |
2. | Verb Analyzer | 吃-完,跑-上 | Yes | Yes | Yes |
3. | Reduplication Analyzer | 红-红 | Yes | Yes | Yes |
4. | Loc Analyzer | 桌子-上 | Yes | Yes | Yes |
5. | Aspect Analyzer | 吃 了, 坐 着 | Yes | No | No |
6. | Negation Analyzer | 不 吃 | Yes | No | No |
7. | SC Analyzer | 洗了一个澡 | Yes | Yes | No |
8. | Affix Analyzer | 我们, 标准化 | Yes | Yes | No |
VNotV Analyzer
The VNotV Analyzer concatenates and analyzes V-Not-V sequences.
In V-Not-V structures, the same verb occurs twice with a negation word (不, 没, 否) between the two occurrences:
- 你 去-不-去 买 东西?
Nǐ qù-bù-qù mǎi dōngxī?
You go-VNOTV.BU-go buy things
‘Do you go shopping?’
The V-Not-V structure has two uses: it forms direct questions, as above, and embedded interrogative clauses:
- 我 不 知道 他 去-不-去 买 东西。
Wǒ bù zhīdào tā qù-bù-qù mǎi dōngxī.
I NEG know he go-VNOTV.BU-go buy things
‘I don’t know whether he goes shopping.’
In the case of bi-syllabic words, the second syllable of the first verb might be deleted:
- 你 喜(欢)-不-喜欢 买 东西 ?
Nǐ xǐ(huān)-bù-xǐhuān mǎi dōngxī?
You like-VNOTV.BU-like buy things
‘Do you like shopping?’
The VNotV Analyzer concatenates the three tokens and assigns one structural annotation (prefixed with VNOTV) signaling the negation form. The base form of the resulting token is set to the full form of the verb. Thus, the example above with second-syllable deletion is analyzed as follows:
- The three tokens are concatenated into one word 喜不喜欢
- This word gets the base form 喜欢
- Additionally, it gets the annotation VNOTV.BU.
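These steps can be sketched as follows, assuming a pre-tokenized input and ignoring the POS checks the real Analyzer performs (illustrative only):

```python
NEGATORS = {"不": "BU", "没": "MEI", "否": "FOU"}

def analyze_vnotv(tokens):
    """Concatenate V-Not-V triples into one token and return a list of
    (surface form, base form, annotation) entries (illustrative sketch)."""
    results = []
    i = 0
    while i < len(tokens):
        if (i + 2 < len(tokens)
                and tokens[i + 1] in NEGATORS
                # allow second-syllable deletion: 喜-不-喜欢
                and tokens[i + 2].startswith(tokens[i])):
            surface = tokens[i] + tokens[i + 1] + tokens[i + 2]
            base = tokens[i + 2]  # full form of the verb
            annot = "VNOTV." + NEGATORS[tokens[i + 1]]
            results.append((surface, base, annot))
            i += 3
        else:
            results.append((tokens[i], tokens[i], None))
            i += 1
    return results
```

For the deletion example above, `analyze_vnotv(["喜", "不", "喜欢"])` yields `[("喜不喜欢", "喜欢", "VNOTV.BU")]`.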
Verb Analyzer
The Verb Analyzer performs analysis of resultative and directional compounds. Resultative compounds consist of one main verb and one resultative suffix:
- 小王 吃-完 了。
Xiǎowáng chī-wán le.
Xiaowang eat-RESULT ASPECT
‘Xiaowang finished eating.’
Directional compounds consist of one main verb and one or two directional suffixes:
- 阿明 跑-上 楼梯 了。
Āmíng pǎo-shàng lóutī le.
Aming run-DIR.NONDEICTIC.SHANG stairs ASPECT
‘Aming ran up the stairs.’
- 阿明 跑-上-去 了。
Āmíng pǎo-shàng-qù le.
Aming run-DIR.NONDEICTIC.SHANG-DIR.DEICTIC.QU ASPECT
‘Aming ran up.’
The combination of the main verb with the resultative/directional complements is concatenated. The base form of the resulting token is changed to the base form of the main verb. The token is assigned the annotations associated with the resultative/directional suffixes.
Annotations for resultative suffixes carry the prefix RESULT. Annotations for directional suffixes carry the prefix DIR. Additionally, we distinguish between deictic (DIR_DEICTIC…) and non-deictic (DIR_NONDEICTIC…) directional complements. Cases with two directional suffixes are limited to a non-deictic complement followed by a deictic complement.
Reduplication Analyzer
The Reduplication Analyzer analyzes reduplications of verbs, adjectives and adverbs:
- a. 红-红
hóng-hóng
red-red
‘very red’
- b. 讨论-讨论
tǎolùn-tǎolùn
discuss-discuss
‘to discuss a little’
Reduplication of adjectives manifests some variability in the distribution of the syllables. Specifically, some adjectives expose the following asymmetric reduplication patterns:
- a. AABB:
干净 干-干-净-净
gānjìng gān-gān-jìng-jìng
clean clean-clean
‘clean’ ‘very clean’
- b. ABB:
雪白 雪-白-白
xuěbái xuě-bái-bái
white white-white
‘white’ ‘very white’
- c. AAB:
逛街 逛-逛-街
guàngjiē guàng-guàng-jiē
walk street walk-walk-street
‘walk street’ ‘go window shopping’
In verbal reduplication, the particles 一 and 了 can occur between the two copies:
- 看-一-看
kàn-yī-kàn
look-one-look
‘to take a look’
If the two words are segmented in the original tokenization, they are concatenated by the Reduplication Analyzer. The reduplicated word gets the annotation REDUP.
Loc Analyzer
The Loc Analyzer performs concatenation and analysis of noun + localizer combinations; localizers follow nouns and ‘transform’ them into locative nouns:
- 桌子-上
table-LOC.ON.SHANG
‘on the table’
Localizers form a closed set; the table below shows the mapping from localizers to their annotations.
Form of localizer | Annotation |
---|---|
上 | LOC_ON_SHANG |
下 | LOC_UNDER_XIA |
里 | LOC_INSIDE_LI |
内 | LOC_INSIDE_NEI |
外 | LOC_OUTSIDE_WAI |
前 | LOC_BEFORE_QIAN |
后 | LOC_BEHIND_HOU |
旁 | LOC_NEXTTO_PANG |
中 | LOC_IN_ZHONG |
The Loc Analyzer concatenates the noun + localizer combination into one word and assigns it an annotation with the label of the localizer. The base form of the resulting token is set to the base form of the noun. Thus, the example above is analyzed as follows:
- The two tokens are concatenated into one word 桌子上.
- This word gets the base form 桌子.
- Additionally, it is assigned the annotation LOC_ON_SHANG.
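Using the localizer table above, the analysis can be sketched as (illustrative only, not the Analyzer's implementation):

```python
# Localizer -> annotation mapping, taken from the table above.
LOCALIZERS = {
    "上": "LOC_ON_SHANG", "下": "LOC_UNDER_XIA", "里": "LOC_INSIDE_LI",
    "内": "LOC_INSIDE_NEI", "外": "LOC_OUTSIDE_WAI", "前": "LOC_BEFORE_QIAN",
    "后": "LOC_BEHIND_HOU", "旁": "LOC_NEXTTO_PANG", "中": "LOC_IN_ZHONG",
}

def analyze_loc(noun, localizer):
    """Concatenate a noun + localizer combination and return the merged
    token, its base form and its annotation (illustrative sketch)."""
    annotation = LOCALIZERS.get(localizer)
    if annotation is None:
        return None
    return {"word": noun + localizer, "baseform": noun, "annotation": annotation}
```

For example, `analyze_loc("桌子", "上")` yields the word 桌子上 with base form 桌子 and annotation LOC_ON_SHANG.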
Aspect Analyzer
The Aspect Analyzer analyzes aspect markers. Chinese has both pre-verbal and post-verbal aspect markers:
- a. 她 正在 吃。
Tā zhèngzài chī.
she ASPECT eat
‘She is eating.’
- b. 她 吃 了。
Tā chī le.
she eat ASPECT
‘She ate.’
Marker | Aspect | Annotation | Position |
---|---|---|---|
了 | Perfective | ASPECT_PERFECTIVE_LE | Postverbal |
着 | Progressive | ASPECT_PROGRESSIVE_ZHE | Postverbal |
过 | Experiential | ASPECT_EXPERIENTIAL_GUO | Postverbal |
在 | Progressive | ASPECT_PREVERBAL_PROGRESSIVE_ZAI | Preverbal |
正在 | Progressive | ASPECT_PREVERBAL_PROGRESSIVE_ZHENGZAI | Preverbal |
The set of aspect markers analyzed by the Aspect Analyzer is displayed in the table above.
The Aspect Analyzer attaches the respective annotation of the aspect marker to the main verb.
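A simplified sketch of this behavior; the real Analyzer locates the main verb via POS information, while this illustration simply targets the token adjacent to the marker:

```python
# Marker -> (annotation, position) mapping from the table above.
ASPECT_MARKERS = {
    "了": ("ASPECT_PERFECTIVE_LE", "post"),
    "着": ("ASPECT_PROGRESSIVE_ZHE", "post"),
    "过": ("ASPECT_EXPERIENTIAL_GUO", "post"),
    "在": ("ASPECT_PREVERBAL_PROGRESSIVE_ZAI", "pre"),
    "正在": ("ASPECT_PREVERBAL_PROGRESSIVE_ZHENGZAI", "pre"),
}

def annotate_aspect(tokens):
    """Attach each aspect marker's annotation to the neighboring token:
    preverbal markers annotate the following token, postverbal markers
    the preceding one (illustrative sketch)."""
    annotations = {}  # token index -> list of annotation labels
    for i, tok in enumerate(tokens):
        if tok in ASPECT_MARKERS:
            label, position = ASPECT_MARKERS[tok]
            target = i + 1 if position == "pre" else i - 1
            if 0 <= target < len(tokens):
                annotations.setdefault(target, []).append(label)
    return annotations
```

For example b above, `annotate_aspect(["她", "吃", "了"])` annotates the verb 吃 (index 1) with ASPECT_PERFECTIVE_LE.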
Negation Analyzer
The Negation Analyzer analyses negations of adverbs, verbs and adjectives. In these cases, the negation particle can immediately precede the negated word:
- a. 我 没 去。
I NEG.MEI go
‘I didn’t go.’
- b. 不 容易
NEG.BU easy
‘not easy’
- c. 不 太
NEG.BU too
‘not too’
The negation particle can also be separated from the verb by additional material:
- 别 这么 做。
Bié zhème zuò.
NEG.BIE this do
‘Don’t do this.’
The set of currently analyzed negation words is shown in the table below.
Form of negator | Annotation |
---|---|
不 | NEG_BU |
否 | NEG_FOU |
没 | NEG_MEI, ASPECT_PERFECTIVE |
没有 | NEG_MEIYOU, ASPECT_PERFECTIVE |
别 | NEG_BIE, MODE_IMPERATIVE |
不太 | NEG_BUTAI |
并不 | NEG_BINGBU |
不怎么 | NEG_BUZENME |
Three of the negation particles (没, 没有, 别) have two annotations. Their second annotation contains aspectual or mode information that is implied by the particle. The Negation Analyzer attaches an annotation to the negated word; it contains the corresponding annotation of the negation particle as well as the particle's index in the sentence. An additional annotation is attached to the negated word if the negation particle carries aspect or mode information.
For example, in 我 没 去 (example a further above), the verb 去 is annotated with two annotations, {‘NEG_MEI’, 1} and ASPECT_PERFECTIVE.
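This annotation scheme can be sketched as follows (an illustrative simplification that only scans the tokens preceding the negated word; the table entries shown are a subset):

```python
# Negator -> annotations; the second entry, if any, is the implied
# aspect/mode annotation (subset of the table above).
NEGATORS = {
    "不": ["NEG_BU"],
    "没": ["NEG_MEI", "ASPECT_PERFECTIVE"],
    "没有": ["NEG_MEIYOU", "ASPECT_PERFECTIVE"],
    "别": ["NEG_BIE", "MODE_IMPERATIVE"],
}

def annotate_negation(tokens, verb_index):
    """Collect the annotations the negated word at verb_index receives:
    the negation annotation carries the particle's sentence index, plus
    any implied aspect/mode annotation (illustrative sketch)."""
    annotations = []
    for i, tok in enumerate(tokens[:verb_index]):
        if tok in NEGATORS:
            neg, *extra = NEGATORS[tok]
            annotations.append((neg, i))
            annotations.extend(extra)
    return annotations
```

For 我 没 去, `annotate_negation(["我", "没", "去"], 2)` yields `[("NEG_MEI", 1), "ASPECT_PERFECTIVE"]`, matching the example above.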
SC Analyzer
The SC Analyzer analyzes splitable compounds; splitable verb-object compounds (SCs) are verb-object combinations with an idiomatic meaning, e.g. 担-心 (worry+heart = ‘to worry’), 生-气 (create+air = ‘to get angry’), 见-面 (see+face = ‘to meet someone’). They allow for various kinds of syntactic activity between verb and object, e.g. insertion of aspect markers, additional objects, demonstratives, etc.:
- a. Aspect marker:
我们 见- 了 -面
we see- ASPECT -face
‘We met.’
- b. Additional object:
帮- 她 一个 -忙
help- she one -affair
‘to help her’
- c. Nominal modifier:
见- 他 的 -面
see- he DEG -face
‘to meet him’
The set of SCs is large and diverse. Although it is difficult to exhaustively enumerate all SCs, the most common instances are captured in a list of currently 163 compounds. Once the SC Analyzer identifies a verb of a splitable compound, it moves forward in the sentence and looks for a valid SC object for this verb. While looking, it checks for each subsequent word whether the sequence following the verb is still a valid splitting sequence. If it arrives at a suitable object before the sequence becomes invalid, it attaches an annotation to the verb. This annotation carries two pieces of information: the tag of the splitable compound (SPLIT_ followed by the pinyin of the compound) as well as the index of the dependent object. Further, the base form of the verb is set to the base form of the splitable compound.
Thus, in example a above, the verb 见 is annotated with the annotation {SPLIT_JIANMIAN, 3}. Its base form is set to 见面.
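The forward search can be sketched as follows (illustrative; the compound entries and the max_gap window are assumptions, and the real Analyzer additionally validates that the intervening sequence is a valid splitting sequence):

```python
# A few of the splitable compounds as (verb, object) -> tag; the
# real list currently has 163 entries.
SPLITABLE = {
    ("见", "面"): "SPLIT_JIANMIAN",
    ("担", "心"): "SPLIT_DANXIN",
    ("生", "气"): "SPLIT_SHENGQI",
}

def analyze_sc(tokens, max_gap=4):
    """For each verb of a splitable compound, scan forward for its
    object within max_gap tokens and return (verb index, tag,
    object index) triples (illustrative sketch)."""
    results = []
    for i, tok in enumerate(tokens):
        for (verb, obj), tag in SPLITABLE.items():
            if tok == verb:
                for j in range(i + 1, min(i + 1 + max_gap, len(tokens))):
                    if tokens[j] == obj:
                        results.append((i, tag, j))
                        break
    return results
```

For example a above, `analyze_sc(["我们", "见", "了", "面"])` yields `[(1, "SPLIT_JIANMIAN", 3)]`: the verb at index 1 carries the tag plus the index of its object.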
Affix Analyzer
The Affix Analyzer analyzes inflectional and derivational suffixes. Chinese has only one inflectional suffix, namely the plural suffix -们, which can be attached to human nouns and pronouns:
- a. 老师-们
teacher-PLURAL
‘the teachers’
- b. 我-们
me-PLURAL
‘we’
Additionally, Chinese has a set of derivational suffixes which change the part of speech of the word to which they are attached. For example, the suffix -者 is attached to verbs, and the resulting combination is a noun and denotes the actor of the base form verb:
- 使用-者
shǐ-yòng(-)zhě
use-ACTOR.ZHE
‘the user’
A suffixed word gets the corresponding annotation of its suffix, and the base form of the word is changed to the base form without the suffix. Thus, 使用者 in the example above is analyzed as follows:
- 使用者 gets the annotation ACTOR_ZHE
- 使用者 gets the base form 使用.
The table below displays the set of tags used by the Affix Analyzer.
Form of affix | Annotation | Example |
---|---|---|
-于 | COMPARATIVE_YU | 高于 (两米) |
-度 | PROPERTY_DU | 精确度 |
-性 | PROPERTY_XING | 流线性 |
-化 | TRANSFORM_HUA | 现代化 |
-者 | ACTOR_ZHE | 使用者 |
-师 | ACTOR_SHI | 设计师 |
-员 | ACTOR_YUAN | 操作员 |
可- | ABILITY_KE | 可上升 |
-们 | PLURAL_MEN | 老师们 |
-城 | CITY_CHENG | 北京城 |
-市 | CITY_SHI | 上海市 |
-省 | PROVINCE_SHENG | 河北省 |
-儿 | RCOLORING_ERHUA | 好玩儿 |
-于 (word contains the suffix and has a base form of at least 2 characters) | PREP_YU | 致力于 |
Chinese Numbers IP
The Chinese Numbers Recognizer Input Processor simplifies writing syntaxes against numbers and numeric expressions in Teneo Studio solutions and provides the following functionality:
- Normalization of tokens containing numeric values into Hindu-Arabic numerals
- Creation of a NUMBER annotation with a numericValue variable, which has type BigDecimal and contains a representation of the normalized number
- Creation of an annotation with the name of the normalized number value
- Annotation of inexact numbers (i.e. numbers containing the characters 几, 数, 余 or 多) with the annotation INEXACT.
The Chinese Numbers Recognizer Input Processor leaves the tokenization unmodified and does not try to concatenate neighboring numeric expressions, nor does it split numeric parts of a token from its non-numeric parts. It will however identify and annotate tokens which contain numeric subparts, e.g. for the token “三点”, the normalized numeric value is 3. Furthermore, it works with decimal factored numbers like 5.5万 or 1.2亿 and supports fractions and formal Hanzi numerals.
Numeric normalization
Numeric string normalization is applied to substrings of the input string. The normalized values are used in the creation of annotations; the input string itself remains unmodified. The following normalization steps are applied by the Chinese Numbers IP:
- Hindu-Arabic numerals remain unchanged
- Hanzi numerals are normalized to their Hindu-Arabic numeric value
- Mixed Hanzi/Hindu-Arabic numerals are normalized to Hindu-Arabic numerals.
Input token | Normalized Numeric Value |
---|---|
10 | 10 |
3.14 | 3.14 |
一 | 1 |
一点 | 1 |
两百 | 200 |
三百万五千 | 3005000 |
3百万5千 | 3005000 |
三百五 | 350 |
一万零一 | 10001 |
一万〇一 | 10001 |
The above table shows examples of normalization; in the last three examples it is possible to see that even more colloquial numeric expressions such as “三百五” are handled correctly.
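The integer normalization shown in the table can be sketched as follows (an illustrative simplification covering only integer values; the actual IP also handles decimals, fractions and inexact numbers, and produces BigDecimal values):

```python
DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000, "万": 10**4, "亿": 10**8}

def normalize_hanzi(text):
    """Normalize an integer Hanzi/Arabic numeral string to an int
    (illustrative sketch)."""
    total, digit, last_unit = 0, 0, None
    for ch in text:
        if ch in ("零", "〇"):
            # placeholder zero: blocks the colloquial trailing-digit reading
            digit, last_unit = 0, None
        elif ch in DIGITS:
            digit = DIGITS[ch]
        elif ch.isdigit():
            digit = digit * 10 + int(ch)  # mixed Arabic digits, e.g. 3百万
        elif ch in UNITS:
            unit = UNITS[ch]
            if unit >= 10**4:
                # 万/亿 multiply everything accumulated so far
                total = (total + digit) * unit if (total + digit) else unit
            else:
                total += (digit or 1) * unit
            digit, last_unit = 0, unit
    if digit and last_unit:
        digit *= last_unit // 10  # colloquial reading: 三百五 -> 350
    return total + digit
```

This reproduces the table's examples, including the colloquial case: `normalize_hanzi("三百五")` yields 350 and `normalize_hanzi("三百万五千")` yields 3005000.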
The NUMBER annotation
The NUMBER
annotation allows for writing syntaxes on existences of numbers in user inputs, without the need to specify any number explicitly. The only thing the Teneo Studio user should do is use the NUMBER
annotation in the syntax. For example:
tlml
%I_WANT.PHR + %$NUMBER + %PRODUCT.LIST
The numeric value can also be retrieved using a listener and used later in the flow. The listings below show how numeric value retrieval is done.
tlml
%$NUMBER + %PRODUCT.LIST

properties
int numberAnnotIndex = (_.usedWordIndices as List)[0]

def numberAnnot = _.inputAnnotations.getByName('NUMBER').find {
    // be sure that the annotation points to the correct word
    numberAnnotIndex in it.getWordIndices()
}

// stores value in flow variable numProducts
numProducts = numberAnnot.getVariables()['numericValue'] as int
The numeric value can also be retrieved using an NLU variable:
tlml
1%I_WANT.PHR + %$NUMBER^{someVariable=lob.numericValue} + %PRODUCTS.LIST
2
The normalized number annotation
The normalized number annotation is just the numeric value of the NUMBER
annotation as an annotation itself. This allows the Teneo Studio user to write syntaxes against specific numbers, without the need to specify all the different surface variants. Thanks to the traditional-to-simplified Chinese character conversion done in the Chinese Tokenizer IP, even traditional numeric Hanzi characters match.
The table below shows examples of normalized number annotations.
Syntax | Matching inputs |
---|---|
%$2 | '2', '两', '二', '2', ... |
%$10000 | '10000', '万', '萬', '一万', '一〇〇〇〇', '10000', ... |
%$3.14 | '3.14', '三.一四', ... |
%$350 | '350', '三百五', '三五〇', ... |
%$1234 | '1234', '1234', '一二三四', '一千两百三十四', ... |
Date and Time annotations
The TIME.DATETIME and DATE.DATETIME annotations are created in the Teneo Platform for numbers which could be either time or date expressions; for example, 五点零零 creates a TIME.DATETIME annotation with values hour: 5 and minute: 0, and 1/2 creates a DATE.DATETIME annotation with values month: 1, day: 2.
To read more about the native understanding and interpretation of date and time expressions in the Teneo Platform, please see here.
System Annotation IP
The System Annotation Input Processor, shared among the different languages of the Teneo Platform, performs simple analysis of the sentence text to set some annotations. The decision algorithms are configurable by various properties. Further customization is possible by sub-classing this Input Processor and overriding one or more of the methods decideBinary, decideBrackets, decideEmpty, decideExclamation, decideNonsense, decideQuestion and decideQuote.
This IP works on the sentences passed in, but does not modify them.
Other considerations
Extra request parameters read by this input processor: (none)
Processing options read by this input processor: (none)
Annotations this input processor may generate:
- _EMPTY: the sentence text is empty
- _EXCLAMATION: the sentence text contains at least one of the characters specified with property exclamationMarkCharacters
- _EM3: the sentence text contains three or more characters in a row of the characters specified with property exclamationMarkCharacters
- _QUESTION: the sentence text contains at least one of the characters specified with property questionMarkCharacters
- _QT3: the sentence text contains three or more characters in a row of the characters specified with questionMarkCharacters
- _QUOTE: the sentence text contains at least one of the characters specified with property quoteCharacters
- _DBLQUOTE: the sentence text contains at least one of the characters specified with property doubleQuoteCharacters
- _BRACKETPAIR: the sentence text contains at least one matching pair of the bracket characters specified with property bracketPairCharacters
- _NONSENSE: the sentence probably contains nonsense text as configured with properties consonants, nonsenseThreshold.absolute and nonsenseThreshold.relative
- _BINARY: the sentence text only contains characters specified by properties binaryCharacters (at least one of them) and binaryIgnoredCharacters (zero or more of them).
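A few of these decisions can be sketched as follows (illustrative; the default character sets below are assumptions, since the real values come from the configuration properties):

```python
def system_annotations(text,
                       exclamation_chars="!！",
                       question_chars="?？"):
    """Sketch of a subset of the System Annotation decisions
    (illustrative; character sets are configurable properties)."""
    def has_run(chars):
        # three or more of the given characters in a row
        return any(all(c in chars for c in text[i:i + 3])
                   for i in range(len(text) - 2))

    annotations = set()
    if not text.strip():
        annotations.add("_EMPTY")
    if any(ch in exclamation_chars for ch in text):
        annotations.add("_EXCLAMATION")
    if any(ch in question_chars for ch in text):
        annotations.add("_QUESTION")
    if has_run(exclamation_chars):
        annotations.add("_EM3")
    if has_run(question_chars):
        annotations.add("_QT3")
    return annotations
```

For instance, the input 好！！！ would receive both _EXCLAMATION and _EM3.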
Special System annotations
Two special annotations related not to individual inputs, but to whole dialogues, are added by the Teneo Engine itself:
- _INIT: indicates session start, i.e. the first input in a dialogue
- _TIMEOUT: indicates the continuation of a previously timed-out session/dialogue.
Several configuration properties are available for the System Annotation Input Processor; please see the details here.
Language Detector IP
The Language Detector Input Processor uses a machine learning model that predicts the language of a given input and adds an annotation of the format %${language label}.LANG
to the input as well as a confidence score of the prediction.
The Language Detector IP can predict the following 45 languages (language label in brackets):
Arabic (AR), Bulgarian (BG), Bengali (BN), Catalan (CA), Czech (CS), Danish (DA), German (DE), Greek (EL), English (EN), Esperanto (EO), Spanish (ES), Estonian (ET), Basque (EU), Persian (FA), Finnish (FI), French (FR), Hebrew (HE), Hindi (HI), Hungarian (HU), Indonesian-Malay (ID_MS), Icelandic (IS), Italian (IT), Japanese (JA), Korean (KO), Lithuanian (LT), Latvian (LV), Macedonian (MK), Dutch (NL), Norwegian (NO), Polish (PL), Portuguese (PT), Romanian (RO), Russian (RU), Slovak (SK), Slovenian (SL), Serbian-Croatian-Bosnian (SR_HR), Swedish (SV), Tamil (TA), Telugu (TE), Thai (TH), Tagalog (TL), Turkish (TR), Urdu (UR), Vietnamese (VI) and Chinese (ZH).
Serbian, Bosnian and Croatian are treated as one language under the label SR_HR, and Indonesian and Malay are treated as one language under the label ID_MS.
A number of regexes are also in use by the Input Processor, helping the model avoid predicting a language for fully numerical inputs, URLs or other types of nonsense input.
The Language Detector will provide an annotation when the prediction confidence is above the 0.2 threshold. However, for Arabic (AR), Bengali (BN), Greek (EL), Hebrew (HE), Hindi (HI), Japanese (JA), Korean (KO), Tamil (TA), Telugu (TE), Thai (TH), Chinese (ZH), Vietnamese (VI), Persian (FA) and Urdu (UR), language annotations will always be created, even for predictions below 0.2, since the Language Detector is mostly accurate when predicting these languages.
Predict IP
The Predict Input Processor makes use of an intent model generated when classes are available in a Teneo Studio solution to annotate user inputs with the defined classes; intent models can be generated either with Teneo Learn or CLU. Note that as of Teneo 7.3, deferred intent classification is applied and annotations are only created by Predict if references to class annotations are found during the input matching process.
When Predict receives a user input, confidence scores are calculated for each class based on the model, and annotations are created for the most confident class and for each other class that matches the following criteria:
- the confidence is above the minimum confidence (defaults to 0.01)
- the confidence is higher than 0.5 times the confidence value of the top class.
For each selected class, an annotation with the scheme <CLASS_NAME>.INTENT
is created, with the value of the model's confidence in the class as well as an annotation variable specifying the used classifier (i.e., Learn, CLU or LearnFallback) and an Order variable defining the order of the selected classes (i.e., 0 for the class with the highest confidence score and 4 for the selected class with the lowest confidence score).
A special annotation <CLASS_NAME>.TOP_INTENT
is created for the class with the highest confidence score.
Annotation | Variables | Description |
---|---|---|
<CLASS_NAME>.TOP_INTENT | classifier, confidence | Annotation created for the class with the highest confidence score |
<CLASS_NAME>.INTENT | classifier, confidence, Order | Annotation given to each selected class, with a maximum of five top classes |
The Predict Input Processor creates a maximum of 5 annotations, regardless of how many classes match the criteria; this maximum, like the other thresholds, can be configured in the properties file of the Input Processor.
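The selection criteria and the cap can be sketched as follows (illustrative, using the default property values described below):

```python
def select_classes(scores, min_confidence=0.01,
                   similarity_distance=0.5, max_annotations=5):
    """Select the classes Predict would annotate: classes above the
    minimum confidence and above similarity_distance times the top
    confidence, capped at max_annotations (illustrative sketch)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    top = ranked[0][1]
    selected = [(name, conf) for name, conf in ranked
                if conf >= min_confidence
                and conf >= similarity_distance * top]
    return selected[:max_annotations]
```

For example, with scores {BUY: 0.7, CANCEL: 0.4, HELP: 0.2}, the cutoff is 0.5 × 0.7 = 0.35, so only BUY (Order 0) and CANCEL (Order 1) are selected; HELP is discarded. The class names here are hypothetical.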
Configuration properties
Name | Type | Required | Default |
---|---|---|---|
minConfidenceSimilarityDistance | float | no | 0.5 |
Fraction of the top class's confidence that a class must have in order to be considered; e.g. if the top class has a confidence of 0.7, classes with confidence lower than 0.5 x 0.7 = 0.35 will be discarded.
Name | Type | Required | Default |
---|---|---|---|
maxNumberOfAnnotations | int | no | 5 |
Maximum number of class annotations to create for each user input.
Name | Type | Required | Default |
---|---|---|---|
minConfidenceThreshold | float | no | 0.01 |
Minimum value of confidence a model must have for a class in order to add it as one of the candidate annotations.
Name | Type | Required | Default |
---|---|---|---|
intent.model.file.name | string (filename) | no | inexistent |
Name of the file containing the machine learning model. It is usually set automatically by Teneo Studio, so no configuration is required.