Anonymize user data
When talking to a bot, users are often in a position where they give out personal information. Keeping track of what personal data the bot may capture is therefore a normal part of bot development. The Pre-logging script event can be used to anonymize personal data in a way that does not affect the functionality of the bot. With a Pre-logging script, we can:
- Remove data before it leaves Teneo Engine
- Redact, remove, or encrypt sensitive data
On this page, we will show you how to anonymize personal data with the following steps:
- Create a Global variable that stores the values to redact.
- Add a Post-processing script to capture relevant data, so that it can later be redacted.
- Add an End-dialog script to sort the values to redact in descending order of length, so that replacing shorter strings doesn't break up longer strings.
- Add a Pre-logging script to redact the values and remove the variable.
The first section provides a simple use-case of anonymizing personal data, while the second section describes more advanced use-cases.
Anonymize a variable
For this example, we will replace the value of the variable Lib_sUserFirstName, which stores the user's first name. This variable is part of the Longberry Baristas solution and comes with the Teneo Dialogue Resources.
- While inside your solution, click on the 'Solution' tab.
- Click on 'Globals'.
- Select the 'Scripts' tab at the top.
- Add a new 'Pre-logging' script and name it Replace user name.
- Add the following line into the editing window:
_.getDialogHistoryUtilities().replaceVariables(['Lib_sUserFirstName'], 'John Doe')
This will automatically replace the variable's value with 'John Doe' right before it is logged.
- Hit 'Save'.
Anonymize personal data
Pre-logging scripts are also powerful when you want to anonymize personal data. Teneo is good at recognizing different kinds of personal data thanks to its Named-entity recognizer and Part-of-speech annotation tags. In the following example, we will redact the names mentioned while communicating with our bot, using the PERSON.NER annotation tag.
Create a global variable
As a first step, we need to create a new Global variable that will store the values we want to redact:
- In the Solution backstage, select 'Globals' followed by 'Variables'.
- Click 'Add'. A panel for specifying the new variable appears on the right-hand side.
- Name the variable toRedact, and set its initial value to an empty list: []. (Make sure to edit the "Value" field and not the "Description" field.)
- Hit 'Save'.
Add a Post-processing script
Next, add a Post-processing script that stores the relevant values in the global variable we created in the previous step.
- Select the 'Scripts' tab at the top.
- Add a new 'Post-processing' script and name it Find items to redact.
- Add the following groovy script into the editing window:
groovy
// Find names that have been mentioned by the user
_.inputAnnotations.getByName('PERSON.NER').each { person ->
    def sentence = _.sentences[person.sentenceIndex]
    def firstWord = sentence.words[person.wordIndices.min()]
    def lastWord = sentence.words[person.wordIndices.max()]
    def beginIndex = firstWord.beginIndex
    def endIndex = lastWord.endIndex
    // Select the items to redact
    def itemToRedact = [
        'historyIndex': _.dialogHistoryLength,
        'beginIndex': beginIndex,
        'endIndex': endIndex,
        'value': sentence.text.substring(beginIndex, endIndex)
    ]
    toRedact.add(itemToRedact)
}
- Hit 'Save'.
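To make the offset arithmetic in the script above concrete, here is a minimal plain-Groovy sketch that runs outside Teneo. The word list with its beginIndex and endIndex fields is a hypothetical stand-in for the objects the engine provides, and the tokenization shown (including the comma as its own word) is an assumption made for illustration.

```groovy
// Hypothetical model of one sentence: each word carries its character offsets.
// The real objects come from the Teneo engine; this tokenization is assumed.
def text = 'Hi, my name is John Doe'
def words = [
    [beginIndex: 0,  endIndex: 2],   // Hi
    [beginIndex: 2,  endIndex: 3],   // ,
    [beginIndex: 4,  endIndex: 6],   // my
    [beginIndex: 7,  endIndex: 11],  // name
    [beginIndex: 12, endIndex: 14],  // is
    [beginIndex: 15, endIndex: 19],  // John
    [beginIndex: 20, endIndex: 23]   // Doe
]
// Assume the annotation covers the last two words
def wordIndices = [5, 6]

// The span runs from the start of the first annotated word to the end of the last
def beginIndex = words[wordIndices.min()].beginIndex
def endIndex = words[wordIndices.max()].endIndex
def value = text.substring(beginIndex, endIndex)
println value
```

Taking the minimum and maximum word index means a multi-word name is captured as one continuous span, including the space between the words.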
Add a Pre-logging script
Finally, we will add a Pre-logging script to redact the values stored in the global variable.
- While inside the 'Scripts' tab, add a 'Pre-logging' script called Redact names.
- Add the following code into the editing window:
groovy
// Replace the mentioned name with '*'
toRedact.each {
    _.dialogHistoryUtilities.redact(it.historyIndex, it.beginIndex, it.endIndex, '*' as char)
}
def values = [*toRedact.collect {it.value}, Lib_sUserFirstName]
// Replace output
_.dialogHistoryUtilities.replaceResponseText(values, '****')
// Remove the variables
_.dialogHistoryUtilities.removeVariables(['Lib_sUserFirstName', 'toRedact'])
- Hit 'Save'.
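The effect of masking a character span can be sketched in plain Groovy. The helper maskSpan below is hypothetical and is not part of the Teneo API; it assumes that the span starts at beginIndex and ends just before endIndex, which matches how the offsets are used with substring in the Post-processing script.

```groovy
// Hypothetical stand-in for span-based redaction: overwrite the characters
// from beginIndex (inclusive) to endIndex (exclusive) with a mask character.
def maskSpan(String text, int beginIndex, int endIndex, char mask) {
    def masked = (mask as String) * (endIndex - beginIndex)
    return text.substring(0, beginIndex) + masked + text.substring(endIndex)
}

def input = 'Hi, my name is John Doe'
def beginIndex = input.indexOf('John Doe')
def endIndex = beginIndex + 'John Doe'.length()
def result = maskSpan(input, beginIndex, endIndex, '*' as char)
println result
```

Because the mask has the same length as the original span, the offsets of any later spans in the same input stay valid.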
Publish and test your bot
In order to see if our scripts work, you will need to publish your bot. Proceed as follows:
- Open the 'SOLUTION' tab in the solution's window.
- Select 'Publish' in the left sidebar.
- Click the 'Manage' button and in the drop-down you will see a lot of different alternatives. Locate the 'Latest' section and choose 'Publish'.
You might see a warning saying 'Publish to 'Default env' stopped with warnings. '
This is nothing to worry about; the warning is shown when you publish your solution for the first time or when you have made certain global changes. To proceed, just check the checkbox 'Perform full application deployment on Try again' and click the 'Try again' button.
The publication may take a couple of minutes. When it has finished, you'll receive a confirmation pop-up.
- Once published, click on the blue 'Open' icon. This will open the Teneo Web Chat in a new browser tab.
- Click on the blue icon in the bottom right corner to open up the Teneo Web Chat window.
- Strike up a conversation with the bot, like:
Hi, my name is John Doe
Goodbye!
- Close the chat window to end the conversation.
Read the logs
Now return to Teneo Studio and open the Log Data Source to see if the name has been redacted.
- Open the 'SOLUTION' tab in the solution's window.
- Select 'Optimization' in the left sidebar.
- Navigate to 'Log Data' and open up your source by clicking on the 'Manage' button followed by 'Open'.
A new window should now open. This is the Log Data window, described here. The next step is to open a new Session Viewer tab to retrieve the latest session.
- In the 'Session Viewer' section, click on 'New Session Viewer Tab'.
- Change the values to 'Start date' and 'Descending' to retrieve the most recent session.
You should see the following conversation. If not, please repeat the steps above.
Anonymize personally identifiable information (PII)
In the following example, we will redact the personal information mentioned while communicating with our bot, using regular expressions to replace the detected values.
Create global variables
We must first create a few global variables. The table below lists the variables to add. You can adjust the values to meet your requirements.
Variable Name | Description | Initial Value |
---|---|---|
pLogAnnotationsToAnonymise | Annotations which should be anonymised/pseudonymised before being written to the logs | [[ner: 'PERSON.NER', tag: '<person>'], [ner: 'EMAIL.NER', tag: '<email>'], [ner: 'ADDRESS.NER', tag: '<address>'], [ner: 'LOCATION.NER', tag: '<location>'], [ner: 'ZIP_CODE.NER', tag: '<postcode>'], [ner: 'IP.NER', tag: '<ip>']] |
pLogAnonymiseString | This string will be used to replace log data if pLogIsAnonymise = true | "XXXXXX" |
pLogIsAnonymise | If true, PII is anonymised by replacing it with pLogAnonymiseString; if false, it is pseudonymised using the tag specified in the pLogAnnotationsToAnonymise variable | false |
pLogToRedact | List of contents to redact | [] |
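As a rough illustration of how these variables work together, the plain-Groovy sketch below picks the replacement text for one annotation entry. The closure replacementFor is a hypothetical helper introduced only for this example, and the list is shortened to two entries.

```groovy
// Values copied from the table above (list shortened for the example)
def pLogAnnotationsToAnonymise = [
    [ner: 'PERSON.NER', tag: '<person>'],
    [ner: 'EMAIL.NER',  tag: '<email>']
]
def pLogAnonymiseString = 'XXXXXX'

// Hypothetical helper: anonymise with the fixed string, or pseudonymise with the tag
def replacementFor = { Map entry, boolean isAnonymise ->
    isAnonymise ? pLogAnonymiseString : entry.tag
}

def person = pLogAnnotationsToAnonymise.find { it.ner == 'PERSON.NER' }
def anonymised = replacementFor(person, true)
def pseudonymised = replacementFor(person, false)
println "anonymised: ${anonymised}, pseudonymised: ${pseudonymised}"
```

Pseudonymising keeps the kind of information visible in the logs, while anonymising hides even that.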
Add a Post-processing script
Next we add a Post-processing script to store the relevant values in the global variable pLogToRedact, created in the previous step.
- Select the 'Scripts' tab at the top.
- Add a 'Post-processing' script with a name like Find PII mentioned by user.
- Add the following groovy script into the editing window:
groovy
// Find PII mentioned by the user
def ii = 1;

pLogAnnotationsToAnonymise.each { annotation ->
    println "working on annotation " + ii++ + annotation;
    _.inputAnnotations.getByName(annotation.ner).each { item ->
        println "annotation item: " + item;
        // Declared outside the try block so that the catch block can log them
        def firstWord = null;
        def lastWord = null;
        try {
            def sentence = _.sentences[item.sentenceIndex];
            println "sentence: " + sentence;
            firstWord = sentence.words[item.wordIndices.min()];
            lastWord = sentence.words[item.wordIndices.max()];
            def beginIndex = firstWord.beginIndex;
            def endIndex = lastWord.endIndex;

            // Save the items to redact
            def itemToRedact = [
                'historyIndex': _.dialogHistoryLength,
                'sentenceIndex': item.sentenceIndex,
                'beginIndex': beginIndex,
                'endIndex': endIndex,
                'value': sentence.text.substring(beginIndex, endIndex),
                'strLength': endIndex - beginIndex,
                'tag': annotation.tag
            ]
            pLogToRedact.add(itemToRedact)
            println 'Annotated ' + annotation + ": " + sentence.text.substring(beginIndex, endIndex);
        } catch (Exception e) {
            println "Exception! " + e;
            println "Using annotation " + annotation.ner;
            println firstWord;
            println lastWord;
        }
    }
}
- Hit 'Save'.
Add an End dialog script
The next step is to add an End dialog script to sort the global variable pLogToRedact in descending order of length. This ensures that replacement of shorter strings doesn't break up longer strings and lead to lack of redaction.
- While inside the 'Scripts' tab, add an 'End dialog' script with a name like Sort pLogToRedact.
- Add the following code into the editing window:
groovy
if (pLogToRedact.size() > 1) {
    // Sort in descending order of length to ensure that replacement of shorter strings
    // doesn't break up longer strings and lead to lack of redaction
    pLogToRedact.sort { a, b ->
        b.strLength <=> a.strLength
    }
}
- Hit 'Save'.
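Why the order matters can be shown with a small plain-Groovy sketch. The closure redactAll is a hypothetical helper that applies simple text replacement in list order; it stands in for the replacement done later in the Pre-logging script and is not part of the Teneo API.

```groovy
// Hypothetical helper: replace each value with its tag, in list order
def redactAll = { String text, List<Map> items ->
    items.inject(text) { acc, item -> acc.replace(item.value, item.tag) }
}

def items = [
    [value: 'John',     tag: '<person>', strLength: 4],
    [value: 'John Doe', tag: '<person>', strLength: 8]
]
def text = 'My name is John Doe'

// Shorter value first: 'John Doe' no longer matches afterwards, so 'Doe' leaks
def unsorted = redactAll(text, items)

// Longest value first: the full name is replaced in one go
def longestFirst = items.toSorted { a, b -> b.strLength <=> a.strLength }
def sorted = redactAll(text, longestFirst)

println unsorted
println sorted
```

In the unsorted run part of the name survives in the log, which is exactly the lack of redaction that the End dialog script is meant to prevent.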
Add a Pre-logging script
Finally, add a Pre-logging script to redact the values stored in the global variable.
- While inside the 'Scripts' tab, add a 'Pre-logging' script and give it a name like Redact PII values.
- Add the following code into the editing window:
groovy
public class preloggingHandler {

    // Mask the redaction list itself; return every other variable unchanged
    public static Object maskVars(String varName, Object varValue) {

        if (varName == 'pLogToRedact') {
            return ['<redacted>'];
        } else {
            return varValue;
        }
    }
}

if (pLogToRedact) {

    try {
        pLogToRedact.each { pii ->
            def replaceWith = pLogIsAnonymise ? pLogAnonymiseString : pii.tag;
            _.getDialogHistoryUtilities().replaceUserInputText(text -> text.replaceAll(/(?i)(?:\b|^)$pii.value(?:\b|$)/, replaceWith));
            _.getDialogHistoryUtilities().replaceResponseText(text -> text.replaceAll(/(?i)(?:\b|^)$pii.value(?:\b|$)/, replaceWith));
        }
    } catch (Exception e) {
        println (e.getMessage());
    }

    // Replace request parameters
    _.getDialogHistoryUtilities().replaceRequestParameters(['userinput'], pLogAnonymiseString);
    _.getDialogHistoryUtilities().replaceRequestParameters(['channel'], 'anyChannel');

    // Mask variables
    try {
        _.getDialogHistoryUtilities().replaceVariables((varName, varValue) -> (varValue instanceof String || varValue instanceof ArrayList ? preloggingHandler.maskVars(varName, varValue) : varValue));
    } catch (Exception e) {
        _.getDialogHistoryUtilities().replaceRequestParameters(['userinput'], e.getMessage());
    }
}
- Hit 'Save'.
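The whole-word, case-insensitive replacement used in this script can be tried out in plain Groovy. The sketch below is a variation rather than the script's exact code: the closure redactValue is hypothetical, and it additionally escapes the value with Pattern.quote, on the assumption that captured values such as IP or e-mail addresses may contain regex metacharacters.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper: replace whole-word, case-insensitive occurrences of value.
// Pattern.quote treats the value as literal text; quoteReplacement does the same
// for the replacement string.
def redactValue = { String text, String value, String replaceWith ->
    def pattern = Pattern.compile('(?i)(?:\\b|^)' + Pattern.quote(value) + '(?:\\b|$)')
    pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replaceWith))
}

def nameResult = redactValue('Thanks JOHN, see you', 'John', '<person>')
def partialResult = redactValue('Johnny is here', 'John', '<person>')
def ipResult = redactValue('IP is 10.0.0.1 ok', '10.0.0.1', '<ip>')
def lookalikeResult = redactValue('code 10x0y0z1', '10.0.0.1', '<ip>')

println nameResult
println ipResult
```

The word boundaries keep 'Johnny' intact when only 'John' was captured, and the quoting keeps the dots in an IP address from matching arbitrary characters.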
Create customized annotations using Language Objects
As noted above, Teneo Studio provides you with great flexibility for customization. You can create your own rules for recognizing and redacting personal information by adding customized annotations. In the following example, we will use the language object TITLES.LIST to capture the title of a person, such as Mr, Mrs, etc., and generate a customized annotation via a Global Pre-listener.
- Open the 'SOLUTION' tab in the solution's window.
- Select 'Globals' in the purple bar on the left-hand side, and then select 'Listeners'.
- Click 'Add' and select 'Pre listener' in the drop-down list.
- Give the listener a name, for example Customize annotations by LO.
- Click the back arrow in the top left corner.
- Add the following condition in the TLML Syntax field:
tlml
(%TITLES.LIST^{pLogAnnotLO = []; def tmpAnnot = [:]; tmpAnnot.put("name", 'TITLE'); tmpAnnot.put("sentenceIndex", _.sentenceIndex - 1); tmpAnnot.put("usedWordIndices", _.usedWordIndices); pLogAnnotLO << tmpAnnot})
~
(%TITLES.LIST^{def tmpAnnot = [:]; tmpAnnot.put("name", 'TITLE'); tmpAnnot.put("sentenceIndex", _.sentenceIndex - 1); tmpAnnot.put("usedWordIndices", _.usedWordIndices); pLogAnnotLO << tmpAnnot})
- Add the following code in the Execution Script field:
groovy
pLogAnnotLO.eachWithIndex { annotation, index ->
    _.inputAnnotations.add(_.createInputAnnotation(annotation.name.toUpperCase(), annotation.sentenceIndex, annotation.usedWordIndices, [:]))
}
- Save and close the listener.
- While inside the 'Globals' panel, select 'Variables'.
- Add a variable called pLogAnnotLO with an empty list, [], as its initial value.
- Select the variable pLogAnnotationsToAnonymise we have created, and add [ner: 'TITLE', tag: '<title>'] within the list.
- Click 'Save All' on the left-hand side.
Create customized annotations using Regular Expressions
In addition to language objects, you can use Regular Expressions, which are more flexible for creating customized annotations. In the following example, we will create a regular expression to capture the IBAN (International Bank Account Number) and generate a customized annotation, again via a Global Pre-listener.
- Download and import the RegAnnotHelper.groovy file into your solution following this guide.
- Repeat steps 1-5 of the previous section, Create customized annotations using Language Objects, to create another global pre-listener, and give it a different name, such as Customize annotations by Regex.
- Add %TRUE.SCRIPT (or {true}, if the language object TRUE.SCRIPT cannot be found) to the Condition field, which allows this listener to be triggered by any input.
- Add the following code in the Execution Script field:
RegAnnotHelper.annotateAnchoredRegex(_, 'IBAN', /\b[A-Za-z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4,5}/)
- Save and close the listener.
- Repeat steps 9, 11 and 12 of the previous section to add [ner: 'IBAN', tag: '<iban>'] to the global variable pLogAnnotationsToAnonymise.
Make sure you return to Tryout and reload the engine before continuing to the next step.
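Before relying on the IBAN pattern in a listener, it can be checked on its own in plain Groovy. The sketch below uses only the regular expression from the step above together with standard Groovy matching; it does not call RegAnnotHelper, and the spaced sample input is made up for illustration.

```groovy
// The IBAN pattern from the listener, tested outside Teneo
def ibanPattern = /\b[A-Za-z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4,5}/

// Compact form: two letters, two check digits, then groups of four digits
def compactInput = 'My bank account is ES7921000813610123456789'
def compactMatcher = (compactInput =~ ibanPattern)
def compactFound = compactMatcher.find()
def compactMatch = compactFound ? compactMatcher.group() : null

// The optional space in the pattern also allows the grouped, printed form
def spacedInput = 'IBAN: ES79 2100 0813 6101 2345 6789'
def spacedMatcher = (spacedInput =~ ibanPattern)
def spacedMatch = spacedMatcher.find() ? spacedMatcher.group() : null

println compactMatch
println spacedMatch
```

Note that the pattern has no trailing word boundary, so a longer run of digits would still match on its first part; whether that is acceptable depends on the inputs you expect.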
Publish and test your bot
In order to see if our scripts work, you will need to publish your bot. Proceed as follows:
- Open the 'SOLUTION' tab in the solution's window.
- Select 'Publish' in the left sidebar.
- Click the 'Manage' button and in the drop-down you will see a lot of different alternatives. Locate the 'Latest' section and choose 'Publish'.
You might see a warning saying 'Publish to 'Default env' stopped with warnings. '
This is nothing to worry about; the warning is shown when you publish your solution for the first time or when you have made certain global changes. To proceed, just check the checkbox 'Perform full application deployment on Try again' and click the 'Try again' button.
The publication may take a couple of minutes. When it has finished, you'll receive a confirmation pop-up.
- Once published, click on the blue 'Open' icon. This will open the Teneo Web Chat in a new browser tab.
- Click on the blue icon in the bottom right corner to open up the Teneo Web Chat window.
- Strike up a conversation with the bot, like:
Hi, my name is John Doe
Please call me Doctor
I live in 585 Gran Via, Barcelona
My bank account is ES7921000813610123456789
Goodbye!
- Close the chat window to end the conversation.
Read the logs
Now return to Teneo Studio and open the Log Data Source to see if the personal information has been redacted.
- Open the 'SOLUTION' tab in the solution's window.
- Select 'Optimization' in the left sidebar.
- Navigate to 'Log Data' and open up your source by clicking on the 'Manage' button followed by 'Open'.
A new window should now open. This is the Log Data window, described here. The next step is to open a new Session Viewer tab to retrieve the latest session.
- In the 'Session Viewer' section, click on 'New Session Viewer Tab'.
- Change the values to 'Start date' and 'Descending' to retrieve the most recent session.
You should see the following conversation. If not, please repeat the steps above.