Teneo Session Data Footprint

When a session is executed in a published engine, the progress of the session is logged. This logging is very detailed in order to enable flexibility in reporting, debugging, etc. at a later date.

While it is generally positive that so much detail is logged (anything logged can be queried in Inquire - anything not logged cannot!), there are a number of things to consider which can affect design decisions when building a solution:

Personal Data Security (and GDPR)
Logged Session Data Size

Executive Summary

Only store in a Global variable what needs to be in one - static/read-only data will be a lot “cheaper” if it is stored and made available via a class with static final properties and lookups.

Personal Data Security

Much of the data which passes through a deployed solution will be personal data (in most cases the end user is a person giving specific information about themselves). This data may be given:

either explicitly: Hi, My name is Jude!
- user has given their first name
systematically collected: Give me directions to the Station
- system will need to know the user's location to find the nearest station and then directions
or implicitly: I was looking for a new Murciélago in your Pangbourne store on Wednesday
- enough information to make a guess at who the user was

Personal data security will not be specifically addressed here. However, since the rest of this article is concerned with being aware of, limiting and controlling the data that is logged, the techniques described can also be used to limit the amount of sensitive data which is logged.

Logged Session Data Size

For every transaction in a session, the Engine will log:

Inputs & Outputs
- Including Input & Output Parameters
Processing Path (which triggers, flows, transitions, nodes were passed through)
Metadata Assigned
Variable Changes

Inputs & Outputs

These will be logged in full and cannot be reduced in size - unless some form of compression is built into the solution and the frontend communicating with it - this however is way out of scope of this analysis!

Input & Output Parameters

Input parameters and output parameters will also be logged in full, again without compression or data-level encryption unless this is built into the solution and frontend (traffic should always be HTTPS, so the network traffic will be encrypted). However, care can still be taken in a solution to send only the output parameters, and to require only the input parameters, which are actually needed for any given input / output. Repeatedly sending system data which has not changed is not required and will affect:

Network traffic
Session log size
PII Footprint
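
As a minimal sketch of sending only what has changed, the snippet below filters a parameter map down to the entries that differ from what was last sent. The helper name changedOnly and the sample parameter names are hypothetical and are not part of any Teneo API; how the previously sent values are remembered is left to the solution.

```groovy
// Hypothetical helper: keep only entries that are new or whose value has changed.
Map<String, Object> changedOnly(Map<String, Object> lastSent, Map<String, Object> current) {
    current.findAll { key, value -> !lastSent.containsKey(key) || lastSent[key] != value }
}

// Sample parameter names and values are invented for illustration.
def lastSent = [channel: "web", locale: "en-GB", cartSize: 2]
def current  = [channel: "web", locale: "en-GB", cartSize: 3]

def toSend = changedOnly(lastSent, current)
println toSend
```

Only the parameter which differs remains in toSend, so the unchanged values are neither transmitted nor logged a second time.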

Processing Path

The data in the logs relating to the processing path is all valuable for future debugging purposes and should not pose a data security risk, so it does not need to be considered from a data footprint point of view.

Metadata Assigned

Similar to Input parameters and Output Parameters - Metadata should only be assigned when it is useful - but as a user driven action there is less chance of assigning and therefore logging unnecessary additional data. Scripted default Metadata assignments should of course be reviewed however to ensure that the data being assigned is the required data.

Variable Changes

Variable changes are automatically logged whenever the variable reference is changed during a transaction. This allows support for deeper debugging and reporting on specific types of session - without having to previously set up metadata to track.

The process of logging this data is automatic, however, so there are a number of things which should be considered in the solution design to prevent excessive or sensitive data from being automatically recorded.

Which variables are logged

All Flow variables and Global variables are tracked for changes and changes are then logged by the engine, eg: globalVar = "something" will be logged (assuming globalVar is a defined global variable!)
Variables defined within scripts are not tracked for changes and will not be logged, eg: def localVar = "something" will not be logged
Variable values passed between flows (via Flow Link nodes) and variable values passed to Integrations will be logged, eg:

[Image: test-flow]

When a Variable is logged

A variable will be logged whenever the engine considers the value to have changed - specifically this means that the variable reference has changed. The engine will not record when something within the already referenced object has changed (that would be way too processor intensive / prone to error to be valuable)

Meaning this variable assignment will be logged:

because myVar now refers to a different object:

groovy

myVar = "some new value"

This variable assignment will also be logged:

because a new map has been assigned to the variable:

groovy

myMapVar = ["value1": "some new value"]

But this modification of the already referenced map will not be logged:

groovy

myMapVar["value1"] = "some other value"

Since the engine only requires the reference to change - it is possible to force the engine to log a variable value at a particular point (say for example you wanted to enforce the logging of the above map modification in order to reference it later via Teneo Query Language (TQL)). This can be achieved by re-referencing the same value:

groovy

// Force session logging of myVar value
myVar = myVar

This will then be logged because the reference has been changed. It still refers to the same object, but it is a new reference to the same object.
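
One way to picture this behaviour is a script binding that records every assignment. The sketch below is a hypothetical model in plain Groovy, not Teneo engine code: LoggingBinding and assignmentLog are names invented for illustration, and the only assumption is that a log entry is triggered by assignment to the variable rather than by changes inside the object it refers to.

```groovy
// Hypothetical model only: a Binding that records each assignment to a script variable.
class LoggingBinding extends Binding {
    List<String> assignmentLog = []

    @Override
    void setVariable(String name, Object value) {
        assignmentLog << "${name} = ${value}".toString()
        super.setVariable(name, value)
    }
}

def loggingBinding = new LoggingBinding()
def shell = new GroovyShell(loggingBinding)

shell.evaluate('''
    myMapVar = ["value1": "some new value"]   // assignment: recorded
    myMapVar["value1"] = "some other value"   // change inside the map: not recorded
    myMapVar = myMapVar                       // re-assignment: recorded, with the current content
''')

loggingBinding.assignmentLog.each { println it }
```

In this model the in-place modification leaves no trace of its own; it only becomes visible once the variable is assigned again, which mirrors the re-referencing technique described above.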

Design Considerations

'Static' Data

Static data is data which is defined in the solution and required for the functionality of the solution, but which does not change during a session. For example, a map of product names to product categories, or opening hours per shop.

Global String Variable

A seemingly simple way to store this data is in a global variable containing a parseable string:

[Image: LongStringStaticVariable]

Which can then be parsed as JSON:

groovy

def cityNameToFind = "Grenoble"
new groovy.json.JsonSlurper().parseText(sCitiesInFrance).find { it.name == cityNameToFind }.geonameid
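
For experimenting outside a solution, a self-contained version of this lookup might look like the sketch below. It assumes a stand-in sCitiesInFrance string holding only two sample rows, and it uses the safe-navigation operator (?.) so that a city missing from the data yields null rather than an exception; that null handling is an addition for illustration.

```groovy
import groovy.json.JsonSlurper

// Stand-in for the global variable; only two sample rows are included here.
def sCitiesInFrance = '[{"country": "France", "geonameid": 2967245, "name": "Yerres", "subcountry": "Île-de-France"},' +
        '{"country": "France", "geonameid": 2967318, "name": "Wittenheim", "subcountry": "Alsace-Champagne-Ardenne-Lorraine"}]'

// The whole string is parsed into a list of maps before the lookup can run.
def cities = new JsonSlurper().parseText(sCitiesInFrance)

def found = cities.find { it.name == "Yerres" }?.geonameid
def missing = cities.find { it.name == "Grenoble" }?.geonameid   // not in the two-row sample

println "Yerres: ${found}, Grenoble: ${missing}"
```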

This approach seems clear and is straightforward in terms of downloading a resource from somewhere and storing it; however, it has a number of issues:

The Editor for variables is not optimized for data this size - having a max width and limited scope for editing multiple variables at the same time
If the string becomes too large Groovy will no longer compile it - the maximum size of a string literal in Groovy is 65535 code units, beyond which errors will be seen in Tryout:

04/02/2020 11:57:58 [Warning]: Script syntax error: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: Script165.groovy: 1: String too long. The given string is 66302 Unicode code units long, but only a maximum of 65535 is allowed. @ line 1, column 91. s", "subcountry": "Île-de-France"},{"cou ^ 1 error

The process of storing a string then parsing involves storing the string and the JSON object in memory - so increases resource (memory and processing) usage on the engine server
The string value is assigned to the variable when the session begins (along with all other global variables)
- Every session defines this variable and as such every session has memory assigned to store this variable
and this assignment is logged
- This means that for a 1,000 char string variable every session log file will be >1,000 chars before the first input has even begun being processed!
- Consider that this occurs for every session, so in an env with only 10 sessions an hour this is 10,000 additional characters an hour, 240,000 a day, 1,680,000 a week...
- It might seem logical once the string has been parsed to an object to store the parsed value as a variable - but then the data is stored twice in every session! Approximately doubling the storage used by this variable alone
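
The string-length limit mentioned in the list above can be checked before a value is pasted into a variable. This is a rough sketch: fitsInStringLiteral is a hypothetical helper, and the 65535 figure is taken from the Tryout error message quoted above. String.length() counts UTF-16 code units, which is the unit that message reports.

```groovy
// Hypothetical helper: true when the text is within the limit quoted in the error message.
boolean fitsInStringLiteral(String text, int maxCodeUnits = 65535) {
    text.length() <= maxCodeUnits
}

def atLimit = "a" * 65535
def tooLong = "a" * 66302   // the length reported in the Tryout error

println "at limit fits: ${fitsInStringLiteral(atLimit)}"
println "too long fits: ${fitsInStringLiteral(tooLong)}"
```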

Global Object Variable

To avoid the max string limit, the expense of parsing and the duplicate variable value, the data can be defined as an object variable definition instead. In the case of JSON formatted data this is a simple conversion based on replacing curly brackets {,} with square brackets [,] and adding new lines for clarity.

Meaning...

groovy

/[{"country": "France", "geonameid": 2967245, "name": "Yerres", "subcountry": "Île-de-France"},{"country": "France", "geonameid": 2967318, "name": "Wittenheim", "subcountry": "Alsace-Champagne-Ardenne-Lorraine"}...

... becomes...

groovy

[
    ["country": "France", "geonameid": 2967245, "name": "Yerres", "subcountry": "Île-de-France"],
    ["country": "France", "geonameid": 2967318, "name": "Wittenheim", "subcountry": "Alsace-Champagne-Ardenne-Lorraine"],
    ...

[Image: LongObjectStaticVariable]

Which is more easily readable for a solution developer and is also a fully defined Groovy object. This object can then be directly referenced in a very similar way to before (but without the need to parse first):

groovy

oCitiesInFrance.find { it.name == cityNameToFind }.geonameid

This resolves the parsing and max string size issues; however, it still shares some drawbacks:

The Editor for variables is not optimised for data this size - having a max width and limited scope for editing multiple variables at the same time
The object value is assigned to the variable when the session begins (along with all other global variables)
- Every session defines this variable and as such every session has memory assigned to store this variable
and this assignment is logged
- This means that for a 1,000 char object variable definition every session log file will be >1,000 chars before the first input has even begun being processed!
- Consider that this occurs for every session, so in an env with only 10 sessions an hour this is 10,000 additional characters an hour, 240,000 a day, 1,680,000 a week...
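
The log growth can be worked through directly. The sketch below uses the figures from the example above (a 1,000 character definition and 10 sessions an hour) and assumes the environment runs around the clock; substitute real session volumes to estimate the footprint of a given solution.

```groovy
long charsPerSession = 1000   // size of the variable definition logged at session start
long sessionsPerHour = 10

long perHour = charsPerSession * sessionsPerHour
long perDay  = perHour * 24
long perWeek = perDay * 7

println "per hour: ${perHour}, per day: ${perDay}, per week: ${perWeek}"
```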

Class with static final property

If the data is best defined within the solution code itself (not imported from a file resource in the solution, or from an external resource via a web request) then a class with a static final property and methods for the data lookup is a good approach, from both a data footprint and a solution maintenance point of view:

static ensures that only one instance is created
final ensures that this property cannot be assigned to

groovy

class CityLookup {
    // Shared by all sessions; defined once when the class is loaded
    private static final Object _citiesInFrance = [
        ["country": "France", "geonameid": 2967245, "name": "Yerres", "subcountry": "Île-de-France"],
        ["country": "France", "geonameid": 2967318, "name": "Wittenheim", "subcountry": "Alsace-Champagne-Ardenne-Lorraine"],
        ...
    ]
}

Adding a lookup function to the class then ensures
- all lookups are handled in the same way
- any data format changes can easily be propagated within usages in the solution (only the data in the class and the function in the class need to change for all usages to be aligned)

groovy

    public static Object findByName(String cityNameToFind) {
        return _citiesInFrance.find { it.name == cityNameToFind }
    }

[Image: StaticClassLookup]

This definition is easier to deal with as the editor provides a lot more space and the separate window means cross referencing other areas of the solution is easier. The encapsulation in a class means the lookup can then be used throughout the solution by name:

groovy

CityLookup.findByName(cityNameToFind).geonameid

This solution ensures the data is stored only once for all sessions executing in the engine instance and the data is not logged to the session logs at all, saving the memory, processor, network and storage resources associated with writing, passing around, storing and querying the additional unnecessary data.
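
Putting the pieces together, a complete, runnable version of such a class might look like the following sketch. It contains only the two sample rows shown earlier, not the full data set. The asImmutable() call and the safe-navigation operator (?.) are additions for illustration: final only prevents the field being reassigned, so asImmutable() is used here to stop rows being added to or removed from the shared list, while the individual row maps remain ordinary maps.

```groovy
class CityLookup {
    // Two sample rows only; the list is shared by every caller.
    private static final List<Map<String, Object>> CITIES = [
        [country: "France", geonameid: 2967245, name: "Yerres", subcountry: "Île-de-France"],
        [country: "France", geonameid: 2967318, name: "Wittenheim", subcountry: "Alsace-Champagne-Ardenne-Lorraine"]
    ].asImmutable()

    // Returns the matching row, or null when the city is not in the data.
    static Map<String, Object> findByName(String cityNameToFind) {
        CITIES.find { it.name == cityNameToFind }
    }
}

def known = CityLookup.findByName("Wittenheim")?.geonameid
def unknown = CityLookup.findByName("Atlantis")?.geonameid

println "Wittenheim: ${known}, Atlantis: ${unknown}"
```

Because the result is held in script-local variables (def) rather than assigned to a flow or global variable, a lookup like this does not add the data to the session log.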

For a very basic solution the following very rough stats were gathered. The solution was completely empty except for either the static class or the global object variable plus some code to track memory.

Static class: 04/02/2020 14:40:28 [Println]: Heap: 938578.5703125k => 956029.5234375k in 10 steps over 28889ms. 1586.45028409090908203125k per step

Object Variable: 04/02/2020 14:42:24 [Println]: Heap: 457594.234375k => 504741.1875k in 10 steps over 34514ms. 4286.08664772727275390625k per step
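
The tracking code used to produce these figures is not shown. As a rough sketch of how such a measurement could be taken, the snippet below samples used heap via java.lang.Runtime before and after a number of steps. The usedHeapK helper, the placeholder workload and the output format are assumptions for illustration, and garbage collection will make individual readings noisy.

```groovy
// Assumption: used heap is approximated as total memory minus free memory, in kilobytes.
double usedHeapK() {
    Runtime rt = Runtime.getRuntime()
    (rt.totalMemory() - rt.freeMemory()) / 1024.0d
}

int steps = 10
double startK = usedHeapK()
long startMs = System.currentTimeMillis()

steps.times {
    // Placeholder workload; in a real test each step would run a session against the engine.
    def scratch = new byte[1024 * 1024]
}

double endK = usedHeapK()
long elapsedMs = System.currentTimeMillis() - startMs

println "Heap: ${startK}k => ${endK}k in ${steps} steps over ${elapsedMs}ms. ${(endK - startK) / steps}k per step"
```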

Memory Graphs

[Image: ObjectVariableVsClassMemProfile]

The actual values of memory usage are not significant (that is mostly due to garbage collection). What is significant is that, over a similar period of time and the same number of sessions, the increase in memory when storing in a variable was visibly and consistently larger than when defining a static class.