Teneo Developers

Data footprint

Designing our bot to have an effective data footprint is a good idea from many perspectives. It lets us query the conversational log data much faster, puts a number of good practices into focus, and makes us conscious of what we're storing in the conversational log data, which helps from a privacy perspective.


Designed for Conversational Data

Like the rest of Teneo, Teneo Inquire (the analytics and data part of the Teneo platform) is built to perform well on conversational data. Teneo is not designed to store large chunks of binary data or large, unique JSON structures.

Typical conversational data sessions in the Teneo platform are normally in the range of 50 kB to 400 kB. Very large sessions, for example long sessions with many turns of dialog, can reach 400 kB to 800 kB.

Teneo has a limit of 1 MB per session.
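As a rough illustration of these ranges, the Groovy sketch below classifies an estimated session size given in kB. The function name `classifySession` and its category labels are hypothetical, not part of Teneo, and the sketch assumes 1 MB = 1000 kB, matching the division by 1000 in the size estimation snippet further down this page.

```groovy
// Hypothetical helper: place a session size (in kB) within the ranges described above.
// Assumes 1 MB = 1000 kB, matching the division by 1000 in the estimation snippet.
String classifySession(int sizeKb) {
    final int sessionLimitKb = 1000 // Teneo's limit of 1 MB per session
    if (sizeKb > sessionLimitKb) {
        return 'over limit'
    } else if (sizeKb > 800) {
        return 'outside normal ranges' // larger than even very long sessions usually get
    } else if (sizeKb > 400) {
        return 'very large' // 400 to 800 kB, for example long sessions with many turns
    } else {
        return 'typical' // up to 400 kB; typical sessions are 50 to 400 kB
    }
}

println classifySession(120)
println classifySession(650)
println classifySession(900)
```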

For typical conversational data, Teneo Inquire scales well with traffic, that is, with the number of API calls per session.

Teneo Inquire scales less well with data outside of its purpose, including:

  • Conversational log data that includes large amounts of non-conversational data, such as big JSON payloads or large binary objects.
  • Very large sessions that fall outside the normal size ranges. These often indicate that the bot stores large chunks of non-conversational data.

Solution data footprint is key

An effective solution data footprint is a key indicator of a well-designed bot. It also greatly impacts the performance of Teneo Inquire, which affects how fast we can query our conversational log data and how quickly Teneo Studio can give us feedback, for example in the Optimization section.

Estimating the session size

The size of a session can be estimated from the size of the dialog history converted into a string. You can therefore estimate the session size in kB with the following example snippet:

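```groovy
// Estimate the session size in kB from the dialog history converted to a string
// (UTF-8 bytes divided by 1000, rounded down)
int sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes('UTF-8').length / 1000)
println("Session size ${sessionSize} kB")
```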

This gives you a sense of how large your session logs are.

We recommend placing this snippet in an End dialog global script, and nowhere else, to keep the data footprint down.
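The estimate counts bytes rather than characters, so non-ASCII text weighs more than its length suggests. The sketch below isolates that calculation in a hypothetical helper, `estimateSizeKb`, which takes any string in place of `engineAccess.dialogHistory.toString()` so it can be tried outside Teneo. It assumes UTF-8 encoding and 1 kB = 1000 bytes.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: estimate the size in kB of a serialized dialog history.
// Counts UTF-8 bytes, so a character such as U+00E9 (e with acute accent) counts as two bytes.
int estimateSizeKb(String serializedHistory) {
    int bytes = serializedHistory.getBytes(StandardCharsets.UTF_8).length
    return bytes.intdiv(1000) // whole kB, rounded down (1 kB = 1000 bytes)
}

String asciiHistory = 'a' * 2500          // 2,500 characters, 2,500 bytes
String accentedHistory = '\u00e9' * 2500  // 2,500 characters, 5,000 bytes in UTF-8

println estimateSizeKb(asciiHistory)    // 2
println estimateSizeKb(accentedHistory) // 5
```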

Store session size in a variable

The size of each session can be retrieved and stored in a global variable. The script differs depending on whether you want to record it only in your Development (Dev) and Quality Assurance / Staging (QA) environments, or in your Production (Prod) environment as well.

Note that the session size is given in kB.

Here is how to do that:

  1. In your solution's backstage, go to 'Globals' and select 'Variables'.
  2. Create a new global variable called sessionSize and give it the value 0.
  3. Save, then navigate to 'Scripts'.
  4. Add a new 'End dialog' script and use one of the following snippets, depending on the environments in which you want to record the session size:

Only in Tryout, Development (Dev) and Quality Assurance / Staging (QA)

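```groovy
// Read the release environment of the published solution
def releaseEnvironment = engineAccess.getProperty('servletContextParameters.release_environment')
// Record the size in every environment except production
if ("production" != releaseEnvironment) {
    // sessionSize is the global variable created in step 2 (value in kB)
    sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes('UTF-8').length / 1000)
}
```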
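The environment check in the script above can be expressed as a small pure function, which makes the condition easy to test outside Teneo. `shouldRecordSessionSize` is a hypothetical name, not a Teneo API, and the environment names other than `production` are placeholders. Like the script, it treats a missing value the same as any other non-production value.

```groovy
// Hypothetical helper mirroring the condition in the script above:
// record the session size in every environment except production.
boolean shouldRecordSessionSize(String releaseEnvironment) {
    // Groovy's != is null-safe, so a missing value counts as "not production"
    return 'production' != releaseEnvironment
}

// 'dev' and 'qa' are placeholder environment names
['dev', 'qa', null, 'production'].each { env ->
    println "${env}: record session size = ${shouldRecordSessionSize(env)}"
}
```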

In all environments, including Production (Prod)

```groovy
// Store the session size (kB) in the global variable sessionSize
sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes('UTF-8').length / 1000)
```

Please be aware that the changes will not be live until you Publish your solution.

Retrieve session size with TQL

You can then retrieve the session sizes with Teneo Query Language (TQL) using one of the following queries:

| Query | Description |
| --- | --- |
| `la avg s.sv:n:sessionSize` | Get the average session size |
| `la s.sv:n:sessionSize as 'sessionSize' order by sessionSize desc` | List the size of every session, largest first |
| `la s.id, s.sv:n:sessionSize : s.sv:n:sessionSize >= 500` | List the ID and size of every session of 500 kB (0.5 MB) or more |
| `ca s.sv:n:sessionSize : s.sv:n:sessionSize >= 500` | Count the sessions of 500 kB (0.5 MB) or more |
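To make the semantics of these four queries concrete, the sketch below applies the same operations to an in-memory map in Groovy. The session IDs and sizes are made-up sample data, and the code only mimics what each query computes; it does not call Teneo Inquire.

```groovy
// Made-up sample data: session ID -> session size in kB
Map<String, Integer> sessionSizes = [s1: 120, s2: 540, s3: 80, s4: 910]

// Like: la avg s.sv:n:sessionSize
BigDecimal averageKb = sessionSizes.values().sum() / sessionSizes.size()

// Like: la s.sv:n:sessionSize as 'sessionSize' order by sessionSize desc
Map<String, Integer> largestFirst = sessionSizes.sort { a, b -> b.value <=> a.value }

// Like: la s.id, s.sv:n:sessionSize : s.sv:n:sessionSize >= 500
Map<String, Integer> atLeastHalfMb = sessionSizes.findAll { it.value >= 500 }

// Like: ca s.sv:n:sessionSize : s.sv:n:sessionSize >= 500
int countAtLeastHalfMb = atLeastHalfMb.size()

println "Average: ${averageKb} kB"
println "Largest first: ${largestFirst}"
println "At least 500 kB: ${atLeastHalfMb} (count: ${countAtLeastHalfMb})"
```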

Good practices

Here are some good practices for working on your data footprint.

Solution design

  • Avoid sending in large 'blobs' or strings that represent whole data objects, as these are very costly. Instead, use integrations to call web services when you need to retrieve this data.
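To illustrate the cost, the sketch below compares the size of a full JSON payload with a subset holding only what the dialog needs. The payload, its fields and the `orderId`/`status` keys are hypothetical; the assumption is that the full data stays in the external web service and is fetched through an integration when it is actually needed.

```groovy
import groovy.json.JsonOutput
import java.nio.charset.StandardCharsets

// Hypothetical payload, as a web service might return it: mostly data the dialog never uses
Map fullOrder = [
    orderId: 'A-1001',
    status : 'shipped',
    history: (1..200).collect { [event: 'update ' + it, note: 'x' * 50] }
]

// Keep only the fields the conversation needs; fetch the rest through an integration if required
Map conversationalSubset = fullOrder.subMap(['orderId', 'status'])

// Size in bytes each variant would add if stored as a JSON string
int fullBytes = JsonOutput.toJson(fullOrder).getBytes(StandardCharsets.UTF_8).length
int subsetBytes = JsonOutput.toJson(conversationalSubset).getBytes(StandardCharsets.UTF_8).length

println "Full payload: ${fullBytes} bytes; conversational subset: ${subsetBytes} bytes"
```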

Teneo Inquire

  • Use Adorners! Adorners can be used to copy variables from event level to session level, which makes them faster to query. You can read more about Adorners in the documentation and here in the Developers pages.
  • Use Aggregators! Aggregators are used to aggregate data, for example the amount of traffic towards the bot's key flows. They are very fast to query and can be used to power dashboards. You can read more about Aggregators in the documentation and here in the Developers pages.
  • Use Sample! When your bot is successful and your datasets grow larger, TQL queries will take longer to run. To quickly design queries, you can use the sample command to ask Teneo to run your query over a small subset of sessions and return results. Read more about sampling in the TQL Reference.
    • You can also use limit, but if the number of hits is small or if a mistake is made in the TQL query, you may end up waiting a long time for no results. Read more about limit in the TQL Reference and in the TQL Reference Guide in the Developers pages.
  • When you are working on reporting and analytics, it's a good idea to develop your Teneo Query Language queries in Teneo Studio, since it offers much more support for writing them. However, we recommend using the Teneo Inquire Client to run long-running queries.
    • Teneo Studio also lets you share queries, which is a convenient way to save commonly used ones.
    • You can publish queries, which are then easy to retrieve using the Teneo Inquire Client.
  • Do not wait to set up efficient reporting: start in sprint 1 and extend it as you go. This makes sure things are done right from the beginning.
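Conceptually, Adorners and Aggregators both move work from query time to the time data is stored. The Groovy sketch below is only an analogy and does not use Teneo's actual Adorner or Aggregator configuration: it copies an event-level fact up to session level once (the Adorner idea) and keeps running counts per flow (the Aggregator idea), so later lookups don't have to scan every event. The data model, the `flow` field and the flow names are hypothetical.

```groovy
// Hypothetical event-level log: each session holds a list of events
List<Map> sessions = [
    [id: 's1', events: [[flow: 'Book a flight'], [flow: null], [flow: 'Cancel booking']]],
    [id: 's2', events: [[flow: null], [flow: 'Book a flight']]],
    [id: 's3', events: [[flow: null]]]
]

// Adorner idea: copy an event-level fact up to session level once, when the session is stored
sessions.each { session ->
    session.triggeredFlows = session.events.findResults { it.flow }
}

// Aggregator idea: maintain counts per flow so a dashboard can read them directly
Map<String, Integer> flowCounts = [:].withDefault { 0 }
sessions.each { session ->
    session.triggeredFlows.each { flow -> flowCounts[flow] += 1 }
}

// Lookups now read precomputed values instead of scanning every event
println sessions.findAll { 'Book a flight' in it.triggeredFlows }*.id
println flowCounts
```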

Further reading

Further reading can be found in the Forum, where you can also put questions to Teneo Developers.