Data footprint
Designing our bot to have an effective data footprint pays off in several ways. It lets us query the conversational log data much faster, brings a number of good practices into focus, and makes us conscious of what we store in the conversational log data, which helps from a privacy perspective.
Designed for Conversational Data
Teneo Inquire, the analytics and data part of the Teneo platform, is, like the rest of Teneo, built to perform well on conversational data. It is not designed to store large chunks of binary data or large, unique JSON structures.
Typical conversational sessions in the Teneo platform are in the range of 50 kB to 400 kB. Very large sessions, for example long sessions with many dialog turns, can reach 400 kB to 800 kB.
For typical conversational data, Teneo Inquire scales well with traffic, that is, with the number of API calls per session.
Teneo Inquire scales less well with data outside of its purpose, including:
- Conversational log data that includes a large amount of non-conversational data, such as big JSON payloads or large binary objects.
- Very large sessions that fall outside the normal size ranges. These are often an indicator that the bot stores large chunks of non-conversational data.
Solution data footprint is key
An effective solution data footprint is a key indicator of a well-designed bot. It also greatly impacts the performance of Teneo Inquire, which affects how fast we can query our conversational log data and how quickly Teneo Studio can give us feedback, for example in the Optimization section.
Estimating the session size
The size of a session can be estimated from the size of the dialog history converted into a string. You can therefore estimate the session size in kB (1 kB = 1000 bytes) with the following example snippet:
```groovy
// Estimate the session size in kB from the serialized dialog history
int sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes().size() / 1000)
println("Session size ${sessionSize} kB")
```
This gives you a sense of how large your session logs are.
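The snippet above depends on Teneo's `engineAccess` object, so it only runs inside the engine. To experiment with the arithmetic outside Teneo, the minimal sketch below applies the same calculation to any string and labels the result against the ranges given earlier. The helpers `estimateSizeKb` and `classifySize` and their labels are hypothetical, and the sketch passes an explicit UTF-8 charset to `getBytes`, whereas the engine snippet uses the JVM default charset; the two agree whenever that default is UTF-8.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: size in kB (1 kB = 1000 bytes) of a serialized dialog history.
// Counts bytes, not characters, like the engine snippet.
int estimateSizeKb(String serializedHistory) {
    int bytes = serializedHistory.getBytes(StandardCharsets.UTF_8).length
    // intdiv truncates, matching the (int) cast in the engine snippet
    return bytes.intdiv(1000)
}

// Hypothetical labels based on the ranges quoted in this section
String classifySize(int sizeKb) {
    if (sizeKb < 400) return 'typical'         // below 400 kB: within or below the typical range
    if (sizeKb <= 800) return 'very large'      // 400 kB to 800 kB: long sessions
    return 'outside normal range'               // often a sign of non-conversational data
}

String history = 'x' * 450_000   // stand-in for engineAccess.dialogHistory.toString()
int size = estimateSizeKb(history)
println("Session size ${size} kB (${classifySize(size)})")
```

Because the estimate counts bytes, non-ASCII text weighs more than its character count suggests: `estimateSizeKb('\u00e9' * 1000)` returns 2, since each `é` takes two bytes in UTF-8.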
Store session size in Variables
The size of each session can be retrieved and stored in a global variable. The script that does this depends on whether you want to record the size only in Tryout, Development (Dev) and Quality Assurance / Staging (QA), or in all environments including Production (Prod).
Here is how to do that:
- Navigate to your solution's backstage, then open 'Globals' and 'Variables'.
- Create a new global variable called `sessionSize` and give it the value `0`.
- Save and navigate to 'Scripts'.
- Add a new 'End dialog' script and use one of the following snippets, depending on the environments you want to cover:
Only in Tryout, Development (Dev) and Quality Assurance / Staging (QA)
```groovy
// Record the session size everywhere except in the production environment
def releaseEnvironment = engineAccess.getProperty('servletContextParameters.release_environment')
if ("production" != releaseEnvironment) {
    sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes().size() / 1000)
}
```
For all environments, including Production (Prod)
```groovy
// Record the session size in every environment
sessionSize = (int) (engineAccess.dialogHistory.toString().getBytes().size() / 1000)
```
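The only difference between the two scripts is the environment check. The minimal sketch below isolates that check so it can be exercised outside Teneo. `shouldRecordSessionSize` is a hypothetical name, the value is passed in as a plain string instead of being read through `engineAccess.getProperty`, and the values `'dev'` and `'qa'` are assumptions used only for illustration.

```groovy
// Hypothetical stand-in for the environment check in the 'End dialog' script.
// In Teneo the value comes from
// engineAccess.getProperty('servletContextParameters.release_environment').
boolean shouldRecordSessionSize(String releaseEnvironment) {
    // Groovy's != is null-safe, so an unset (null) value counts as non-production
    return "production" != releaseEnvironment
}

['dev', 'qa', 'production', null].each { env ->
    println("${env}: record size = ${shouldRecordSessionSize(env)}")
}
```

One consequence of this design is that if the property is missing in some environment, the size is still recorded there rather than silently skipped.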
Retrieve Session size with TQL
You can then retrieve the session sizes with Teneo Query Language (TQL) using one of the following queries:
Query | Description |
---|---|
`la avg s.sv:n:sessionSize` | Average session size (kB) across all sessions |
`la s.sv:n:sessionSize as 'sessionSize' order by sessionSize desc` | List the size of every session, largest first |
`la s.id, s.sv:n:sessionSize : s.sv:n:sessionSize >= 500` | List the ID and size of every session of 0.5 MB (500 kB) or more |
`ca s.sv:n:sessionSize : s.sv:n:sessionSize >= 500` | Count the sessions of 0.5 MB (500 kB) or more |
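If you export the per-session values returned by the second query, for example to double-check a dashboard figure, the same summaries are easy to reproduce. The minimal sketch below uses a hypothetical list of session sizes in kB in place of real query results and mirrors the average and the `>= 500` filter from the table; it does not call Teneo in any way.

```groovy
// Hypothetical session sizes in kB, standing in for exported TQL results
List<Integer> sessionSizes = [120, 380, 95, 640, 510, 260]

// Same calculation as the average query
BigDecimal averageKb = sessionSizes.sum() / sessionSizes.size()

// Same threshold as the last two queries, largest first
List<Integer> largeSessions = sessionSizes.findAll { it >= 500 }.sort { -it }

println("Average: ${averageKb} kB")
println("Sessions of 500 kB or more: ${largeSessions.size()} ${largeSessions}")
```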
Good practices
Here are some good practices when working on your data footprint.
Solution design
- Avoid sending large 'blobs', or strings that represent serialized data objects, into the bot, as these are very costly to store and query. Instead, use integrations to call web services when you need to retrieve this data.
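One way to apply this when an integration does return a large response is to keep only the fields the dialog needs, rather than storing the whole payload in a variable that ends up in the session log. The minimal sketch below shows the idea with Groovy's built-in `JsonSlurper` and `JsonOutput`; the payload, its field names and the `compactForLog` helper are hypothetical.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical web-service response: a few useful fields plus bulky item data
String responseJson = JsonOutput.toJson([
    orderId: 'A-1001',
    status : 'shipped',
    items  : (1..200).collect { [sku: 'SKU-' + it, description: 'x' * 100] }
])

// Hypothetical helper: keep only what the dialog needs to answer the user
Map compactForLog(String json) {
    Map parsed = (Map) new JsonSlurper().parseText(json)
    return [orderId: parsed.orderId, status: parsed.status, itemCount: parsed.items.size()]
}

Map compact = compactForLog(responseJson)
println("Full payload: ${responseJson.length()} chars, stored: ${JsonOutput.toJson(compact).length()} chars")
```

The full item list stays in the web service and can be fetched again through the integration if a later turn needs it.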
Teneo Inquire
- Use Adorners! Adorners can be used to copy variables from event level to session level, which makes them faster to query. You can read more about Adorners in the documentation and here in the Developers pages.
- Use Aggregators! Aggregators are used to aggregate data, for example the amount of traffic towards the bot's key flows. They are very fast to query and can be used to power dashboards. You can read more about Aggregators in the documentation and here in the Developers pages.
- Use sampling! As your bot becomes successful and your datasets grow, TQL queries take longer to run. To design queries quickly, use the `sample` command to have Teneo run your query over a small subset of sessions and return results. Read more about sampling in the TQL Reference.
- You can also use `limit`, but if the number of hits is small, or if the TQL query contains a mistake, you may end up waiting a long time for no results. Read more about `limit` in the TQL Reference and in the TQL Reference Guide in the Developers pages.
- When you are working on reporting and analytics, it's a good idea to work on your Teneo Query Language queries in Teneo Studio as you have much more support there. However, it is recommended to use the Teneo Inquire Client to run long-running queries.
- Teneo Studio also lets you share queries, which is a convenient way to save commonly used ones.
- You can publish queries, which are then easy to retrieve using the Teneo Inquire Client.
- Don't wait to set up efficient reporting: start in sprint 1 and extend it as you go, so that things are done right from the beginning.
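The difference between `sample` and `limit` described above can be illustrated with a toy model. The sketch below is not how Teneo Inquire executes TQL; it models a query as a scan over sessions. A limit-style scan stops after a given number of hits, so when hits are rare it still visits almost every session, while a sample-style scan only visits a fixed-size subset. The subset is random here, which is an assumption, and all names and numbers are hypothetical.

```groovy
// Toy model only: 100,000 sessions, of which every 20,000th matches (5 in total)
List<Integer> sessionIds = (1..100_000).toList()
Closure<Boolean> matches = { int id -> id % 20_000 == 0 }

// limit-style: scan in order until maxHits matches are found
Map limitScan(List<Integer> ids, Closure<Boolean> predicate, int maxHits) {
    int visited = 0
    List<Integer> hits = []
    for (int id : ids) {
        visited++
        if (predicate(id)) {
            hits << id
            if (hits.size() >= maxHits) break
        }
    }
    return [visited: visited, hits: hits]
}

// sample-style: evaluate the query on a fixed-size random subset only
Map sampleScan(List<Integer> ids, Closure<Boolean> predicate, int sampleSize, long seed) {
    List<Integer> subset = new ArrayList<>(ids)
    Collections.shuffle(subset, new Random(seed))
    subset = subset.take(sampleSize)
    return [visited: subset.size(), hits: subset.findAll(predicate)]
}

println(limitScan(sessionIds, matches, 10).visited)         // all 100,000: only 5 matches exist
println(sampleScan(sessionIds, matches, 1_000, 42L).visited) // 1,000
```

Asking for 10 hits when only 5 exist forces the limit-style scan through the whole dataset, which is the "waiting a long time for no results" case; the sampled scan stays cheap but may return few or no hits for rare events.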
Further reading
You can find further reading in the Forum, where you can also ask Teneo developers questions.