Log Data Handling
In a published environment, the Teneo Engine logs a user's interaction with the published solution; these logs can later be accessed, queried, and analyzed through the Log Data Source in Teneo Studio.
Before logs become available to a Teneo Studio user, they are transported and stored by different components of the Teneo Platform.
In brief:
- during transport, logs are kept together at session level; that is, the "transport log unit" is the session
- once a session is completed, all its logs are immediately sent to the Message Queue (MQ), which moves them to the corresponding Log Archive and Log Data Source, already set up in Cassandra and Elasticsearch
- logs are then accessible from the Log Data Source in Elasticsearch; each Log Data Source is configured to contain the logs belonging to a specific time period.
Note that while the Message Queue always moves all logs to the Log Archive, logs are only moved to a Log Data Source if the configuration allows it, that is, if the transported logs fall within the time span of one of the available Log Data Sources.
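To make the routing rule concrete, here is a minimal Python sketch of the flow described above. It is purely illustrative: all names (SessionLogs, LogDataSource, route_session) are invented for the example and are not Teneo code.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SessionLogs:
    session_id: str
    ended_at: datetime          # a session's logs are transported whole, as one unit

@dataclass
class LogDataSource:
    name: str
    start: datetime             # beginning of the configured time span
    end: datetime               # end of the configured time span
    indexed: list = field(default_factory=list)

def route_session(session: SessionLogs, archive: list, sources: list) -> None:
    # The Message Queue always drops the completed session into the Log Archive.
    archive.append(session)
    # A Log Data Source only receives the session if the session falls
    # within that data source's configured time span.
    for source in sources:
        if source.start <= session.ended_at <= source.end:
            source.indexed.append(session)
```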
Message Queue
The Message Queue is, in a nutshell, a predefined and productized architecture for transporting Engine Session Logs from the running Engine instance to Log Storage.
With the Message Queue, each session is processed individually as soon as it ends, which has the following benefits:
- Logs available in seconds
- (Near) real-time reporting
- Central log store improving data security
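The per-session pattern can be illustrated with a toy producer/consumer sketch in Python; this models the idea only and is not the platform's actual transport code:

```python
import queue
import threading

session_queue: queue.Queue = queue.Queue()

def store_in_log_archive(session_id: str) -> None:
    print(f"archived session {session_id}")    # placeholder for the real storage call

def consumer() -> None:
    while True:
        # One completed session at a time is taken off the queue and stored
        # right away, which is why logs become available within seconds.
        session_id = session_queue.get()       # blocks until a session ends
        store_in_log_archive(session_id)
        session_queue.task_done()

threading.Thread(target=consumer, daemon=True).start()
session_queue.put("session-001")               # a session has just ended
session_queue.join()                           # wait until it has been stored
```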
Log Storage
The Message Queue automatically drops all logs into a pre-defined Log Archive in Cassandra. However, these logs are not yet queryable from Teneo Studio, since queries are not run on the Log Archives but on the Log Data Sources. The Log Data Sources, which are kept in Elasticsearch, contain the logs of a configurable period of time and thus represent a subset of all the logs available in Cassandra.
Both Log Archives and Log Data Sources are configured in Teneo Manager.
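Because a Log Data Source is ultimately an Elasticsearch index covering a bounded time window, a time-filtered query against it is a simple range search. The sketch below uses the standard Python Elasticsearch client; the index name and timestamp field are placeholders, not Teneo's actual naming scheme:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Search one Log Data Source index for sessions ended in the last 12 weeks.
response = es.search(
    index="log-data-source-example",
    query={"range": {"sessionEndTime": {"gte": "now-12w", "lte": "now"}}},
)
for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_source"])
```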
Keep log content up-to-date
Once created, Log Archives are automatically updated with new logs as soon as a session ends. Users do not need to carry out any further actions.
For Log Data Sources, updating the log contents depends on how the Until field is configured for the Log Data Source in Teneo Manager.
If Now is selected, the Log Data Source has a dynamic time range: it contains the logs between Now (the current week) and a specific number of weeks in the past (for example, 12 weeks). In other words, the Log Data Source time window shifts. An automatic maintenance task updates the Log Data Source daily by default (the frequency of updates is configurable), so no further action from the user is required.
If Now is not selected, the Log Data Source has a static time range, configured between a particular week and a number of weeks before it. Once imported into Elasticsearch, the content of this type of Log Data Source is never updated unless it is actively synchronized.
Note that, in any case, a Log Data Source can be updated at any time using the Synchronization tab in the backstage of the Log Data Source window in Teneo Studio.
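The two Until behaviours can be modelled with a small Python helper; the function name and parameters are invented for illustration:

```python
from datetime import date, timedelta

def data_source_window(until_now: bool, weeks_back: int,
                       fixed_end: date | None = None) -> tuple:
    """Return the (start, end) dates a Log Data Source would cover.

    until_now=True  models the dynamic, shifting window that the daily
                    maintenance task keeps current.
    until_now=False models the static window anchored at fixed_end, whose
                    content only changes when actively synchronized.
    """
    end = date.today() if until_now else fixed_end
    start = end - timedelta(weeks=weeks_back)
    return start, end

# A dynamic window covering the last 12 weeks, recomputed on every call:
print(data_source_window(until_now=True, weeks_back=12))
# A static window anchored at a fixed week:
print(data_source_window(until_now=False, weeks_back=12, fixed_end=date(2023, 6, 30)))
```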
Limitations in Import and Synch
To prevent inconsistencies between the content of Log Archives and Log Data Sources, the system does not allow a synch of a Log Data Source to run while a Log Archive is importing logs, and vice versa; these tasks must be executed sequentially.
If there is an attempt to execute them at the same time, the system displays warnings and error messages explaining what is happening.
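Conceptually this is mutual exclusion over the log storage; a minimal Python model of the rule (not the platform's implementation) could look like this:

```python
import threading

# One shared lock means an import and a synch can never run at the same
# time; whichever starts second fails with a message instead of waiting.
_storage_lock = threading.Lock()

def run_exclusively(task_name: str, task) -> None:
    if not _storage_lock.acquire(blocking=False):
        raise RuntimeError(
            f"Cannot start {task_name}: an import or synch is already running."
        )
    try:
        task()
    finally:
        _storage_lock.release()

# run_exclusively("Log Archive import", do_import)
# run_exclusively("Log Data Source synch", do_synch)
```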
The data manager's role
A user in Teneo Studio performs queries on the Log Data Sources made available to them and can perform tasks such as creating Augmenters or managing Saved Results. Other tasks, though, need to be carried out by the data manager, as described below.
Before solution publish
Before a solution is published, the data manager needs to set up at least one Log Archive and one Log Data Source where the logs generated by the solution will be stored. If the Log Archive is not created, logs will simply not be stored in Cassandra and will be lost.
In addition to the configuration in Teneo Manager, the data manager should also check before publication that the following are ready:
- a Publish Environment, ready for publishing the solution
- the pipeline configuration file, which describes how the conversational AI application and the Log Archives are connected.
Again, at least one Log Archive and one Log Data Source must be set up before the solution is published, and it is recommended to set up an independent Log Archive for each published solution.
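As a rough illustration of this checklist (none of the names below are Teneo APIs), a pre-publish verification might look like:

```python
def ready_to_publish(archives, data_sources, publish_environment, pipeline_config) -> bool:
    """Hypothetical pre-publish check mirroring the checklist above."""
    checks = {
        "at least one Log Archive configured": len(archives) >= 1,
        "at least one Log Data Source configured": len(data_sources) >= 1,
        "Publish Environment created": publish_environment is not None,
        "pipeline configuration file in place": pipeline_config is not None,
    }
    for label, ok in checks.items():
        print(("OK      " if ok else "MISSING ") + label)
    return all(checks.values())

print(ready_to_publish(["archive-1"], ["source-1"], "env-1", "pipeline.conf"))
```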
After solution is live
Once the solution is published, logs are stored in the configured Log Archive automatically. The Log Data Source will be updated depending on how it has been configured.
It is also necessary to set up user permissions so that Teneo Studio users are able to access and manage the Log Data Source for querying; user permissions are managed in Teneo Manager. Note that this task may also be performed before the solution is published, although it is not compulsory.
Finally, at some point during the project's lifetime, the data manager might need to use Teneo Manager to import logs from, for example, previous versions of Teneo Studio.