VOnDA: A Framework for Ontology-Based Dialogue Management
Introduction
Natural language dialogue systems are becoming increasingly popular, be it as virtual assistants such as Siri or Cortana, as chatbots on websites providing customer support, or as interfaces in human-robot interaction, in areas ranging from human-robot teams in industrial environments and social human-robot interaction to disaster response.
A central component of most systems is the dialogue manager, which controls the (possibly multi-modal) reactions based on external triggers and the current internal state. When building dialogue components for robotic applications or in-car assistants, the system needs to take into account inputs in various forms, first and foremost the user utterances, but also other sensor input that may influence the dialogue, such as information from computer vision, gaze detection, or even body and environment sensors for cognitive load estimation.
In the following, we will describe VOnDA, an open-source framework initially developed to implement dialogue strategies for conversational robotic and virtually embodied agents. The implementation mainly took place in the context of the and projects, where a social robotic assistant supports diabetic children managing their disease. This application domain dictates some requirements that led to the decision to go for a rule-based system with statistical selection and RDF/OWL underpinning.
Firstly, it requires a lot of control over the decision process, since mistakes by the system are only tolerable in very specific situations, or not at all. Secondly, it is vital to be able to maintain a relationship with the user over a longer time period. This requires a long-term memory which can be efficiently accessed by the dialogue system to exhibit familiarity with the user in various forms, e.g., respecting personal preferences, but also making use of knowledge about conversations or events that were part of interactions in past sessions. For the same reason, the system needs high adaptability to the current user, which means adding a significant number of variables to the state space. This often poses a scalability problem for POMDP-based approaches, both in terms of run-time performance, and of probability estimation, where marginal cases can be dominated by the prominent situation. A third requirement for robotic systems is the ability to process streaming sensor data, or at least use aggregated high-level information from this data in the conversational system.
Furthermore, for ethical reasons, data collection for user groups in the health care domain is even more challenging than usual, and OWL reasoning offers a very flexible way to implement access control.
VOnDA therefore specifically targets the following design goals to support the system requirements described above:
- Flexible and uniform specification of dialogue semantics, knowledge and data structures
- Scalable, efficient, and easily accessible storage of interaction history and other data, like real-time sensor data, resulting in a large information state
- Readable and compact rule specifications, facilitating access to the underlying RDF database, with the full power of a programming language
- Transparent access to standard programming language constructs (Java classes) for simple integration with the host system
VOnDA is not so much a complete dialogue management system as a fundamental implementation layer for creating complex reactive systems, and it can emulate almost all traditional rule- or automata-based frameworks. It provides a strong and tight connection to a reasoning engine and storage, which makes it possible to explore various research directions in the future.
In the next section, we review related work that was done on dialogue frameworks. In section 10, we will give a high-level overview of the framework, followed by a specification language synopsis. Section 7 covers some aspects of the system implementation. Section 8 describes the application of the framework in the project’s integrated system. The paper concludes with a discussion of the work done, and further directions for research and development.
Debugger / GUI
VOnDA comes with a GUI that helps with navigating, compiling, and editing the source files belonging to a project. It uses the project file to collect all the necessary information.
Upon opening a project, the GUI displays the project directory (in a file view). The user can edit rule files from within the GUI or with an external editor like Emacs, Vim, etc. and can start the compilation process. After successful compilation, the project view shows what files are currently used, and marks the top-level and the wrapper class files. A second tree view (rule view) shows the rule structure in addition to the module structure. Modules in which errors or warnings were reported during compilation are highlighted, and the user can quickly navigate to them using context menus.
Additionally, the GUI can be used to track what is happening in a running system. The connection is established over a socket to allow remote debugging. In the rule view, multi-state check boxes define which rules should be observed under which conditions. A rule can be set to be logged always, never, or only when its condition evaluates to true or to false. Since the rules are represented in a tree-like structure, the logging condition can also be set for an entire subgroup of rules, or for a whole module. The current rule logging configuration can be saved for later use.
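The logging configuration described above can be pictured as a tree in which each node carries one of four logging modes, and setting a mode on a group or module propagates to everything below it. The following is only a minimal sketch under that assumption; the names LogMode, RuleNode, setModeRecursively and shouldLog are hypothetical and are not taken from the VOnDA code base.

```java
import java.util.ArrayList;
import java.util.List;

/** When a rule evaluation is written to the log (hypothetical names). */
enum LogMode { NEVER, ALWAYS, IF_TRUE, IF_FALSE }

/** A node in the rule tree: a module, a rule group, or a single rule. */
class RuleNode {
    final String label;
    final List<RuleNode> children = new ArrayList<>();
    LogMode mode = LogMode.NEVER;

    RuleNode(String label) { this.label = label; }

    /** Appends a child and returns this node to allow chaining. */
    RuleNode add(RuleNode child) {
        children.add(child);
        return this;
    }

    /** Sets the mode on this node and on every node below it. */
    void setModeRecursively(LogMode newMode) {
        mode = newMode;
        for (RuleNode child : children) {
            child.setModeRecursively(newMode);
        }
    }

    /** Decides whether an evaluation with the given result is logged. */
    boolean shouldLog(boolean conditionResult) {
        switch (mode) {
            case ALWAYS:   return true;
            case IF_TRUE:  return conditionResult;
            case IF_FALSE: return !conditionResult;
            default:       return false;
        }
    }
}

class RuleLoggingDemo {
    public static void main(String[] args) {
        RuleNode greet = new RuleNode("greet_user");
        RuleNode bye = new RuleNode("say_goodbye");
        RuleNode module = new RuleNode("SmallTalk").add(greet).add(bye);

        module.setModeRecursively(LogMode.IF_TRUE); // configure the whole module
        bye.mode = LogMode.ALWAYS;                  // override a single rule

        System.out.println(greet.shouldLog(false)); // false: condition failed
        System.out.println(bye.shouldLog(false));   // true: logged in any case
    }
}
```

Keeping the mode on every node, rather than only on leaves, is one way to let a group-level setting be overridden later for an individual rule.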
The logging view displays incoming logging information as a sortable table. A table entry contains a time stamp, the rule's label and its condition. The rule's label is coloured according to the final result of the whole boolean expression. Each base term of the condition is coloured in the same way, or greyed out if short-circuit evaluation led to premature failure or success of the expression. Inspecting the live system helps pinpoint problems when the behaviour is not as expected. The log shows how the currently active part of the information state is processed, and the window offers easy mouse navigation from the rule condition to the corresponding source code.
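To illustrate how base terms end up greyed out, the sketch below evaluates a flat conjunction from left to right and records, for each base term, whether it was true, false, or never evaluated. It is a simplified illustration that handles only conjunctions, not the nested expressions a real rule condition may contain; TermState, ConditionTrace and ShortCircuitLogger are hypothetical names, not part of the framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** State of one base term; SKIPPED corresponds to "greyed out". */
enum TermState { TRUE, FALSE, SKIPPED }

/** Result of one condition evaluation, with per-term states in source order. */
class ConditionTrace {
    final Map<String, TermState> terms = new LinkedHashMap<>();
    boolean result;
}

class ShortCircuitLogger {
    /**
     * Evaluates a conjunction (a && b && ...) in iteration order.
     * Terms after the first false one are not evaluated, only marked SKIPPED.
     * Pass an ordered map (e.g. LinkedHashMap) to keep the source order.
     */
    static ConditionTrace evalAnd(Map<String, BooleanSupplier> baseTerms) {
        ConditionTrace trace = new ConditionTrace();
        trace.result = true;
        boolean decided = false;
        for (Map.Entry<String, BooleanSupplier> e : baseTerms.entrySet()) {
            if (decided) {
                trace.terms.put(e.getKey(), TermState.SKIPPED);
                continue;
            }
            boolean value = e.getValue().getAsBoolean();
            trace.terms.put(e.getKey(), value ? TermState.TRUE : TermState.FALSE);
            if (!value) {
                trace.result = false;
                decided = true;
            }
        }
        return trace;
    }
}

class ShortCircuitDemo {
    public static void main(String[] args) {
        Map<String, BooleanSupplier> condition = new LinkedHashMap<>();
        condition.put("userKnown", () -> true);
        condition.put("greetedToday", () -> false);
        condition.put("isMorning", () -> true);

        ConditionTrace trace = ShortCircuitLogger.evalAnd(condition);
        System.out.println(trace.result + " " + trace.terms);
    }
}
```

In this example the third term is never evaluated because the second one already decides the conjunction, which is the situation the log would show as a greyed-out term.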
Compiler / Run-Time Library
The compiler turns the VOnDA source code into Java source code using the information in the ontology. Every source file becomes a Java class. Although the generated code is not primarily intended for the human reader, care has been taken to keep it understandable and debuggable. The compile process is separated into three stages: parsing and abstract syntax tree building, type checking and inference, and code generation.
The compiler's internal knowledge about the program structure and the RDF hierarchy takes care of transforming the RDF field accesses into reads from and writes to the database. Beyond that, the type system resolves the exact Java, RDF or RDF collection type of (arbitrarily long) field accesses and automatically inserts the necessary casts for the ontology accesses.
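As a rough picture of what turning a chained field access into database reads means, the sketch below resolves a property path against a toy in-memory store, issuing one read per step. This is an illustrative assumption only: ToyStore and FieldAccess are hypothetical names, the store is a plain map rather than the HFC database, and the explicit cast in the demo stands in for the casts that the text says are generated automatically from type information.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an RDF store: subject -> (property -> value). Not the HFC API. */
class ToyStore {
    private final Map<String, Map<String, Object>> triples = new HashMap<>();

    void put(String subject, String property, Object value) {
        triples.computeIfAbsent(subject, s -> new HashMap<>()).put(property, value);
    }

    Object get(String subject, String property) {
        Map<String, Object> props = triples.get(subject);
        return props == null ? null : props.get(property);
    }
}

class FieldAccess {
    /**
     * Follows a chain of properties starting at the given subject,
     * with one store read per step. Returns null if any link is missing.
     */
    static Object read(ToyStore store, String subject, String... path) {
        Object current = subject;
        for (String property : path) {
            // Only String identifiers can act as subjects in this toy store.
            if (!(current instanceof String)) {
                return null;
            }
            current = store.get((String) current, property);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

class FieldAccessDemo {
    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("user1", "mother", "user2");
        store.put("user2", "age", 42);

        // Corresponds to a chained access such as user.mother.age in a rule file.
        Integer age = (Integer) FieldAccess.read(store, "user1", "mother", "age");
        System.out.println(age);
    }
}
```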
The run-time library contains the basic functionality for handling the rule processing, including the proposals and timeouts, and for the on-line inspection of the rule evaluation. There is, however, no blueprint for the main event loop, since that depends heavily on the host application. The library also contains methods for the creation and modification of shallow semantic structures, and especially for searching the interaction history for specific utterances. Most of this functionality is available through the abstract Agent class, which has to be extended to a concrete class for each application.
There is functionality to communicate directly with the HFC database using queries, in case the object view is insufficient or too awkward. The natural language understanding and generation components can be exchanged by implementing existing interfaces, and the statistical component is connected via a message exchange protocol. A basic natural language generation engine based on a graph rewriting module is already integrated, and is used in our current system as a template-based generator. The example application also contains a VoiceXML-based interpretation module.