Infosheet: Dialog systems

This page contains introductory material on dialog systems prepared for SFB/TR projects I3, I1, ...
(John Bateman: April-May 2003)

Dialog systems face the task of monitoring spoken or written input, relating the input to goals and tasks defined by the system, following up misunderstandings or knowledge gaps that need to be filled before further tasks can be carried out, and responding to the user where appropriate. This is most easily modelled computationally as a collection of information states, with particular user and system utterances defining legal shifts between states. The more restricted the set of information states, the more simply this can be modelled. The central component for controlling the movement between such information states is the dialog manager.

In the division suggested by McTear (see below), there are three general approaches to dialog management: finite-state-based, frame-based, and agent-based dialog control.

All finite-state methods are based on transition networks: the information states are the nodes of the network, and user and system utterances define the arcs between them.
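Finite-state dialog control can be sketched very directly. The following is a minimal illustration (the travel-booking states and moves are invented for this example, not taken from any particular system): each information state is a node, and classified user moves trigger transitions along labelled arcs.

```python
# A toy transition network for finite-state dialog control.
# Keys are (current state, user move); values are (next state, system prompt).
TRANSITIONS = {
    ("start", "greet"): ("ask_destination", "Where would you like to travel to?"),
    ("ask_destination", "give_destination"): ("ask_date", "On which date?"),
    ("ask_date", "give_date"): ("confirm", "Shall I book that?"),
    ("confirm", "yes"): ("done", "Booked. Goodbye."),
    ("confirm", "no"): ("ask_destination", "Let's start again. Where to?"),
}

def step(state, move):
    """Follow one arc of the network; an unrecognized move leaves the state unchanged."""
    try:
        return TRANSITIONS[(state, move)]
    except KeyError:
        return (state, "Sorry, I didn't understand that.")

state = "start"
for move in ["greet", "give_destination", "give_date", "yes"]:
    state, prompt = step(state, move)
print(state)  # -> done
```

The restriction to predefined arcs is exactly what makes this approach simple: any user contribution that does not match an outgoing arc of the current state is illegal and must be re-prompted.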

All techniques entail attempts at dealing with the 'discourse history' problem: both system and user must know where they are in a conversation, what has been said, what has been achieved, and what has been established. Without some access to this information, no dialogic interaction is possible. An intermediate step is frame-based dialog control: here a system essentially just tries to fill in the necessary slots of a frame, gathering information from user input as best it can until the frame is complete. When there is no user input to be processed, the system moves on to the next slot of the frame to be filled. This is one of the oldest techniques and is best used for simple applications where some particular set of related information is to be obtained from the user.
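The slot-filling idea can be sketched as follows (the frame, slots, and prompts are invented for illustration; a real system would also need to parse the user's utterance into slot values first). The system absorbs whatever slot values the user supplies, including over-informative answers, and only asks about slots that remain empty:

```python
# A toy frame for frame-based dialog control: slots start out unfilled.
FRAME = {"origin": None, "destination": None, "date": None}
PROMPTS = {"origin": "Where from?", "destination": "Where to?", "date": "When?"}

def absorb(frame, user_input):
    """Take over any slot values found in the (already parsed) user input."""
    for slot, value in user_input.items():
        if slot in frame and frame[slot] is None:
            frame[slot] = value

def next_prompt(frame):
    """Ask about the first unfilled slot; None signals the frame is complete."""
    for slot, value in frame.items():
        if value is None:
            return PROMPTS[slot]
    return None

# The user answers over-informatively: both slots are filled at once.
absorb(FRAME, {"destination": "Bremen", "date": "Friday"})
print(next_prompt(FRAME))  # -> Where from?
```

This is already more flexible than pure finite-state control, since the order in which slots are filled is not fixed in advance.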

The joint-action model (Cohen, 1998) is what McTear calls "agent-based" dialog control. Here hierarchical plan structures are set up for the achievement of goals, some of which may require obtaining information from the user. Some agents use theorem proving to derive actions, others AI-style planning. McTear's (2002) very detailed (97-page) review of the state of the art (Michael F. McTear (2002) Spoken dialogue technology: enabling the conversational interface. ACM Computing Surveys, 34(1), March 2002, pp. 90-169) can be found here.
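The following is a rough sketch of this plan-based idea (the plan library, goal names, and decomposition scheme are all invented here; real agent-based systems use far richer plan representations, theorem provers, or planners): a goal is decomposed into subgoals, and information subgoals whose facts are not yet known give rise to questions to the user.

```python
# A toy hierarchical plan library: complex goals decompose into subgoals.
PLANS = {
    "book_trip": ["know_destination", "know_date", "make_booking"],
}
# Facts the agent already believes it knows.
KNOWN = {"date": "Friday"}

def execute(goal, questions):
    """Expand complex goals; turn unsatisfied 'know_X' goals into questions."""
    if goal in PLANS:                       # complex goal: expand hierarchically
        for sub in PLANS[goal]:
            execute(sub, questions)
    elif goal.startswith("know_"):          # information goal: ask if unknown
        fact = goal[len("know_"):]
        if fact not in KNOWN:
            questions.append(f"What is the {fact}?")
    # other primitive goals (e.g. make_booking) would be domain actions

qs = []
execute("book_trip", qs)
print(qs)  # -> ['What is the destination?']
```

Note that the dialog structure here falls out of the plan structure, rather than being stipulated in advance as in the finite-state approach.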

The most recent developments all try to specify general dialog control strategies that may be applied to dialogs in any application scenario. To support this, a variety of tools or workbenches have been built that allow dialog managers to be built much more quickly. Some of these are listed below. One prominent approach has concentrated on providing a Dialog Move Engine (DME) that specifies transitions between Information States. Information state methods are the basis of the TRINDI approach.
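The flavour of the information-state approach can be illustrated as follows. This is only a loose, invented illustration of the general idea (the record fields, rule names, and moves are made up here, not TRINDI's actual formalism): the information state is a structured record, dialog moves are matched against update rules, and each applicable rule transforms the state.

```python
from dataclasses import dataclass, field

@dataclass
class InfoState:
    """A toy information state: questions under discussion plus commitments."""
    qud: list = field(default_factory=list)
    commitments: dict = field(default_factory=dict)

def rule_integrate_ask(state, move):
    """Precondition: the move is an 'ask'. Effect: push its question onto the QUD."""
    if move[0] == "ask":
        state.qud.append(move[1])
        return True
    return False

def rule_integrate_answer(state, move):
    """Precondition: an 'answer' with a question pending. Effect: record the answer."""
    if move[0] == "answer" and state.qud:
        question = state.qud.pop()
        state.commitments[question] = move[1]
        return True
    return False

UPDATE_RULES = [rule_integrate_ask, rule_integrate_answer]

def update(state, move):
    """Apply the first update rule whose preconditions match the move."""
    for rule in UPDATE_RULES:
        if rule(state, move):
            return
    # no rule applied: the move is ignored (a real DME would signal this)

s = InfoState()
update(s, ("ask", "destination?"))
update(s, ("answer", "Bremen"))
print(s.commitments)  # -> {'destination?': 'Bremen'}
```

The attraction of this style is that the update rules, not a fixed network, carry the dialog strategy, so the same engine can be reused across applications by swapping rule sets.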

With the current state of the art, analysing speech input is so difficult that efforts in spoken dialog systems place a lot of emphasis on this subtask and on dealing with the problems that it creates (e.g., lots of clarificatory dialog when the speech recognition component messes up); this imbalance needs to be redressed.

Dialog Management Tools

A Dialog Move Engine development tool has been developed as the TRINDIKIT.

"TRINDIKIT is a toolkit for building and experimenting with dialogue move engines and information states. ... TRINDIKIT [] specifies formats for defining information states, update rules, dialogue moves, and associated algorithms. It further provides a set of tools for experimenting with different formalizations of implementations of information states, rules, and algorithms. ... To build a dialogue move engine, one needs to provide definitions of update rules, moves and algorithms, as well as the internal structure of the information state. One may also add inference engines, planners, plan recognizers, dialogue grammars, dialogue game automata etc."

EXAMPLES of dialog control using information states and Trindi are on the TRINDI site.

The CSLU toolkit.

"The CSLU Toolkit was created to provide the basic framework and tools for people to build, investigate and use interactive language systems. These systems incorporate leading-edge speech recognition, natural language understanding, speech synthesis and facial animation technologies. The toolkit provides a comprehensive, powerful and flexible environment for building interactive language systems that use these technologies, and for conducting research to improve them. "

Some experience of using this toolkit is reported in: Michael McTear (1998) Modelling spoken dialogues with state transition diagrams: experiences with the CSLU toolkit. In Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP-98), December 1998, Sydney, Australia, pp. 1223-1226. [PDF]

Dialog Annotation

There is considerable interest in acquiring dialog and conversational data in a way that supports analysis. This is to serve as input to more accurate models of conversational and dialogic behavior. Here the notion of markup and annotation is an important consideration, since this allows larger quantities of data to be organized and interrogated with automatic methods.

Annotation is also of particular interest in that it forces libraries of possible discourse moves to be made sufficiently explicit that they can be discussed: this amounts to proposals for standard 'ontologies' of discourse moves.

One of the largest efforts to develop a workbench and tools for working with annotated conversational data was the MATE (Multilevel Annotation, Tools Engineering) project. This has now been continued in the ongoing NITE project, which is still in its early stages. NITE attempts to use the state of the art in XML, XSLT, XPointer, etc. to leverage the annotation standards developing in W3C efforts more generally. The design philosophy of NITE relies on someone being proficient in XML/XSLT, because annotation-specific interface details are specified as XML style sheets and transformations. This approach is becoming increasingly common, and NITE represents probably the most complex and most general effort in this direction so far.
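To give a concrete flavour of XML-encoded dialog annotation, here is a small invented example (the element and attribute names are made up for illustration and are not the MATE or NITE formats): each utterance carries a dialog-act label, and standard XML tooling can then interrogate the corpus automatically.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical annotated dialog: 'act' labels mark dialog moves.
ANNOTATED = """
<dialog>
  <turn speaker="S"><utt act="greet">Hello, how can I help?</utt></turn>
  <turn speaker="U"><utt act="request">A ticket to Bremen, please.</utt></turn>
  <turn speaker="S"><utt act="ask">For which date?</utt></turn>
</dialog>
"""

root = ET.fromstring(ANNOTATED)
# Collect the dialog-act labels in document order.
acts = [utt.get("act") for utt in root.iter("utt")]
print(acts)  # -> ['greet', 'request', 'ask']
```

Once data is in such a form, queries like "which acts follow a system 'ask'?" reduce to straightforward tree traversals or XSLT transformations, which is what makes the markup-based approach attractive for larger quantities of data.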

An extensive list of annotation forms and tools for them is given here (not all of these have been checked to see if they still work). For present purposes, particularly within project I1, we are exploring the extent to which new, more or less 'ready-to-use', tools can be used for annotating our data while keeping up with developments in the usability and reliability of the NITE effort. Two tools are particularly in focus at present: MMAX (European Media Lab, Heidelberg) and ANVIL (Saarbrücken). Meanwhile, data is being kept in increasingly richly structured XML form for future migration.

A further collection of links and addresses for deployed systems is given on Gregor Erbach's page for the ESSLLI01 workshop course "Languages for the Annotation and Specification of Dialogues" here. A general introductory course in dialogue systems was given at Macquarie University in 2002: overheads here.