Situated Dialogue Systems: Agency & Spatial Meaning in Task-Oriented Dialogue

Robert Ross

This thesis argues that task-oriented conversation between humans and spatially situated dialogue systems requires a systematic understanding of spatial language, models of spatial representation and reasoning, and theories of intentional action and agency, and that all of these models must be made accessible within a dialogue processing framework that, while modularizing these issues, pulls them together within a tightly coupled architecture. While these issues pose significant research questions, particularly in light of the many other challenges in language processing and spatial theory, the benefits of competence in situated spatial language to the fields of robotics, geographic information systems, game design, and applied artificial intelligence cannot be overestimated. To make progress towards these goals, this thesis develops a modularized agent-oriented language processing framework for spatially situated agents.

Review of existing theories

Existing theories of dialogue modelling and management are reviewed in detail in order to establish the state of the art in language processing, and to determine where boundaries should or should not be drawn between theories of language competence and those of spatial reasoning and agency. This review concludes that, while existing dialogue models are well developed and highly sophisticated, the particular field of situated spatial language processing requires: (a) greater clarity in the relationship between language representation and domain reasoning; (b) a more systematic approach to situational contextualization to account for the dynamic nature of interpretation in spatial dialogue; and (c) a return to issues of agency in dialogue systems to enable a tighter coupling between dialogue processes and the agent's domain-specific capabilities.

Overview of the model

A set of desiderata for situated dialogue systems is then developed and used to motivate a tiered architecture for dialogue processing that separates layers of language and knowledge representation so as to facilitate reusable and more scalable communication about space and action. The three tiers of this architecture are subsequently presented in detail. The first tier, the Language Interface, provides the processes and resources which link surface language to the agent's own conceptual models through a spatially rich linguistic semantics that is optimised for the syntax/semantics interface. The second tier, the Agent-Oriented Dialogue Management model, provides a dialogue processing theory which marries a semantics-centric view of dialogue modelling with a practical theory of intentionality, as well as a transparent approach to situational contextualization through functional content resolution and augmentation. The third tier is a concrete situational model to which the Language Interface and the Agent-Oriented Dialogue Management model are coupled. This third tier is investigated by developing a model of verbal route interpretation for navigating robots in partially known environments.

Implementation and conclusions

The models detailed in this thesis have been implemented as part of a reusable framework for semantics-rich dialogue processing. Before drawing the thesis to a close, I report on this implemented framework and its use in conducting human-human and human-computer studies of verbal route interpretation. The thesis then concludes with a summary of the contributions made, a comparison with other theories in the dialogue systems community, and a discussion of possible future directions for this work.