This thesis argues that task-oriented conversation between humans and spatially situated dialogue systems requires a systematic understanding of spatial language, models of spatial representation and reasoning, and theories of intentional action and agency. It further argues that all of these models must be made accessible within a dialogue processing framework that, while modularizing these issues, pulls them together within a tightly coupled architecture. While such issues pose significant research questions, particularly when considered in the light of the many other challenges in language processing and spatial theory, the benefits of competence in situated spatial language to the fields of robotics, geographic information systems, game design, and applied artificial intelligence should not be underestimated. To make progress towards these goals, this thesis develops a modularized agent-oriented language processing framework for spatially situated agents.
Existing theories of dialogue modelling and management are reviewed in detail in order to establish the state of the art in language processing, and to determine where boundaries should or should not be drawn between theories of language competence and those of spatial reasoning and agency. This review concludes that, while existing dialogue models are well developed and highly sophisticated, the particular field of situated spatial language processing requires: (a) greater clarity in the relationship between language representation and domain reasoning; (b) a more systematic approach to situational contextualization to account for the dynamic nature of interpretation in spatial dialogue; and (c) renewed attention to issues of agency in dialogue systems, to enable a tighter coupling between dialogue processes and the agent's domain-specific capabilities.
A set of desiderata for situated dialogue systems is then developed, and used to motivate a tiered architecture for dialogue processing that separates the layers of language and knowledge representation so as to facilitate reusable and, it is hoped, more scalable communication about space and action. The three tiers of this architecture are subsequently presented in detail. The first tier, the Language Interface, provides the processes and resources which link surface language to the agent's own conceptual models through a spatially rich linguistic semantics that is optimised for the syntax/semantics interface. The second tier, the Agent-Oriented Dialogue Management model, provides a dialogue processing theory which marries a semantics-centric view on dialogue modelling with a practical theory of intentionality, as well as a transparent approach to situational contextualization through functional content resolution and augmentation. The third tier is a concrete situational model to which the Language Interface and the Agent-Oriented Dialogue Management model are coupled. This third tier is investigated by developing a model of verbal route interpretation for navigating robots in partially known environments.