GeM Corpus: Salzburg Proposal

The motivation, design and implementation of a multilayered corpus for multimodal documents: the GeM model

John Bateman¹, Judy Delin²,³, Renate Henschel²

1 University of Bremen

2 University of Stirling

3 Enterprise IDU

Contact author:
John Bateman
(bateman@uni-bremen.de)

Proposed Abstract

In this paper, we present an approach to multimodal corpus design developed within the ongoing GeM (Genre and Multimodality) project. The GeM project is investigating the systematic connections that can be drawn between a rich characterisation of the context of use of multimodal documents and their linguistic, graphical, and layout realisations. We argue that investigations of this kind need to be placed on a much firmer empirical basis than is currently usual. To this end, we have designed a scheme for corpus annotation that permits the collection of multimodal documents accompanied by detailed classifications of their forms, functions and contexts. Within the GeM project itself, four broad document genres have been selected for initial treatment: traditional paper-based newspapers, web-based newspaper sites, instructional documents, and wildlife books. In each area we have secured a collection of documents and have established contact with designers who are either experts in the respective fields or, in several cases, actually responsible for the documents gathered.

The theoretical foundation of our corpus design is a multilayered description constructed in terms of four layers: content, structured according to standard computational knowledge representation techniques; rhetorical description, drawing on an extended form of Mann and Thompson's Rhetorical Structure Theory; a hierarchical layout structure; and a navigational structure for guiding document consumption by the `reader'. Within the corpus, we provide a formal annotation scheme for each of these layers, represented in the Extensible Markup Language (XML). The corpus thus consists of a single base level that identifies all document elements (sentences and sentence fragments, graphics, diagrams, connecting arrows, frames, etc.), combined with several levels of stand-off annotation that impose differing and non-isomorphic structures over the units of the base level. Each level is represented fully in XML and appropriate DTDs are specified. Contact is made with currently emerging standards in document content and layout specification, including the Extensible Stylesheet Language Transformations (XSLT) and formatting objects (XSL:FO). Relations between the layers of description are given in the standard XML Linking Language (XLink). Because the implementation of processing engines for XML is currently an area of extremely rapid growth, we are still evaluating a variety of query languages for picking out and testing regularities across the corpus.
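The relationship between the base level and the stand-off layers can be illustrated with a minimal Python sketch. All element and attribute names here (`unit`, `chunk`, `relation`, `xref`, `nucleus`, `satellites`) and the sample content are hypothetical and chosen only for illustration; they are not the GeM DTDs, and plain ID references stand in for the XML linking mechanism. The sketch shows two layers pointing into one shared set of base units and checks that no reference is left dangling.

```python
import xml.etree.ElementTree as ET

# Hypothetical base level: every document element gets exactly one id.
BASE_XML = """<base>
  <unit id="u-01">Gannets</unit>
  <unit id="u-02">The gannet nests on sea cliffs.</unit>
  <unit id="u-03" type="graphic" src="gannet.png"/>
  <unit id="u-04">A gannet in flight</unit>
</base>"""

# Hypothetical layout layer: a hierarchy of chunks pointing at base units.
LAYOUT_XML = """<layout>
  <chunk id="lay-1" role="page">
    <chunk id="lay-1.1" role="heading" xref="u-01"/>
    <chunk id="lay-1.2" role="text-block" xref="u-02"/>
    <chunk id="lay-1.3" role="figure" xref="u-03 u-04"/>
  </chunk>
</layout>"""

# Hypothetical rhetorical layer: relations cut across the layout chunks.
RST_XML = """<rst>
  <relation name="elaboration" nucleus="u-02" satellites="u-03 u-04"/>
  <relation name="preparation" nucleus="u-02" satellites="u-01"/>
</rst>"""


def index_units(base_xml):
    """Map each base-unit id to its element."""
    root = ET.fromstring(base_xml)
    return {unit.get("id"): unit for unit in root.iter("unit")}


def referenced_ids(layer_xml, attrs):
    """List (tag, label, ids) for every annotation element that points at the base."""
    root = ET.fromstring(layer_xml)
    refs = []
    for el in root.iter():
        ids = []
        for attr in attrs:
            ids.extend(el.get(attr, "").split())
        if ids:
            refs.append((el.tag, el.get("id") or el.get("name"), ids))
    return refs


def dangling(refs, units):
    """Return the sorted ids that a layer references but the base level lacks."""
    return sorted({i for _, _, ids in refs for i in ids if i not in units})


units = index_units(BASE_XML)
layout_refs = referenced_ids(LAYOUT_XML, ["xref"])
rst_refs = referenced_ids(RST_XML, ["nucleus", "satellites"])

for tag, label, ids in layout_refs + rst_refs:
    print(tag, label, "->", ids)
print("dangling layout refs:", dangling(layout_refs, units))
print("dangling rst refs:", dangling(rst_refs, units))
```

In this toy example the layout layer groups the graphic and its caption into one figure chunk, while the rhetorical layer relates both to a text unit in a different chunk, which is the kind of non-isomorphic structuring that stand-off annotation makes possible without duplicating the base units.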

Following and extending Waller's initial ground-breaking work in document design, we claim not only that it is possible to find systematic correspondences between these layers, but also that those correspondences themselves depend on specifiable aspects of the documents' context of use. In particular, they depend on `canvas constraints' set by the nature of the realizational medium (paper, screen-based browser, palmtop, screen resolution) and on `production constraints' imposed by available technology and design choices (allowable cost, number of pages, available printing or rendering techniques, etc.). Our provision of a corpus of multimodal documents serves as the empirical basis for more thorough investigations of this claim. So far our work has identified widespread mismatches between rhetorical purposes and layout structures even among professionally produced documents; this offers a useful basis for constructive critique. Once the corpus is in place, we can also more readily undertake the empirical investigation of other claims concerning multimodal document design, for example Kress and van Leeuwen's proposals of given-new and ideal-real interpretations of page layout.

The final goal of the GeM project is to incorporate the empirically revealed generic constraints between document type and document form in a prototype computational system for both fully automatic and semi-automatic multimodal document generation. For this we are investigating conditionalized transformation processes, written largely in XSLT, that produce XSL:FO documents as output. Such output can then be fed to industry-standard XSL:FO renderers to produce PostScript, Acrobat, and other versions of the final documents for evaluation by professional designers.
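The idea of conditionalizing the transformation on the target canvas can be sketched as follows. This is only an illustration under stated assumptions: it uses Python in place of XSLT, the canvas table (`CANVASES`), the role names and the function `build_fo` are hypothetical, and the rule it applies (headings span all columns on a multi-column canvas) is an invented example rather than a constraint derived from the corpus. The formatting-object elements and properties it emits are standard XSL formatting objects.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualify a tag name with the XSL formatting-objects namespace."""
    return f"{{{FO_NS}}}{tag}"


# Hypothetical canvas constraints; the values are illustrative only.
CANVASES = {
    "a4-paper": {"page-width": "210mm", "page-height": "297mm", "columns": 2},
    "palmtop": {"page-width": "80mm", "page-height": "100mm", "columns": 1},
}


def build_fo(blocks, canvas_name):
    """Build a formatting-objects document for (role, text) blocks on a canvas."""
    canvas = CANVASES[canvas_name]
    root = ET.Element(fo("root"))

    # Page geometry and column count come from the canvas constraints.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-width": canvas["page-width"],
        "page-height": canvas["page-height"],
    })
    ET.SubElement(page, fo("region-body"),
                  {"column-count": str(canvas["columns"])})

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    for role, text in blocks:
        attrs = {}
        if role == "heading":
            attrs["font-weight"] = "bold"
            # Conditional step: only a multi-column canvas needs a spanning heading.
            if canvas["columns"] > 1:
                attrs["span"] = "all"
        block = ET.SubElement(flow, fo("block"), attrs)
        block.text = text

    return ET.tostring(root, encoding="unicode")


sample = [("heading", "Gannets"), ("body", "The gannet nests on sea cliffs.")]
print(build_fo(sample, "a4-paper"))
print(build_fo(sample, "palmtop"))
```

The same input blocks yield different formatting-object documents for the two canvases, and either result could in principle be passed to an XSL:FO renderer.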