
Variation: how to describe and control it

We have now seen some of the motivations for employing the sophisticated approaches to text production pursued within NLG. It is equally important, however, to treat more critically the view of NLG as simply a front-end technology that can be plugged onto existing applications. In particular, the form of an appropriate `input' for natural language generation requires careful consideration in its own right. The more closely a proposed input corresponds to any particular output form, the less that input will support the production of variations. This is simply due to the `distance' of the transformation required and is analogous to the equivalent problem found within Machine Translation concerning the design of an `interlingua'. In order to achieve flexible language generation, NLG systems therefore seek to work with a level of input representation that is sufficiently abstract to remain uncommitted about many details of the final linguistic product.

To explicate this aspect of NLG, it is useful to employ the linguistic notion of stratification in order to describe linguistic variation more accurately. Stratification refers to the differing kinds of information, at differing levels of abstraction, that contribute to the organization of any text. Thus, we can talk of lower levels of linguistic abstraction, such as phonology, morphology, syntax or lexicogrammar, and semantics, and of higher levels of abstraction, such as those of text structure and style. Texts might then vary in their use of different text structures, of different syntactic structures, of different phonologies (when spoken), and so on. Regardless of an NLG component's actual inputs and its context of application, we can identify generation tasks in relation to the level of linguistic abstraction, or stratum, at which variation is being offered or controlled. We begin by focusing on the linguistically less abstract strata of grammar and semantics. It is these strata that contain generalized information that carries across differing contexts of application and use; that is, any text requires grammatical sentences, and so accounts of grammar will always be useful regardless of application context--there are broad similarities in lexicogrammatical and semantic organizational forms that are independent of the desired application.2
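As a very rough illustration of stratification--a minimal sketch in Python, with all class and field names invented for this purpose and not drawn from any particular NLG system--the `same' message can be represented at several levels of abstraction, with variation arising at each level separately:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SemanticSpec:
    """Semantic stratum: what is to be said, not yet how."""
    process: str                  # e.g. an event of 'being appointed'
    participants: Dict[str, str]  # roles such as Actor, Location, Time

@dataclass
class GrammaticalSpec:
    """Lexicogrammatical stratum: clause structure and lexical choices."""
    mood: str                     # declarative, interrogative, ...
    voice: str                    # active or passive
    constituents: List[str]       # ordered clause constituents

@dataclass
class SurfaceForm:
    """Least abstract stratum: the final wording (or phonology, if spoken)."""
    text: str

# One SemanticSpec can be realized by several GrammaticalSpecs, and each of
# those by several SurfaceForms; variation can thus be located, described,
# and controlled at each stratum in turn.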

A minimally necessary (but not sufficient) set of organizations capable of explaining how a text works--i.e., capable of explaining its `textuality'--consists of the two least abstract linguistic strata alluded to above: the lexicogrammar, which specifies the possible structures of a language, and the semantics, which controls the selections made among those structures.

There have been many different kinds of theoretical accounts of grammar developed within linguistics, and it is possible to find individual examples of computational generation systems for most of them; some of the possibilities are discussed below under `techniques'. In contrast, the second area of organization, the semantics, is in need of considerably more work. A particularly useful approach is to consider the semantic control that a detailed lexicogrammar requires in order to make its selections. This also places some natural constraints on the kinds of grammar that lend themselves most readily to NLG: it is necessary not only to specify what kinds of structures are possible in a language, but also to provide access to the `decision points' between possible structures, so that the resulting syntactic structures are communicatively effective as well as grammatically correct. Although early representations of such information simply annotated structural rules or lexical units with `semantic' or `pragmatic' information as needed for particular effects (e.g., [McKeown: 1985]), there have since been attempts to provide the broader range of controlling and constraining semantic information necessary for motivating lexicogrammatical generation.
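The idea of `decision points' can be made concrete with a small sketch--again purely illustrative Python, with invented function names and an invented semantic criterion, not the mechanism of any particular grammar formalism--in which each choice between possible structures is guarded by a query against the semantic input rather than being hard-wired into the structural rules:

def choose_voice(semantics: dict) -> str:
    """Decision point between active and passive clause structure."""
    # Hypothetical criterion: background the actor if it is unknown
    # or explicitly de-emphasized in the semantic input.
    if semantics.get("actor") is None or semantics.get("background_actor", False):
        return "passive"
    return "active"

def choose_theme(semantics: dict) -> str:
    """Decision point for which participant opens the clause."""
    return semantics.get("theme", "actor")

# Generation then proceeds by traversing such decision points; the set of
# answers jointly determines which grammatical structure is built.
spec = {"process": "award", "actor": None, "goal": "the prize"}
assert choose_voice(spec) == "passive"   # actor unknown, so passive is preferred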

Even though particular NLG systems will differ in the precise input form they adopt, linguistic semantic representations provide a good basis for considering alternatives independently of particular implementations. We show this more concretely by setting out in the following subsections the linguistic semantic representations corresponding to some of the sentences of our example biography texts.
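Concrete formats for such inputs vary from system to system; the Sentence Plan Language (SPL) used with the Penman and KPML systems is one well-known example. Purely for orientation, the following sketch (rendered as a Python dictionary, with an invented sentence and invented role names that are not taken from the example biography texts) indicates the kind of information a linguistic semantic input might carry:

# Hypothetical semantic input for a biography-style sentence such as
# "She was born in Vienna in 1912."
semantic_input = {
    "speech_act": "assert",
    "process": "be-born",                       # the event type to be expressed
    "participants": {
        "experiencer": {"type": "person", "gender": "female"},
        "location":    {"type": "city", "name": "Vienna"},
        "time":        {"type": "year", "value": 1912},
    },
    # Deliberately absent: word order, voice, tense morphology, particular
    # lexical items -- commitments left to the lexicogrammar, which is what
    # keeps later variation open.
}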


