next up previous contents
Next: Registerial and discourse semantic Up: Variation: how to describe Previous: Interpersonal decisions   Contents

Textual decisions

A more critical range of lexicogrammatical variation to be controlled during NLG is that of textual meaning. Almost all examples of texts actually generated to date (including our example texts above) have problems related to their expression of textual meaning--i.e., there are places where a human text producer would have made selections resulting in a more fluent, or natural, text. The main problem remains to isolate and describe the sources of constraint for lexicogrammatical decisions that influence textuality: i.e., again, how to control a lexicogrammar. The richer the lexicogrammatical resources that are represented, the more complex the problem becomes. If a lexicogrammar offers few possibilities, then issues of control are reduced; however, possibilities for fluency are then similarly reduced. Many current approaches to NLG therefore concentrate on finding appropriate control constraints for general resources rather than on relying on artificial restrictions on the resources themselves.

Any reasonable lexicogrammatical resource of English will offer, for example, in addition to the sentences that appeared in the example texts, sentences such as:

(e) With Brandenburg, she studied art in 1916--1919.
(f) In 1916--1919, Anni Albers studied art with Brandenburg.
(g) She studied art with Brandenburg in 1916--1919.
(h) The third artist shown above studied art with Brandenburg in 1916--1919.
(i) Moreover, she studied art in 1916--1919 with Brandenburg.
(j) It was in 1916--1919 that she studied art with Brandenburg.

Sentences (e)-(g) might be argued not to differ in propositional content, although they have different preferred contexts of use in texts; sentence (h) introduces semantic content to refer deictically to the artist in question; sentence (i) makes its relation to the preceding text more explicit and so can also be taken as introducing more information; sentence (j) has a similar function, but uses a particular grammatical structuring to realize that function. Since all are compatible with a single skeletal ideational semantic specification, such variation shows again that a logical representation of the propositional content being expressed leaves the generation process seriously underconstrained. While approaches that concentrate on decontextualized sentence generation can find this variation secondary, it is in fact crucial for the generation of acceptable texts. Further information must be provided to a lexicogrammar if it is going to be able to make its generated result appropriate for the text being produced. The kinds of decisions shown in sentences (e)-(j) indicate predominantly textual information: i.e., decisions have been made that embed the propositional content expressed appropriately in differing texts and differing contexts of use.
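The underconstraint can be illustrated with a minimal sketch in which one fixed propositional specification is realized differently depending on a handful of textual settings. The dictionary keys, the `TextualChoices` fields, and the `realize` function are hypothetical names chosen for illustration only; they are not categories of any actual generation system, and the string templates stand in for what a real lexicogrammar would do.

```python
from dataclasses import dataclass

# Toy propositional content shared by all variants. The field names are
# illustrative assumptions, not categories from any real generator.
PROPOSITION = {
    "process": "studied",
    "actor": "she",
    "actee": "art",
    "accompaniment": "with Brandenburg",
    "time": "in 1916--1919",
}


@dataclass(frozen=True)
class TextualChoices:
    theme: str = "actor"        # which constituent appears in initial position
    conjunctive: str = ""       # explicit link to the preceding text, if any
    rheme_order: tuple = ("accompaniment", "time")  # order of the circumstances


def realize(prop, choices):
    """Build one surface string from fixed content plus textual choices."""
    core = f"{prop['actor']} {prop['process']} {prop['actee']}"
    # Circumstances that are not thematized stay after the core clause.
    rest = [c for c in choices.rheme_order if c != choices.theme]
    tail = " ".join(prop[c] for c in rest)
    if choices.theme == "actor":
        body = f"{core} {tail}"
    else:
        body = f"{prop[choices.theme]}, {core} {tail}"
    if choices.conjunctive:
        body = f"{choices.conjunctive}, {body}"
    return body[0].upper() + body[1:] + "."


# Same proposition, different textual choices: sentences (e), (g) and (i).
sentence_e = realize(PROPOSITION, TextualChoices(theme="accompaniment"))
sentence_g = realize(PROPOSITION, TextualChoices())
sentence_i = realize(
    PROPOSITION,
    TextualChoices(conjunctive="Moreover", rheme_order=("time", "accompaniment")),
)
```

Changing only the `actor` string (for example to ``Anni Albers'' with `theme="time"`) gives sentence (f). The cleft construction in (j) is deliberately left out of the sketch, since it needs a different clause structure rather than a reordering.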

A more complete SPL specification including the necessary interpersonal and textual distinctions to generate exactly the sentence that occurred in the example text rather than any of its variations (a)-(j) is shown in Figure 2. Here it has been necessary to include information specific to a particular generation system since there are still no general categories accepted in NLG as a whole; the specification builds on the semantic categories (inquiries) of the Nigel grammar of English (cf., e.g., [Mann and Matthiessen: 1985]) from the Penman system. Most sufficiently detailed systems would require similar kinds of information; the additional textual information is shown in boldface and has consequences both for the ordering and form of the surface constituents. The :theme specification constrains the constituent that appears in initial position to be the actor (p). The identifiability of the actor supports contrasts such as ``a person'' vs. ``the person'' or ``she'', while :empty-number-q constrains the referring expression further, indicating that there is nothing that needs to be expressed apart from number information (i.e., ``she'' vs. ``they''). Contrasts supported here include that between nonempty: ``the women'', where gender has also been expressed, and empty: ``they''. Note that even the selection of superficially `propositional' information such as number can be textually motivated: e.g., ``The lion is almost extinct'' vs. ``Lions are almost extinct''--the decision as to whether lions are to be referred to as individuals or as a species depends on the perspective taken in the text and hence must be controlled as a consequence of text planning (see below) rather than by appeal to the `logical content'. The provision of a reference time (:time-in-relation-to-speaking-time-id) constrains the choice between ``studied'' and ``was studying'' since this is usually a textual choice depending on how the discourse is being developed.
The specification of newsworthiness precedence relations allows finer variations to be constrained (e.g., ``in 1916--1919 with Brandenburg'' vs. ``with Brandenburg in 1916--1919''); this is particularly important for spoken utterance generation--since, in English and several other languages, the choice of intonational prominence is directly driven by newsworthiness (cf. [Prevost and Steedman: 1994; Teich, Hagen, Grote and Bateman: 1997])--as well as for languages such as Czech, Russian, Turkish, etc., where textual status is crucial for determining word order (cf. [Hoffman: 1994; Kruijff-Korbayová, Kruijff and Bateman: in press]).
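Two of these constraints can be sketched in a few lines. The functions `refer` and `order_by_newsworthiness` and their parameters are hypothetical and only loosely modelled on the inquiries discussed above; in particular, placing the more newsworthy constituent later is an assumption read off the contrast between ``in 1916--1919 with Brandenburg'' and ``with Brandenburg in 1916--1919'', not a statement of how the Nigel grammar is implemented.

```python
def refer(head, plural, identifiable, empty):
    """Pick a referring expression from three toy features.

    head:         (singular, plural) noun forms, e.g. ("woman", "women")
    identifiable: the hearer can already pick out the referent
    empty:        nothing beyond number needs expressing, so a pronoun suffices
    """
    if identifiable and empty:
        # Assumption: a feminine singular referent, as in the example text.
        return "they" if plural else "she"
    noun = head[1] if plural else head[0]
    if identifiable:
        return f"the {noun}"
    # Non-identifiable: indefinite article in the singular, bare plural
    # otherwise. The a/an alternation is ignored in this sketch.
    return noun if plural else f"a {noun}"


def order_by_newsworthiness(constituents, more_newsworthy_than):
    """Order constituents so that more newsworthy ones come later.

    more_newsworthy_than: pairs (x, y) meaning x is more newsworthy than y.
    Counting how many constituents each one outranks is only adequate for
    small, consistent sets of pairs; it is not a general topological sort.
    """
    def rank(c):
        return sum(1 for x, _ in more_newsworthy_than if x == c)

    # sorted() is stable, so unranked constituents keep their given order.
    return sorted(constituents, key=rank)


woman = ("woman", "women")
pronoun = refer(woman, plural=False, identifiable=True, empty=True)
full_np = refer(woman, plural=True, identifiable=True, empty=False)
order = order_by_newsworthiness(
    ["accompaniment", "temporal-locating"],
    [("accompaniment", "temporal-locating")],
)
```

Under these assumptions `pronoun` is ``she'', `full_np` is ``the women'', and `order` places the temporal circumstance before the accompaniment.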

 
 
(s0 / study
    :theme p
    :conjunctive-relation-q nonconjunctive
    :speech-act (sa / assertion
                    :speaking-time (st / time))
    :actor (p / (person female)
              :identifiability-q identifiable
              :empty-number-q empty)
    :accompaniment (b / (person male) :name Brandenburg)
    :actee (a / art)
    :temporal-locating (i0 / three-d-time :name 1916-1919)
    :time-in-relation-to-speaking-time-id (st) i0
    :event-time i0
    :precede-q (i0 st) precedes
    :newsworthiness (:accompaniment > :temporal-locating))

Figure 2: More complete SPL-style input specification for the sentence ``She studied art in 1916--1919 with Brandenburg.''
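To make the structure of such a specification easier to inspect, the following sketch reads the parenthesized notation into nested Python lists and groups the values that follow each keyword. It is not the reader used by Penman; `tokenize`, `read`, and `keyword_map` are hypothetical helpers that assume nothing more than whitespace-separated tokens and balanced parentheses.

```python
SPEC = """
(s0 / study
    :theme p
    :conjunctive-relation-q nonconjunctive
    :speech-act (sa / assertion :speaking-time (st / time))
    :actor (p / (person female)
              :identifiability-q identifiable
              :empty-number-q empty)
    :accompaniment (b / (person male) :name Brandenburg)
    :actee (a / art)
    :temporal-locating (i0 / three-d-time :name 1916-1919)
    :time-in-relation-to-speaking-time-id (st) i0
    :event-time i0
    :precede-q (i0 st) precedes
    :newsworthiness (:accompaniment > :temporal-locating))
"""


def tokenize(text):
    """Split the specification into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Read one expression from the front of the token list (consumed in place)."""
    token = tokens.pop(0)
    if token != "(":
        return token
    items = []
    while tokens[0] != ")":
        items.append(read(tokens))
    tokens.pop(0)  # drop the closing parenthesis
    return items


def keyword_map(term):
    """Collect, for each :keyword in a parsed term, the items that follow it."""
    result, current = {}, None
    for item in term:
        if isinstance(item, str) and item.startswith(":"):
            current = item
            result[current] = []
        elif current is not None:
            result[current].append(item)
    return result


spec = read(tokenize(SPEC))
top = keyword_map(spec)
actor = keyword_map(top[":actor"][0])
```

With the specification parsed this way, `top[":theme"]` holds the variable `p`, `actor` exposes the identifiability and number settings of the actor term, and `top[":newsworthiness"]` holds the precedence relation as a three-element list.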


bateman 2002-09-21