Text design, planning, and construction



The text design task is to select and organize content into an appropriate text structure. The approaches to this task that have been pursued range broadly in flexibility and computational complexity.

The simplest approach to text macro-organization is provided by canned structures or templates-- almost any kind of organization can be more or less `frozen' and text structure is no exception. Although it is commonly recognized that template-generation sacrifices a great deal of flexibility, it is also true that not all applications require that flexibility. The division between `template' and `full' generation should in any case not be exaggerated. Fixing various aspects of the full generation process corresponds well with a basic property of language itself: in text production, speakers and writers may re-use collections of choices that have been made previously rather than always making those choices afresh; this can range from formulaic sentences and whole texts, through to various degrees of idiomaticity (in syntax, in how arguments are structured, in the semantic configurations used, in text structures, etc.). Structural templates can therefore be seen as the partially `frozen' results of text planning. A `complete' theoretical account of NLG needs to be able to move freely between more or less fixed patterns, combining them transparently in the texts generated in a way appropriate to the particular context of application: accounts that are able to restrict, preferably automatically, their general accounts and to compile out application-specific and useful `templates' will probably result in significant savings. The template approach to text structure may be used for texts which exhibit more stereotypical structures. A relatively flexible kind of text template represented in terms of transition networks is described by McKeown [McKeown: 1985]; this approach has become one of the most widespread text organization techniques used in NLG despite its clear limitations.
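As a rough illustration of how a text template expressed as a transition network might operate, the following Python sketch walks a small network whose arcs are labelled with content types and fills each arc from a pool of available propositions. It is a minimal sketch under stated assumptions: the names `NETWORK` and `fill_schema`, the arc labels, and the toy propositions are hypothetical and are not taken from McKeown's system.

```python
from typing import Dict, List, Tuple

# Hypothetical transition network: state -> ordered list of (content type, next state).
# Arcs are tried in order; an arc is taken only if content of that type is still available.
NETWORK: Dict[str, List[Tuple[str, str]]] = {
    "start": [("identification", "described")],
    "described": [("attribute", "described"), ("example", "end")],
    "end": [],
}

def fill_schema(network: Dict[str, List[Tuple[str, str]]],
                pool: Dict[str, List[str]],
                state: str = "start",
                final: str = "end") -> List[str]:
    """Traverse the network, consuming one matching proposition per arc taken."""
    pool = {kind: list(items) for kind, items in pool.items()}  # work on a copy
    text: List[str] = []
    while state != final:
        for kind, next_state in network[state]:
            if pool.get(kind):
                text.append(pool[kind].pop(0))
                state = next_state
                break
        else:
            break  # no arc can be taken: stop with the text built so far
    return text

# Toy content pool (illustrative only).
POOL = {
    "identification": ["X is a Y."],
    "attribute": ["X has property P.", "X has property Q."],
    "example": ["One example of X is Z."],
}
```

The fixed arc order is what makes the structure `frozen': the same network always yields the same ordering of content types, and flexibility is limited to which arcs can be filled from the available content.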

More flexible text construction is made possible by employing a theory of text organization such as Rhetorical Structure Theory (RST) mentioned above [Mann and Thompson: 1988]. RST provides a general description of the relations holding among segments of a text, whether or not they are grammatically or lexically signalled. Descriptions of texts, or texts generated using RST, are decomposed hierarchically into a nested set of related text `spans'. RST defines approximately 25 relations which may hold between these spans, motivated originally on the basis of detailed descriptive linguistic analyses of some 400 texts of varying content and genres. An RST-analysis of the Bauhaus biography text illustrating some of the relations defined is shown in Figure 8. This analysis claims that the main point of the text (obtained by following the straight vertical lines) is the information that Albers taught at Black Mountain College, thereby spreading the Bauhaus movement; other information is related to this as indicated by the identified rhetorical relations--i.e., background, circumstance, elaboration and sequence. The rhetorical analysis brings out the fact that the text is intended to stand as an illustration of the spread of the Bauhaus movement to the U.S. and not as a neutral biography, which would typically consist of a simple sequence of events arranged chronologically instead.

Figure 8: Example RST analysis of a Bauhaus biography text
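The hierarchical decomposition into spans can be pictured as a simple tree data structure. The sketch below is an assumption-laden illustration, not a reproduction of Figure 8: the class `Span`, the functions `main_point` and `relations`, and the placeholder leaf texts are all hypothetical, and multinuclear relations such as sequence are deliberately left out. It shows how the main point of a text can be read off by following nuclei downwards from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Span:
    """A text span: either a leaf with text, or a nucleus plus related satellites."""
    text: Optional[str] = None                 # leaf content; None for internal spans
    nucleus: Optional["Span"] = None           # communicatively dominant sub-span
    satellites: List[Tuple[str, "Span"]] = field(default_factory=list)  # (relation, span)

def main_point(span: Span) -> Optional[str]:
    """Follow the chain of nuclei from the root down to a leaf."""
    while span.nucleus is not None:
        span = span.nucleus
    return span.text

def relations(span: Span) -> List[str]:
    """List the relation names used in the tree, nucleus subtree first."""
    found: List[str] = []
    if span.nucleus is not None:
        found.extend(relations(span.nucleus))
    for relation, satellite in span.satellites:
        found.append(relation)
        found.extend(relations(satellite))
    return found

# Toy tree with placeholder satellites (not the actual analysis of Figure 8).
TREE = Span(
    nucleus=Span(text="Albers taught at Black Mountain College"),
    satellites=[
        ("background", Span(text="<background span>")),
        ("elaboration", Span(
            nucleus=Span(text="<elaborating span>"),
            satellites=[("circumstance", Span(text="<circumstance span>"))],
        )),
    ],
)
```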

RST definitions bring constraints to bear on the kinds of meanings that the related text spans must carry, and on the communicative effect achieved by the combined set of text spans. A typical example of an RST definition is shown in Figure 9. Constructing discourse structure in terms of RST relations has proved itself to be useful for supporting selections of linking forms and textual connectives, such as the ``moreover'' of sentence (d), or the deliberate non-selection of a conjunction (by means of the category nonconjunctive in Figure 2): it is only because the discourse is being developed in a particular way that a particular form is appropriate. Once an NLG system has an RST-style text plan available, it has much of the additional information shown above to be necessary for motivating non-ideational semantic choices. RST has also been used to constrain the recency relationship for anaphors (cf. [Fox: 1987]), choices of theme [Matthiessen and Bateman: 1991, p228], and selections of focus [Hovy and McCoy: 1989].

relation name: condition
constraints on N: none
constraints on S: S presents a hypothetical, future, or otherwise unrealized situation (relative to the situational context of S)
constraints on the N+S combination: Realization of the situation presented in N depends on realization of that presented in S.
the effect: R recognizes how the realization of the situation presented in N depends on the realization of the situation presented in S.
locus of the effect: N and S

Figure 9: RST definition of the relation condition, relating a communicatively dominant text span (N: nucleus) and dominated text spans (S: satellite); adapted from Mann and Thompson (1987, p65)
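One way to picture how such a definition could be held in a generation system is as a record whose constraint fields are executable tests. The following Python sketch makes several assumptions that go beyond the definition itself: spans are modelled as dictionaries with hypothetical keys `id`, `realized` and `depends_on`, and the names `RelationDef`, `CONDITION` and `relation_holds` are invented for illustration. Real systems represent these constraints over much richer semantic structures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

SpanInfo = Dict[str, Any]  # hypothetical span record: {"id", "realized", "depends_on"}

@dataclass(frozen=True)
class RelationDef:
    name: str
    constraint_n: Callable[[SpanInfo], bool]             # constraints on N
    constraint_s: Callable[[SpanInfo], bool]             # constraints on S
    constraint_ns: Callable[[SpanInfo, SpanInfo], bool]  # constraints on the N+S combination
    effect: str
    locus: str

CONDITION = RelationDef(
    name="condition",
    constraint_n=lambda n: True,                            # "none"
    constraint_s=lambda s: not s["realized"],               # S is unrealized
    constraint_ns=lambda n, s: s["id"] in n["depends_on"],  # N depends on S
    effect=("R recognizes how the realization of the situation presented in N "
            "depends on the realization of the situation presented in S"),
    locus="N and S",
)

def relation_holds(rel: RelationDef, n: SpanInfo, s: SpanInfo) -> bool:
    """Check all three constraint fields of a relation definition."""
    return rel.constraint_n(n) and rel.constraint_s(s) and rel.constraint_ns(n, s)

# Toy spans (illustrative only).
NUCLEUS = {"id": "n1", "realized": False, "depends_on": {"s1"}}
SATELLITE = {"id": "s1", "realized": False, "depends_on": set()}
```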

RST was first operationalized computationally by [Hovy: 1988b] and [Moore and Paris: 1988] and has since been incorporated in a wide range of natural language generation systems (see, e.g., [Scott and de Souza: 1990, Bateman, Maier, Teich and Wanner: 1991, Dobeš and Novak: 1991, Cross: 1992, Fawcett and Davies: 1992, Hovy, Lavid, Maier, Mittal and Paris: 1992, Rösner and Stede: 1992, Vander Linden: 1993, André and Rist: 1993]). In order to use RST for generation, rhetorical relations are usually modelled as one type of communicative goal. Standard Artificial Intelligence planning strategies such as top-down hierarchical goal expansion (cf. [Sacerdoti: 1977]) can then be used to produce text structures. A detailed overview of the development of rhetorical relation-based NLG methods is given by Hovy [Hovy: 1993].

effect:      (PERSUADED ?hearer (DO ?hearer ?act))
constraints: (AND (STEP ?act ?goal)
                  (GOAL ?hearer ?goal)
                  (MOST-SPECIFIC ?goal)
                  (CURRENT-FOCUS ?act)
                  (SATELLITE))
nucleus:     (FORALL ?goal (MOTIVATION ?act ?goal))
satellites:  nil

Figure 10: Example representation of a communicative goal planning operator used for `persuading' the hearer to do some act by means of providing motivations

Communicative goals are typically represented in terms of `plan operators'. An example of such a plan operator from the system of Moore and Paris [Moore and Paris: 1993] is shown in Figure 10. Starting from an initial goal, the simplest variant of top-down planning searches out plan operators that have the desired goal as effect. The applicability conditions of the operators found are checked and, if met, the original goal is decomposed further into the subgoals (as given under the `nucleus' and `satellites' slots of the plan operator). The process then recurses on the selected subgoals. The successful application of the operator illustrated in Figure 10, for example, has the effect that the discourse intention of persuading the hearer to do some act is achieved; when this is the case, a rhetorical structuring involving `motivation' is constructed. Selection of informational content for the text being generated can also be achieved `on the way' as a side-effect of binding variables (e.g., ?goal, ?act, etc.) in the operator's constraints. The planning process `bottoms out' when the subgoals reached call for the direct expression of surface speech acts rather than for further decomposition. The system of Moore and Paris contains approximately 150 such plan operators and is considered sufficiently stable for use in various demonstration systems. Clearly, with larger plan `libraries' issues concerning the organization and interrelationships among plan operators become more significant. One approach that has been attempted for organizing such libraries is in terms of classification networks; for example, systemic networks with choosers and inquiries (see below) have been used to explore representations of functional motivations for rhetorical relations in terms of communicative goals (e.g., [Hovy, Lavid, Maier, Mittal and Paris: 1992, Vander Linden, Cumming and Martin: 1992]).
More detailed basic introductions to RST and its use in NLG are given in Bateman and Zock [Bateman and Zock: 2002] and Vander Linden [Vander Linden: 2000].
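The top-down expansion of communicative goals described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the planner of Moore and Paris: goals are plain tuples rather than logical forms with variables, constraint checking and variable binding are folded into one hypothetical `expand` function per operator, and the names `Operator`, `plan`, `speech_acts`, the toy knowledge base `KB` and its contents are all invented here. Only two constraints of the Figure 10 operator (STEP and GOAL) are imitated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Goal = Tuple[str, ...]                       # e.g. ("PERSUADED", "replace-filter")
Expansion = Tuple[List[Goal], List[Goal]]    # (nucleus subgoals, satellite subgoals)

@dataclass
class Operator:
    effect: str                                              # goal type achieved
    expand: Callable[[Goal, dict], Optional[Expansion]]      # None if constraints fail

SPEECH_ACTS = {"INFORM"}  # goals that are expressed directly, ending the recursion

def persuade(goal: Goal, kb: dict) -> Optional[Expansion]:
    # Constraints: the act is a step towards some goal that the hearer holds.
    _, act = goal
    goals = [g for g in kb["steps"].get(act, []) if g in kb["hearer_goals"]]
    if not goals:
        return None
    # Binding the goals selects content; one MOTIVATION subgoal per goal found.
    return [("MOTIVATION", act, g) for g in goals], []

def motivate(goal: Goal, kb: dict) -> Optional[Expansion]:
    _, act, g = goal
    return [("INFORM", f"{act} is a step towards {g}")], []

OPERATORS = [Operator("PERSUADED", persuade), Operator("MOTIVATION", motivate)]

def plan(goal: Goal, operators: List[Operator], kb: dict) -> Optional[dict]:
    """Expand a goal top-down; try the next operator if an expansion fails."""
    if goal[0] in SPEECH_ACTS:
        return {"act": goal}
    for op in operators:
        if op.effect != goal[0]:
            continue
        expansion = op.expand(goal, kb)
        if expansion is None:
            continue
        nucleus = [plan(g, operators, kb) for g in expansion[0]]
        satellites = [plan(g, operators, kb) for g in expansion[1]]
        if None not in nucleus and None not in satellites:
            return {"goal": goal, "nucleus": nucleus, "satellites": satellites}
    return None

def speech_acts(node: dict) -> List[Goal]:
    """Collect the leaves of a text plan in left-to-right order."""
    if "act" in node:
        return [node["act"]]
    return [a for child in node["nucleus"] + node["satellites"] for a in speech_acts(child)]

# Toy knowledge base (illustrative only).
KB = {
    "steps": {"replace-filter": ["keep-engine-clean", "pass-inspection"]},
    "hearer_goals": {"keep-engine-clean"},
}
```

Calling `plan(("PERSUADED", "replace-filter"), OPERATORS, KB)` yields a small tree whose single leaf is an INFORM act about the one goal the hearer actually holds; the second candidate goal is filtered out during constraint checking, which is the sense in which content selection happens as a side-effect of planning.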

The smallest text spans and leaves of the rhetorical structure in an RST-style analysis are often taken to be equivalent to semantic specifications of clauses that can be passed to a lexicogrammar for realization. This simple approach is problematic in a number of ways (cf. [Meteer: 1992,Bateman and Rondhuis: 1997]). In particular, it is often unclear where to stop rhetorical planning and let lexicogrammatical realization take over. That is, when a body of material is being organized rhetorically, a degree of segmentation will be achieved for which a lexicogrammar offers a possible realization even though it is still possible to continue rhetorical decomposition. This is illustrated in several of the contrasts among sentences (k)-(p) above where different portions of the RST-tree shown in Figure 8 have been selected for expression as single sentences. The precise motivations for deciding whether to use rhetorical or lexicogrammatical realizations are still unclear.

This problem has been discussed in both monolingual and multilingual generation. In monolingual generation, the problem has been construed as a `gap' between rhetorical specifications and specifications appropriate for lexicogrammatical generation (cf. [Meteer: 1992]). In multilingual generation, the problem has been raised because of the tendencies of differing languages to draw the line between inter-sentential and intra-sentential realizations differently (cf. [Rösner and Stede: 1994]). One of the few linguistic treatments of such phenomena is given in the work on realizations of conjunctive relations involving `grammatical metaphor' by Martin [Martin: 1992]. The phenomenon can also be considered a further instantiation of aggregation since, in the general case, aggregation applies equally within the rhetorical structure: given such a structure (e.g., as a result of text planning), it is possible to search for regularities or repetitions of structure in order to find opportunities for factoring out commonalities or for enforcing simplifications. Several transformations for rhetorical structures under particular conditions are proposed by, for example, Scott and de Souza [Scott and de Souza: 1990].
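The idea of searching a planned structure for repetitions and factoring out what is shared can be illustrated with a deliberately small Python sketch. It assumes, purely for illustration, that the leaves of a text plan are (subject, verb, object) triples and that only adjacent leaves with the same subject and verb may be merged; the functions `aggregate` and `realize` and the toy data are hypothetical and do not correspond to the transformations of Scott and de Souza.

```python
from itertools import groupby
from typing import List, Tuple

Proposition = Tuple[str, str, str]          # (subject, verb, object)
Aggregated = Tuple[str, str, List[str]]     # (subject, verb, [objects])

def aggregate(props: List[Proposition]) -> List[Aggregated]:
    """Merge runs of adjacent propositions that share subject and verb."""
    merged: List[Aggregated] = []
    for (subject, verb), run in groupby(props, key=lambda p: (p[0], p[1])):
        merged.append((subject, verb, [p[2] for p in run]))
    return merged

def realize(item: Aggregated) -> str:
    """Render an aggregated proposition as one sentence with a coordinated object."""
    subject, verb, objects = item
    if len(objects) == 1:
        obj = objects[0]
    else:
        obj = ", ".join(objects[:-1]) + " and " + objects[-1]
    return f"{subject} {verb} {obj}."

# Toy propositions (illustrative only).
PROPS = [
    ("the printer", "supports", "duplex printing"),
    ("the printer", "supports", "stapling"),
    ("the scanner", "supports", "colour"),
]
```

Because `groupby` only merges adjacent items, the order chosen by the text planner is preserved; whether a merge is actually desirable would still depend on the communicative goals of the text, which this sketch does not model.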

The question of granularity and groupings of forms of expression is applicable at any level of organization that a system adopts. Aggregation as such is thus in marked opposition to the `normal' text planning technique of goal-directed planning. Structures to be aggregated may be found opportunistically as they arise with concrete data. However, again in opposition to common assumptions made in aggregation algorithms, the type and degree of aggregation are not independent of the functional communicative goals of a text. This is also a natural consequence of adopting a two-strata description in terms of semantics and register as done above; this description emphasizes that almost everything that occurs at the semantic stratum is explicitly motivated. It also permits cross-classification of effects that otherwise might appear confusing: for example, there are interpersonal (register: e.g., level of expertise) selections that affect ideational (semantic: e.g., selection of technical predicates) selections. Integrating data-driven opportunism and functional goals in a satisfactory manner as this linguistic perspective requires will be an important step towards substantially improving generation functionality in the future.

There are also a number of problems that arise directly with the simpler hierarchical planning mechanisms adopted for RST-style text planning to date and more complex strategies are being proposed. Young and Moore [Young and Moore: 1994], for example, point out that existing text planners can guarantee neither that a text plan will be found when one exists to fulfill some communicative goal, nor that all alternative existing plans will be found. They accordingly propose an experimental alternative based on partial-order causal-link planning. Hovy [Hovy: 1988c] shows that there are distinct types of communicative goals with differing satisfaction behaviors, describing particularly how interpersonal predispositions can be implemented in terms of goals that are only achieved by repeated opportunistic deployment of some linguistic resources rather than by the traditional top-down, once-only deployment characteristic of non-interpersonal meanings. Lascarides and Oberlander [Lascarides and Oberlander: 1992] discuss the use of abduction as an alternative text content planning method; and Jokinen [Jokinen: 1996] adopts an abduction-based approach to planning cooperative responses in dialogue.

One particularly problematic area in planning, however, remains in the treatment of a fundamental property of discourse: that is, that texts `unfold' dynamically in time. It is not realistic to rely on models that show the production of a text only in terms of a completed structure. Unfortunately, this is the most straightforward interpretation of RST analyses (cf. [Martin: 1992, p251]), where communicative goal expansion is not most naturally constrained to produce individual progressive contributions to a discourse. Although there have been attempts to unify the RST-view with that of dynamic dialog development (cf. [Fawcett and Davies: 1992, Fischer, Maier and Stein: 1994, Daradoumis: 1996]), the combination remains somewhat forced. This is one important motivation for Meteer's Text Structure [Meteer: 1992] alternative: here the text structure is intrinsically incremental. The generation process then consists of taking chunks of content to be expressed and fitting these into the developing text structure as permitted by the current constraints holding on the discourse. A closer synthesis of dynamic and non-dynamic representations is therefore desirable in order that the goal-oriented aspects of text production are not sacrificed.

Towards this aim, there has more recently been an attempt to develop an approach to incremental discourse development drawing on Tree Adjoining Grammars, or TAGs [Joshi: 1987,Vijay-Shanker and Joshi: 1988]. TAGs are one of the three main approaches to surface generation adopted within NLG and are particularly suited to incremental generation. The main theoretical appeal of TAGs is that they allow a certain amount of linguistic information to be packaged locally within particular tree configurations; these tree configurations can subsequently be combined very flexibly. This property has now been suggested to apply usefully to discourse structure. Research in this direction, beginning in work on formal discourse structure and discourse `parsing' [Gardent: 1997], argues that discourse structures can be constructed dynamically using similar mechanisms to those used syntactically when generating sentences with TAGs [Gardent and Webber: 1998,Webber and Joshi: 1998,Webber, Knott, Stone and Joshi: 1999]. How far this approach can be taken remains a matter for future research.


bateman 2002-09-21