
Grammar-based realization

As the breadth of the lexicogrammatical component--i.e., the range of grammatical constructions covered--increases, the search space of possible grammatical realizations becomes very large. Issues concerning the appropriate navigation of this search space therefore become crucial. Here, differing kinds of grammatical account bring different possibilities to bear for traversing that search space.

A structural grammar (that is, one that is primarily organized around phrase structure descriptions) can search for applicable rules constrained by the semantics to be realized. The best-established strategy here is the semantic head-driven generation algorithm [Shieber, van Noord, Pereira and Moore: 1990], which generates strings from logical forms for a relatively wide class of grammar formalisms. The technique essentially works by following chains of grammatical rules whose syntactic heads share a common semantics (hence semantic head-driven) in order to reach applicable lexical entries. When no such rules are found, or a chain comes to an end, some rule that decomposes the semantics is chosen nondeterministically. When lexical entries are found, the algorithm works back `up' the structure tree imposing the constraints found in the lexicon. Lexical entries are assumed to provide the richest source of constraint on the syntactic structure and so are sought first, in order to avoid structure building that would later show itself to be inapplicable. Despite the elegance and formal specification of the algorithm, it is virtually unused outside of theoretical work on formal sentence generation. Many questions remain open concerning its performance with very large lexicogrammars and with inputs carrying substantial non-propositional semantics. The nondeterminism of the algorithm has also been criticized within NLG as an unsuitable property for large-scale, realistic generation capabilities--although there are some more recent proposals for streamlining the algorithm for generation (cf. [Haruno, Den and Matsumoto: 1996]).
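
To make the control flow concrete, the following is a minimal Python sketch of the head-driven strategy: a lexical `pivot' compatible with the goal semantics is sought first, and chain rules whose head daughters share the mother's semantics then connect the pivot upward to the goal category, with the non-head daughters generated recursively along the way. The toy lexicon, chain rules and function names are invented purely for illustration and simplify the algorithm drastically (there is no unification, and lexical entries are the only non-chain rules).

# Toy sketch of semantic head-driven generation in the spirit of
# Shieber et al. (1990); lexicon, rules and names are invented.
# Semantics: either an atom ('fluffy') or a tuple (functor, arg1, arg2).

LEXICON = [
    # (category, semantics expressed, word form)
    ('v',  'chase',  'chases'),
    ('np', 'fluffy', 'Fluffy'),
    ('np', 'mice',   'mice'),
]

CHAIN_RULES = [
    # (mother, semantic-head daughter, non-head daughters);
    # each non-head daughter: (category, semantics extractor, position)
    ('s',  'vp', [('np', lambda sem: sem[1], 'pre')]),    # S  -> NP VP
    ('vp', 'v',  [('np', lambda sem: sem[2], 'post')]),   # VP -> V  NP
]

def lexical_pivots(sem):
    """Lexical entries whose semantics matches `sem` (or its functor)."""
    functor = sem[0] if isinstance(sem, tuple) else sem
    return [(cat, word) for cat, lex_sem, word in LEXICON if lex_sem == functor]

def generate(goal_cat, sem):
    """Return a string of category `goal_cat` expressing `sem`, or None."""
    for pivot_cat, word in lexical_pivots(sem):       # 1. find a lexical pivot
        result = connect(pivot_cat, word, goal_cat, sem)
        if result is not None:
            return result
    return None

def connect(cat, string, goal_cat, sem):
    """Climb from `cat` to `goal_cat` through chain rules whose head shares
    the mother's semantics, generating non-head daughters on the way."""
    if cat == goal_cat:
        return string
    for mother, head, non_heads in CHAIN_RULES:
        if head != cat:
            continue
        pieces, ok = [string], True
        for d_cat, d_sem, position in non_heads:
            d_string = generate(d_cat, d_sem(sem))     # recursive generation
            if d_string is None:
                ok = False
                break
            pieces = [d_string] + pieces if position == 'pre' else pieces + [d_string]
        if ok:
            result = connect(mother, ' '.join(pieces), goal_cat, sem)
            if result is not None:
                return result
    return None

print(generate('s', ('chase', 'fluffy', 'mice')))      # Fluffy chases mice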

One direct alternative, called `message-directed' processing (cf. [McDonald: 1983]), is favored by the MUMBLE lexicogrammatical component for English. Here, deterministic and incremental phrase construction is controlled directly by the input specifications; these inputs call for particular syntactic tree fragments (expressed in terms of Tree Adjoining Grammars: [Joshi: 1987]) to be selected for combination. An example of input for the MUMBLE generator, taken from Meteer et al. [Meteer, McDonald, Anderson, Forster, Gay, Huettner and Sibun: 1987], is shown in Figure 11. This input identifies explicitly the particular grammatical constructions that are to be selected in the resulting sentence--for example, under the corresponding :head slots of the specification we see labels for quite specific syntactic tree fragments of various kinds (e.g., S-V-O_two-explicit-args, np-common-noun).



(general-clause
  :head (CHASES/S-V-O_two-explicit-args
          (general-np
            :head (np-proper-name "Fluffy")
            :accessories (:number singular
                          :gender masculine
                          :person third
                          :determiner-policy no-determiner))
          (general-np
            :head (np-common-noun "mouse")
            :accessories (:number plural
                          :gender neuter
                          :person third
                          :determiner-policy initially-indefinite)
            :further-specifications
              ((:attachment-function restrictive-modifier
                :specification (predication-to-be *self*
                                 (adjective "little"))))))
  :accessories (:tense-modal present :progressive :unmarked))
Figure 11: Input to mumble-86 for the clause: Fluffy is chasing little mice

 

A similar, although more recent, style of input specification is that required by the RealPro surface generator [Lavoie and Rambow: 1997]. This again takes a representation that is essentially a syntactic dependency structure and fills it out into a fully fledged sentence. RealPro is broadly influenced by the highly stratified Meaning-Text Model of language developed by Mel'cuk and colleagues, but does not draw on the deeper, more abstract linguistic strata that Mel'cuk proposes; as a consequence, it is very fast.
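
The flavor of such an input can be suggested with a small sketch, rendered here as a Python data structure purely for readability. This is not RealPro's own ASCII notation, and the attribute names below are illustrative assumptions rather than RealPro's actual inventory; only the relation labels I, II and ATTR follow the Meaning-Text tradition.

# Hypothetical deep-syntactic dependency tree for `Fluffy is chasing little mice';
# all attribute names are invented for illustration.
dependency_tree = {
    'lexeme': 'chase',
    'features': {'tense': 'present', 'aspect': 'progressive'},
    'dependents': {
        'I':  {'lexeme': 'Fluffy',
               'features': {'class': 'proper-noun'}},
        'II': {'lexeme': 'mouse',
               'features': {'number': 'plural', 'article': 'indefinite'},
               'dependents': {'ATTR': {'lexeme': 'little'}}},
    },
}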

A contrasting alternative--called grammar-directed control by McDonald--is offered by the Penman generator [Mann: 1983b] and its descendant KPML [Bateman: 1997] for systemic-functional grammars. Systemic grammars organize their search space around possible communicative functions rather than around grammatical structures: structure fragments are located within this feature space and themselves have a very restricted status. This organization is particularly well suited to the needs of NLG. Given McDonald's characterization of the generation problem as one of complex decision making, it is clear that NLG requires first and foremost accounts of why particular structures--syntactic, textual, etc.--are to be used rather than formal accounts of the structures involved. This is the natural area of concern of functional linguistics, which has accordingly exerted a far greater influence on NLG, particularly on the larger-scale systems, than on natural language analysis, where structural approaches to syntax are still the norm. Systemic-functional grammars (e.g., [Halliday: 1978,Halliday: 1985]) focus precisely on the recoding of function in form and hence provide a straightforward interface between higher levels of text organization or planning and the grammatical component. Such grammars have also traditionally paid more attention to the non-propositional--i.e., textual and interpersonal--aspects of meaning [Matthiessen and Bateman: 1991,Bateman: 1992a].

The generation algorithm of Penman and KPML consists of successive specificity-increasing traversals of the feature space. Each such traversal creates a set of constraints that determines a single structural fragment; this fragment may include grammatical constituents that demand further traversals for their specification. Although very simple, the algorithm involves no backtracking and so remains quite fast even for large grammars such as the Nigel grammar of English [Mann and Matthiessen: 1985], the KOMET grammar of German [Teich: 1992,Teich: 1999], or the AGILE grammars for Russian, Czech and Bulgarian [Bateman, Teich, Kruijff-Korbayová, Kruijff, Sharoff and Skoumalová: 2000]. Examples of inputs for these systems were given in Figures 1 and 2 above. These inputs also illustrate the simplifications that use of an Upper Model brings for the input to an NLG system: whereas explicit low-level syntactic information needs to be included in the input under the `message-directed' control of MUMBLE, such information is made redundant by the linking of domain concepts to appropriate Upper Model concepts.
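
A highly simplified Python rendering of this control regime may help to fix ideas: each `system' of a miniature feature space is visited at most once, a chooser function consults the semantics to select a feature without backtracking, the realization statements of the chosen features are collected into a structure fragment, and constituents inserted by that fragment trigger further traversals. The miniature network, the choosers and all names are invented for illustration and bear no relation to the actual Nigel or KPML resources.

# Toy sketch of grammar-directed control in the Penman/KPML style;
# each system: (name, entry feature, chooser, {feature: realization statements}).
SYSTEMS = [
    ('RANK', 'start',
     lambda sem: 'clause' if isinstance(sem, dict) else 'nominal',
     {'clause':  [('insert', 'Subject', lambda s: s['actor']),
                  ('insert', 'Process', lambda s: s['process']),
                  ('insert', 'Object',  lambda s: s['actee']),
                  ('order',  ['Subject', 'Process', 'Object'])],
      'nominal': [('lexify', lambda s: s)]}),
    ('TENSE', 'clause',
     lambda sem: 'present',
     {'present': [('suffix', 'Process', 's')]}),
]

def realize(sem):
    # One specificity-increasing traversal: in every system whose entry
    # condition is met, a chooser picks a feature; no backtracking.
    features, constraints = {'start'}, []
    for name, entry, chooser, choices in SYSTEMS:
        if entry in features:
            feature = chooser(sem)
            features.add(feature)
            constraints += choices[feature]
    # Build the structure fragment determined by the collected constraints;
    # inserted constituents demand traversals of their own.
    slots, order, text = {}, [], None
    for c in constraints:
        if c[0] == 'insert':
            slots[c[1]] = realize(c[2](sem))
        elif c[0] == 'order':
            order = c[1]
        elif c[0] == 'suffix':
            slots[c[1]] += c[2]
        elif c[0] == 'lexify':
            text = c[1](sem)
    return text if text is not None else ' '.join(slots[s] for s in order)

print(realize({'process': 'chase', 'actor': 'Fluffy', 'actee': 'mice'}))
# -> Fluffy chases mice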

The Penman-style generation algorithm has the disadvantage, however, that any choice made has to be the right one (or else generation starts anew with a modified input or additional constraints). This criticism is most often raised by developers working on lexical issues, where unsystematic lexical gaps can lead a purely semantically driven procedure (such as the fixed-direction traversal of the feature space in Penman) into positions where necessary lexical material is missing. Partly in response to such problems, the Functional Unification Formalism (FUF: [Elhadad: 1990]) provides a more powerful traversal of systemic-like feature spaces by employing non-deterministic expansion by unification. The inefficiencies of unrestricted non-determinism are reduced by several additional mechanisms for guiding the unification process (cf. [Elhadad and Robin: 1992]). A very large-coverage grammar of English called SURGE (`Systemic Unification Realization Grammar for English': [Elhadad and Robin: 1996]) has been written for use with FUF; an example of its input is shown in Figure 12. The notation is a recursive structure of pairs consisting of attribute names and their values; the first line states, for example, that the described linguistic unit has an attribute cat (category) whose value is clause. Structure-sharing (i.e., identical values for attributes) is used extensively and is indicated by paths through sequences of attributes and their values, introduced by ^. The SURGE style of input resembles, on the one hand, that for MUMBLE in that it is mostly syntactic--including syntactic categories (such as clause, np, etc.), lexical specifications (lex "draft"), definiteness and tense decisions, etc.--and, on the other hand, that for Penman/KPML in that there is no direct statement of what fine-grained linguistic structures are required and semantic information is present (e.g., the structure-sharing between the `carrier' and `agent'/`possessor', indicating that `those in need' are the same as `those who draft').

The generation process within FUF consists of taking such an input and `unifying' it with a similar definition of the entire grammar. The input guides the unification process to seek out those portions of the grammar with which it is compatible, and the structure is then further specified according to the constraints that the grammar establishes. The unification approach neutralizes to some extent the division between grammar-directed and message-directed control, since the path followed during unification is sensitive to both sources of constraint.
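
The core operation can itself be sketched very roughly in Python: feature structures are represented as nested dictionaries, the grammar is a list of alternatives (corresponding loosely to FUF's alt construct), and the input is enriched with the constraints contributed by the first compatible alternative. Path-based structure-sharing, control annotations and everything else that makes FUF practical are omitted here; the mini-grammar and all names are invented.

def unify(a, b):
    """Unify two feature structures (nested dicts); atomic values must
    match exactly; returns None on a clash."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    result = dict(a)
    for attr, val in b.items():
        if attr in result:
            sub = unify(result[attr], val)
            if sub is None:
                return None          # feature clash: this alternative fails
            result[attr] = sub
        else:
            result[attr] = val       # the grammar adds a new constraint
    return result

def enrich(inp, alternatives):
    """Try grammar alternatives in turn, keeping the first that unifies."""
    for alt in alternatives:
        out = unify(inp, alt)
        if out is not None:
            return out
    return None

# Invented mini-grammar: clauses receive an SVO ordering pattern,
# noun phrases a default definite determiner.
GRAMMAR = [
    {'cat': 'clause', 'pattern': ['subject', 'verb', 'object']},
    {'cat': 'np', 'determiner': {'lex': 'the'}},
]

print(enrich({'cat': 'clause', 'verb': {'lex': 'draft'}}, GRAMMAR))
# -> {'cat': 'clause', 'verb': {'lex': 'draft'},
#     'pattern': ['subject', 'verb', 'object']}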


(def-test c34
  "Although in need of a center, they drafted a guard in addition to Flint."
  ((cat clause)
   (tense past)
   (process ((type composite) (relation-type possessive) (lex "draft")))
   (partic ((agent ((cat personal-pronoun) (number plural)))
            (possessor {^ agent})
            (possessed ((cat common) (definite no) (lex "guard")))))
   (circum ((addition ((cat pp)
                       (prep ((lex "in addition to")))
                       (np ((cat basic-proper) (lex "Flint")))))
            (concession ((cat clause)
                         (mood verbless)
                         (process ((type ascriptive)))
                         (controlled {^ partic carrier})
                         (partic ((carrier ((index {^5 partic agent index})))
                                  (attribute ((cat pp)
                                              (prep ((lex "in need of")))
                                              (np ((cat common)
                                                   (definite no)
                                                   (lex "center")))))))))))))
Figure 12: Test input pattern for the SURGE grammar of English


