Introduction to Natural Language Generation:
Creating and developing an example set to act as a target text corpus

John Bateman
Bremen, January 2001

Contents:
    Orientation
    Starting up
    Making the first SPL
    Making an example set
    Using a changed example set
    Fixing up lexical items
    Clearing up
    Last exercise

Orientation

      This document shows by example the first steps towards creating a target set for working on generation for a given domain. The domain that we are using for illustration is one of simple tourist brochures. The task is to generate as closely as possible a sequence of sentences that together make up a short tourist brochure text. We do this by making an example set. An example set consists of a set of example Sentence Plan Language (SPL) expressions; when we send each of these SPL expressions one after the other to an appropriate grammar, we get a target text.

      It is quite likely that, in the first attempts, the text produced by the SPL expressions will not be exactly the desired target text. Any real text poses challenges that need more information and more sophistication before they can be solved fully. Just getting close will be more than enough for a first exercise and will still raise most of the issues that need to be considered at this stage.

      We will build up an example set using the basic introductory KPML program 'image', which contains a large English grammar and the SPL-Authoring Tool (SPLAT). This document assumes that you have already had some experience using SPLAT to make SPL expressions and generate sentences. During the task described here we will gradually move away from SPLAT to work directly with example sets proper. We will also see how to organise your linguistic information to work more effectively.

Starting up

      The starting 'image' should be one such as that at:

      http://www.fb10.uni-bremen.de/... .

      Download it onto your machine in the usual way for these introductory images. Since this image already contains a very large grammar of English, it can be used for many more things than are described in this document. If you would like a CD containing this program, ask me directly.

      It is enough simply to copy the program onto your computer and click on the resulting icon. However, we can make the program more useful and convenient by telling it where it is and where you are going to keep your linguistic information. This includes the target example set or, if you were developing things further, new bits of grammar. You give this information to the program in a special file called a configuration file. An example of a simple configuration file is given at:

      http://www.fb10.uni-bremen.de/... .kpmlconf.novice

      When the generation program starts, it looks in several standard places on your computer for a configuration file. If you put a configuration file of your own in one of those places, the generation program will read your instructions. The places searched include folders such as "C:\Windows\TEMP" and the folder where the program itself has been put. The name of the configuration file for the program being used here is 'kpmlconf.novice'.
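The search just described can be pictured as trying a list of candidate folders in turn and taking the first one that contains a file with the expected name. The following Python sketch is only an illustration. The text does not give the program's actual search order or its full list of folders, so the function `find_config` and the candidate list are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

CONFIG_NAME = "kpmlconf.novice"  # configuration file name given in the text


def find_config(candidate_dirs: Iterable[Path], name: str = CONFIG_NAME) -> Optional[Path]:
    """Return the first existing configuration file among the candidate folders.

    Hypothetical illustration: the real program's search order may differ.
    """
    for folder in candidate_dirs:
        candidate = Path(folder) / name
        if candidate.is_file():
            return candidate
    return None  # no configuration file was found in any candidate folder


# Hypothetical candidate list, built from the two folders mentioned in the text;
# the current working directory stands in for "the folder where the program is".
CANDIDATES = [Path(r"C:\Windows\TEMP"), Path.cwd()]
```

Because the first match wins, a configuration file in an earlier folder would hide one placed in a later folder.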

      For the present, the most important line in the configuration file is the one beginning *root-of-resources*. This tells the program where to look for grammars and example sets, and where to save example sets, lexicons, and so on. Decide where you are going to keep your linguistic information, make a folder accordingly, and then edit the configuration file so that the root of resources points to your folder. For example, suppose your linguistic information is going to be in the folder "\fbd1\my-linguistics\" on the Z drive. Then the *root-of-resources* line of your configuration file should be changed to look as follows:

      *root-of-resources* "Z:\\fbd1\\my-linguistics\\"

      Note three points about this line:
      - every backslash must be written twice;
      - the folder name is given within double quotes;
      - the folder name also ends with a double backslash.
      Save the file once you have made the change.
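The quoting rules above are mechanical enough to capture in a few lines. The following Python sketch uses a hypothetical helper name, `root_of_resources_line`. It turns an ordinary Windows folder path into a configuration line of the same form as the example above, doubling every backslash, quoting the path and adding the trailing double backslash.

```python
def root_of_resources_line(folder: str) -> str:
    """Build a *root-of-resources* line from a Windows folder path.

    Hypothetical helper: it only applies the quoting rules described in the text.
    """
    if not folder.endswith("\\"):
        folder += "\\"  # the folder name must end with a backslash
    escaped = folder.replace("\\", "\\\\")  # every backslash is written twice
    return f'*root-of-resources* "{escaped}"'


print(root_of_resources_line(r"Z:\fbd1\my-linguistics"))
# prints: *root-of-resources* "Z:\\fbd1\\my-linguistics\\"
```

Typing the folder once, with single backslashes, and letting the helper double them avoids the most common mistake when editing the file by hand.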

      Finally, when you have made a folder such as 'my-linguistics', you need to make some subfolders too. For the present these should have the following structure and names:

      my-linguistics\NIL\Examples

      my-linguistics\NIL\Lexicons

      This is the minimal folder structure that the KPML program needs in order to work properly. If you carry out some of the instructions below without this structure in place, the program may stop. For example, when saving lexicons it will stop with a message saying that it cannot find where to save the lexicon information.
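If you prefer to create these folders with a script rather than by hand, the following Python sketch does so. The function name `make_resource_folders` is hypothetical. The sketch reproduces only the two subfolders listed above and can safely be run more than once.

```python
from pathlib import Path


def make_resource_folders(root: Path) -> list:
    """Create root/NIL/Examples and root/NIL/Lexicons; return their paths.

    Sketch only: it mirrors the minimal folder structure given in the text.
    """
    created = []
    for sub in ("Examples", "Lexicons"):
        folder = Path(root) / "NIL" / sub
        folder.mkdir(parents=True, exist_ok=True)  # also creates NIL; no error if present
        created.append(folder)
    return created


# Example on Windows (path taken from the configuration example above):
# make_resource_folders(Path(r"Z:\fbd1\my-linguistics"))
```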

      With this change made, start the generation program proper in the usual way, i.e., by double-clicking on it. This brings up three windows:
      - an initial 'black box window', which is used for behind-the-scenes messages about what is going on;
      - the main KPML window, labelled 'KPML – version: X.X –';
      - a second working window, labelled 'Development'.
      All of these windows are essential for the program to work properly. Do not close any of them at any time (e.g., by clicking on the cross in the upper right-hand corner of the window). When the program has started, the black box window shows whether or not it has found your configuration file. If it has not, move the file to one of the places where the program looks for it.

      Since this program already has an English grammar and many examples loaded, you do not need to do anything else before attempting to generate. You can immediately start the SPL authoring tool by selecting it from the list that appears when you click on the Tools button towards the bottom of the main KPML window.

      This brings up the SPLAT window, which is initially empty.

Making the first SPL

    The first task should already be familiar to you: making an SPL. As an example, I will take the following sentence from one of the tourist brochure texts:

Visit Aldeburgh in the south where you'll find craft galleries and antique shops.

Now, I repeat that it is not a problem if we are not able to generate this sentence exactly as it appears here: let's just see how close we can come. We will be able to get further when we leave SPLAT later on to work with example sets directly.

The first step is to break the sentence up into clauses. If we can generate the individual sentences:

    1. Visit Aldeburgh in the south
    2. (There) you'll find craft galleries and antique shops.

then it is already a considerable step in the right direction.

I am presuming that you already know how to experiment with making SPLs using SPLAT. That is, you know that both clauses are to be made as Processes (of appropriate semantic types) with various Participants (actors, goals, etc.) and Circumstances (locations, times, etc.). These must themselves be defined as little bits of SPL. You should therefore fairly quickly arrive at a SPLAT window containing a Process 'visit-0' together with the objects for its participants and circumstances.

The SPL under the Process 'visit-0' should then generate something similar to our first clause above (very likely not exactly!). If you have a result that is at least a step in the right direction, save your work so that it can be retrieved if problems arise later on. To do this, select one of the first 'Save' options from the File menu (top left): either 'Save' or 'Save as'.

In the case above, the SPL under 'visit-0' was:

             (VISIT-0 / VISIT
                      :SPATIAL-LOCATING
                      (SOUTH-0 / THREE-D-LOCATION
                           :LEX SOUTH
                           :DETERMINER THE )

                      :LEX VISIT
                      :SENSER
                      (HEARER / PERSON
                           :NUMBER SINGULAR )

                      :PHENOMENON
                      (ALDEBURGH-0 / TOWN
                           :IDENTIFIABILITY-Q IDENTIFIABLE
                           :FAVOR-Q CLASSIFY
                           :LEX ALDEBURGH )

                      :TENSE PRESENT
                      :POLARITY POSITIVE
                      :SPEECH-ACT IMPERATIVE )

 

and the sentence this generated was: 'You visit Aldeburgh in the south'. The program is, however, guessing at some of the lexical items. The right-hand part of the SPLAT window lists some new lexical items that are not yet known to the program, and these have to be checked. When you check them, you will probably find that the program has already made reasonable guesses for 'Aldeburgh' and 'south' but is missing some information for 'visit'. You will need to set the lexical class and the ending type to arrive at the correct verb paradigm for 'visit', i.e., visits, visited, visiting, etc.
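To see why the ending type matters, it helps to look at what it ultimately controls: which spelling rules produce the inflected forms. The Python sketch below is not KPML's lexicon format or its morphology code. The ending-type labels 'regular' and 'double-final' are hypothetical names chosen for illustration. The sketch shows that 'visit' needs the plain regular pattern, whereas a verb such as 'stop' doubles its final consonant.

```python
def verb_paradigm(stem: str, ending_type: str) -> dict:
    """Return the -s, past and -ing forms of a regular English verb.

    The ending_type labels are hypothetical, not KPML's own names.
    """
    if ending_type == "regular":  # visit -> visits, visited, visiting
        base = stem
    elif ending_type == "double-final":  # stop -> stops, stopped, stopping
        base = stem + stem[-1]
    else:
        raise ValueError(f"unknown ending type: {ending_type}")
    return {"present-3sg": stem + "s", "past": base + "ed", "ing": base + "ing"}


print(verb_paradigm("visit", "regular"))
# prints: {'present-3sg': 'visits', 'past': 'visited', 'ing': 'visiting'}
```

With the wrong ending type, 'visit' would come out as 'visitted' and 'visitting'. Choosing the ending type in the lexical editor is what selects between patterns of this kind.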

Once we have defined our lexical items, we should of course save these too. Do this by selecting Save New Lexical Items, again from the File menu. We will return later to what the program has done with the new lexical items and how you can get at this information. The immediate result of saving, however, is that a new file containing your lexical information is placed in the folder that you created above: 'my-linguistics\NIL\Lexicons'.

We can proceed similarly for the second clause.

Making an example set

      Let us now assume that we have created the objects for the two clauses, so that the SPLAT window contains, among others, the Processes 'visit-0' and 'find-0'. The sentences generated by these Processes are similar to the sentences that we want, and for the moment we cannot see how to get any closer.

      Our next step is to turn these SPL objects that SPLAT knows about into an example set. We do this by selecting 'Save as Example Set' from the File menu. This brings up a normal dialog box that asks us where we want to put this example set and what it should be called. It is traditional to give example set files the extension ".spl" because they contain a set of SPL expressions. The result of this operation is to put all our carefully created objects into a file as examples. Examples can be edited directly as text, but we have to know more about SPL to edit them correctly, and careless editing can easily turn an SPL expression into nonsense. That is why we started with SPLAT. SPLAT takes care of the form of the SPLs for us, but at the cost that some SPLs cannot be created with it.

      Take a look at the created example set file to get used to its form. All example set files contain examples. Each example:
      - has a name;
      - is allocated to a particular named example set (those made from SPLAT all belong to the example set named 'splat' to begin with);
      - has an SPL expression tucked underneath the label 'logicalform'.
      The example set file for the above SPLAT-made objects can be found at:
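The three parts of an example can be pictured as a small record, and an example set as the collection of examples that share a set name. The Python sketch below is a hypothetical in-memory model, not the syntax of KPML's example set files; open the file above to see their actual form. The class and function names are assumptions, and the SPL is kept as an abbreviated plain string.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    name: str         # e.g. "VISIT-0"
    example_set: str  # SPLAT-made examples start out in the set "splat"
    logicalform: str  # the SPL expression, kept here as plain (abbreviated) text


def group_by_set(examples: list) -> dict:
    """Map each example set name to the names of its examples, in order."""
    sets = defaultdict(list)
    for ex in examples:
        sets[ex.example_set].append(ex.name)
    return dict(sets)


examples = [
    Example("VISIT-0", "splat", "(VISIT-0 / VISIT ...)"),
    Example("FIND-0", "splat", "(FIND-0 / ...)"),
]
print(group_by_set(examples))
# prints: {'splat': ['VISIT-0', 'FIND-0']}
```

Grouping by set name reflects why, when several example sets are loaded at once, the program asks you which set you want to use.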

      http://www.fb10.uni-bremen.de/.../tour-eg1.spl

Using a changed example set

One of the advantages of moving to proper example set files is that we are no longer limited by what SPLAT lets us write. As an example, let's consider the problem of generating the second clause above, and particularly the part about 'craft galleries and antique shops'. Unless you found some trick, it was probably not possible in SPLAT to generate both the craft galleries and the antique shops. In SPL, however, this is in fact quite simple. One just needs to replace the single element for the craft galleries with a list of two elements: one for the craft galleries, the other for the antique shops. The SPL chunk for 'antique shops' can be constructed on exactly the same lines as that for 'craft galleries', so we can go into the example set file and edit it directly. That is, we take the part which says:

       :ACTEE
                      (GALLERY-0 / TOURIST-ATTRACTION
                           :CLASS-ASCRIPTION
                           (CRAFT-0 / OBJECT
                                :LEX CRAFT )

                           :LEX GALLERY
                           :NUMBER PLURAL )

and replace it by:

           :ACTEE
                     ((GALLERY-0 / TOURIST-ATTRACTION
                           :CLASS-ASCRIPTION
                           (CRAFT-0 / OBJECT
                                :LEX CRAFT )

                           :LEX GALLERY
                           :NUMBER PLURAL )
                      (SHOP-0 / TOURIST-ATTRACTION
                           :CLASS-ASCRIPTION
                           (ANTIQUE-0 / OBJECT
                                :LEX ANTIQUE )

                           :LEX SHOP
                           :NUMBER PLURAL )
                     )

You can see that the editing has to be done quite carefully. There are in fact special editors that help you make sure that the brackets all balance, but for the moment you will just have to be careful. Also, each element of the new list needs its own variable name (here SHOP-0 and ANTIQUE-0), distinct from those already used for the galleries. It is also good practice to change the name of the example when you change its details. This gives you a complete record of all the intermediate steps when you are working on a more complicated sentence.
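Until such an editor is at hand, a quick mechanical check can catch the most common slip. The following Python sketch, a hypothetical helper called `check_brackets`, scans an SPL text and reports the first bracket problem it finds. It skips anything inside double-quoted strings (without handling escaped quotes) but otherwise knows nothing about SPL. A balanced result therefore does not mean that the SPL is well formed.

```python
from typing import Optional


def check_brackets(text: str) -> Optional[str]:
    """Return None if the parentheses balance, else a short error message."""
    depth = 0
    in_string = False
    for pos, ch in enumerate(text):
        if ch == '"':
            in_string = not in_string  # brackets inside "..." are ignored
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at position {pos}"
    if depth > 0:
        return f"{depth} unclosed '(' at end of text"
    return None


good = "((GALLERY-0 / TOURIST-ATTRACTION :LEX GALLERY) (SHOP-0 / TOURIST-ATTRACTION :LEX SHOP))"
print(check_brackets(good))       # prints: None
print(check_brackets(good[:-1]))  # prints: 1 unclosed '(' at end of text
```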

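Now, to test this, you can simply load the changed example set file back into SPLAT to see if anything changes. You do this in two steps:
    1. Load the file by selecting 'Load an Example File' from the File menu.
    2. Select the example set you want from the 'Template' menu options.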

When you do this, the program asks you which example set you want to use. You might typically be working on several example sets at once (for example, for several different target texts), and these can all be loaded into the program at the same time. The current program in fact already contains quite a few example sets illustrating different aspects of the grammar. You therefore need to find the one called 'splat' that we have just loaded in the previous step.


When you accept this choice, you will see that several new objects appear in the SPLAT window (if you have just reloaded this example set on top of the objects that you saved previously, then you will now have several objects of the same name!).

Suppose you renamed the changed example as suggested, to something like 'FIND-0-A'. It should then also appear in the list of Processes, and you can try generating it.

The generated sentence now contains a list of things to find rather than just the galleries, reflecting the change that you made in the SPL of the example set.

Two things should be noted at this stage:
    1. The form 'antique shop' is not yet correct. The program does not know the lexical item 'shop', so it does not know how to form the plural; we will see how to fix this in a moment.
    2. The Process 'find-0-a' is shown in grey in the SPLAT window. This is because SPLAT objects that have been loaded from example sets cannot be edited from SPLAT. If you try, you get a message to that effect.

    The message shows (if you look carefully!) that this is indeed the changed SPL from our example set file, and that it cannot be edited. The reason is that in an example set we can change anything we want in an SPL. The result might then not fit into the simpler range of SPLs that SPLAT lets you build.

Fixing up lexical items

      Until now we have only fixed lexical items that were shown in the right-hand part of the SPLAT window, i.e., new lexical items that we entered by means of SPLAT objects. But we might also have new lexical items that we introduced directly by editing the example set file. SPLAT does not see these, so they do not appear in the SPLAT window. We can still edit them, however, in just the same way. To do this, we start the lexical resources editor by picking it from the Tools option in the SPLAT menu bar. This brings up the lexical item window.

      We then check whether the lexicon already contains any items that are approximately what we need. Let's fix up the entry for 'shop'.

      First, we type 'shop' at the top as the Lexical item name (do not forget to press Return after the name). We then pick the 'Find' command from the menu options. This brings up a menu containing all the lexical items whose spelling contains the string "shop". If we have already generated our new example as described above, KPML will already have made a guess at the lexical item for 'shop', and this will be offered to us. Selecting it loads the entry into the lexical editor.

      The problem is then immediately clear. Although the lexical item is known, KPML does not know its ending type, and so it does not know how to form its plural. We can therefore select an appropriate ending type and see what difference that makes to the generation of our sentence.
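What the ending type contributes for nouns can again be illustrated outside KPML. In the following Python sketch, the ending-type labels 's', 'es' and 'y-ies' are hypothetical, not KPML's own names. The sketch shows that 'shop' needs the plain -s ending, while nouns such as 'box' or 'gallery' need -es or -ies.

```python
def plural(noun: str, ending_type: str) -> str:
    """Form an English plural according to a hypothetical ending-type label."""
    if ending_type == "s":  # shop -> shops
        return noun + "s"
    if ending_type == "es":  # box -> boxes
        return noun + "es"
    if ending_type == "y-ies":  # gallery -> galleries
        return noun[:-1] + "ies"
    raise ValueError(f"unknown ending type: {ending_type}")


print(plural("shop", "s"))  # prints: shops
```

Once the lexical item records which pattern applies, the grammar can produce the plural whenever the SPL asks for :NUMBER PLURAL.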

      Now we can regenerate the example to see if this has made a difference.

Clearing up

      Finally, save the lexical items again so that they can be reloaded when you next start work, and clear any unwanted objects from the SPLAT window. To delete an object, open it for editing as usual and then select 'Delete' from the Edit menu. The SPLAT objects can then be saved as a normal collection of SPLAT objects, so that you can resume work later. Reloading the newly saved SPLAT objects should then give a much tidier window.

      After reloading, all the duplicates have disappeared, and all the objects are shown in grey. This means that they have been made from example set SPLs rather than individually by hand within SPLAT. They are not editable, but they can be used to build up bigger SPLs.

Last exercise

    As a last example of using example set SPL fragments to build larger SPLs within SPLAT, you can load the following example set file ('File>Load an Example File'), which contains some more complex SPL fragments as examples:

http://www.fb10.uni-bremen.de/.../more-complicated-fragments.spl

After loading, select the example set called 'Tourism' from the 'Template' menu options. This adds two more SPL fragments to the Objects list in SPLAT: one called 'one-beach-hut', the other called 'some-beaches'. Try generating with these, making whatever lexicon additions are needed to get them right. Then use them in a bigger sentence, such as:

Aldeburgh has some of the longest beaches in the country.