BUILDING AN SPL FOR GENERATION
An introductory guide by Juan Rafael Zamorano Mansilla
May 2003
What’s an SPL?
- An SPL is the non-linguistic input fed into KPML in order to obtain a
sentence in a natural language as output.
- More generally, an SPL is the semantic specification of a sentence. 'SPL'
stands for Sentence Planning Language, which was originally devised by Bob
Kasper for the Penman text generation system; nowadays we commonly talk of
"an SPL" or "some SPLs" to refer to the specifications themselves.
Writing an SPL
- When testing a grammar or a generator, an SPL is usually placed inside an
example; this is the standard place for an SPL when generating with KPML. The
example specification shown below generates the sentence "Men are not happy."
(EXAMPLE
  :NAME A-G1
  :GENERATEDFORM "Men are not happy."
  :GLOSS (:ENGLISH "Men are not happy.")
  :TARGETFORM "Men are not happy."
  :LOGICALFORM
    (L / PROPERTY-ASCRIPTION
       :DOMAIN (M / PERSON :LEX man :number plural)
       :RANGE (C / QUALITY :LEX happy)
       :TENSE present
       :POLARITY negative
       :LEX be)
  :SET-NAME ADJECTIVAL-GROUP
)
- The words :NAME, :GENERATEDFORM, :GLOSS, :TARGETFORM, :LOGICALFORM and
:SET-NAME are slots that contain information required by the generator. The
information that goes after each slot is explained in the following sections.
The SPL proper is the expression found under the :LOGICALFORM slot. A program
that is itself driving generation might just produce SPLs without worrying
about example structures; if you are writing the SPLs yourself, it is usually
advisable to wrap them up as examples so that they have names. KPML also
provides extensive tools for working with sets of examples for testing grammars
and exploring linguistic resources: all of the information in the Grammar
Generation Bank is created from sets of examples.
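- For a program that drives generation itself, wrapping each SPL in an example
can be automated. The following is only a minimal sketch in Python, not part of
KPML: the helper name wrap_spl_as_example is hypothetical, and it assumes that
the sentence strings contain no double quotes and that the gloss can simply
reuse the target form (true only when the target language is English).

```python
def wrap_spl_as_example(name, target_form, spl, set_name, generated_form=" "):
    """Return the text of an EXAMPLE form wrapping one SPL.

    Hypothetical helper, not part of KPML. Assumes the sentence strings
    contain no double quotes and that `spl` is a well-formed SPL expression.
    """
    lines = [
        "(EXAMPLE",
        f"  :NAME {name}",
        # A single space between the quotes leaves the slot unfilled.
        f'  :GENERATEDFORM "{generated_form}"',
        # Assumption: the target language is English, so the gloss is the target.
        f'  :GLOSS (:ENGLISH "{target_form}")',
        f'  :TARGETFORM "{target_form}"',
        f"  :LOGICALFORM {spl}",
        f"  :SET-NAME {set_name}",
        ")",
    ]
    return "\n".join(lines)


spl = (
    "(L / PROPERTY-ASCRIPTION"
    " :DOMAIN (M / PERSON :LEX man :number plural)"
    " :RANGE (C / QUALITY :LEX happy)"
    " :TENSE present :POLARITY negative :LEX be)"
)
example_text = wrap_spl_as_example(
    "A-G1", "Men are not happy.", spl, "ADJECTIVAL-GROUP"
)
print(example_text)
```

The function only assembles text; whether the result loads correctly still has
to be checked in KPML itself.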
:NAME
- Here you must write the name of the sentence ("A-G1" in the example). The
name can be a number or a longer piece of text, but it must not contain spaces.
Many sentences are usually saved in the same example set, so it is important
that they have different names. If two sentences have the same name, one of
them will be discarded by the program.
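- Both naming rules (no spaces, no repeated names) are easy to check before a
set is loaded. The following Python sketch is an illustration only; the
function check_example_names is hypothetical and is not a KPML tool.

```python
from collections import Counter


def check_example_names(names):
    """Return (names_with_spaces, duplicated_names) for a list of example names.

    Hypothetical pre-flight check, not part of KPML.
    """
    # Names containing any whitespace break the "no spaces" rule.
    with_spaces = [n for n in names if any(ch.isspace() for ch in n)]
    # Names that occur more than once would clash inside one example set.
    counts = Counter(names)
    duplicated = [n for n, c in counts.items() if c > 1]
    return with_spaces, duplicated


bad_spaces, clashes = check_example_names(["A-G1", "A-G2", "A G3", "A-G1"])
print(bad_spaces)  # ['A G3']
print(clashes)     # ['A-G1']
```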
:GENERATEDFORM
- This slot holds the output sentence generated by the program. Filling it in
is not obligatory: leaving a space between the quotation marks (" ") will not
affect generation. If you save an example set after generating all the
sentences, the program will automatically put the generated result here.
:GLOSS
- This is an English translation of the sentence you want to generate. Again,
leaving this slot empty will not affect the generation process, but other users
will appreciate a translation, particularly if you are generating in a language
they can’t speak!
:TARGETFORM
- This slot contains the sentence you intend to generate. The difference from
:GENERATEDFORM is that what you include here is used by the program to evaluate
how accurate the result is. If the generated sentence does not match what you
include in :TARGETFORM, KPML will indicate this by displaying the sentence in
red (instead of green).
- It is also important to fill in this slot because what you write here is
shown in the menu where users select the sentence they want to generate from
among the other sentences in the same example set.
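- The red/green check can be imitated outside KPML when post-processing saved
example sets. The Python sketch below is an assumption-laden illustration: the
function matches_target is hypothetical, and the choice to ignore differences
in whitespace is ours, since the exact comparison KPML performs is not
described here.

```python
def matches_target(generated_form, target_form):
    """Return True when both strings agree after collapsing whitespace.

    Hypothetical check, not KPML's own comparison. Ignoring whitespace
    differences is an assumption made for this sketch.
    """
    def normalise(text):
        # split() with no argument drops leading, trailing and repeated spaces.
        return " ".join(text.split())

    return normalise(generated_form) == normalise(target_form)


print(matches_target("Men are not happy.", "Men are not  happy. "))  # True
print(matches_target("Men are happy.", "Men are not happy."))        # False
```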
:LOGICALFORM
- This slot holds the semantics of the sentence. We begin by specifying what
type of process we wish to generate. In the example presented above we had the
following SPL specification:
(L / PROPERTY-ASCRIPTION
   :DOMAIN (M / PERSON :LEX man :number plural)
   :RANGE (C / QUALITY :LEX happy)
   :TENSE present
   :POLARITY negative
   :LEX be)
- The process type can be seen in the first line: here we have a relational
process, in which a property is ascribed to an entity. KPML knows this kind of
process as PROPERTY-ASCRIPTION. To find out which concepts and keywords are
used for the other process types, consult the Semantics Guide or, once that is
clear, see the many examples maintained in the Generation Bank.
- We also need to give the process a name in case we need to refer back to it
later. In the example the process has the name "L", but it could be anything
else of your choice, such as "BE", "VERB-1", etc. The name and the type of
process are separated by a slash, in this way:
[NAME] / [PROCESS TYPE]
Examples
help / material-process
process-1 / verbal-process
verb-a / mental-process
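- When SPLs are handled by a program, the head of a term can be split at the
slash to recover the name and the type. This is a minimal Python sketch; the
function split_term_head is hypothetical and assumes that neither the name nor
the type contains a slash.

```python
def split_term_head(head):
    """Split 'name / TYPE' into (name, type).

    Hypothetical helper. Assumes the slash occurs only as the separator.
    """
    name, separator, type_name = head.partition("/")
    name, type_name = name.strip(), type_name.strip()
    if not separator or not name or not type_name:
        raise ValueError(f"expected 'name / type', got {head!r}")
    return name, type_name


print(split_term_head("help / material-process"))  # ('help', 'material-process')
print(split_term_head("M/ PERSON"))                # ('M', 'PERSON')
```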
- Once the process type is defined, KPML needs some information about the
participants involved, as well as some grammatical information about the
sentence. Each participant is introduced by a keyword that starts with a colon,
like this:
:DOMAIN (M / PERSON :LEX man :NUMBER plural)
:RANGE (C / QUALITY :LEX happy)
In the example above there are two participants: DOMAIN and RANGE. To learn
which participants are inherent to each process type and how they are referred
to in KPML, consult the Semantics Guide or the Generation Bank.
- After each participant we usually want to give more information about it;
this information goes inside the participant's brackets. We start by giving the
participant a name. In the example, DOMAIN gets the name "M", whereas RANGE
gets the name "C", but you are free to choose the names you prefer. After that,
we insert a slash and then specify semantic information about the participant.
In the example, DOMAIN is a person, so we write PERSON. RANGE, on the other
hand, is a quality, so we write QUALITY.
:DOMAIN (M / PERSON :LEX man :NUMBER plural)
:RANGE (C / QUALITY :LEX happy)
- For a list of the possible choices here, consult the Generation Bank. In
general terms, though, it suffices to say that participants are either PERSON
or THING. If you want to be more precise, you can specify whether the person is
MALE or FEMALE. If, on the other hand, you want to be less precise, you can
write CONSCIOUS-BEING. Examples:
(participant-1 / PERSON :LEX people)
(poli / MALE :LEX policeman)
(a / CONSCIOUS-BEING :LEX whale)
- THING is what you use for all other entities. If you want to be more precise,
you can also write NONCONSCIOUS-OBJECT.
(war / THING :LEX war)
(M / NONCONSCIOUS-OBJECT :LEX table)
- Qualities always receive the term QUALITY. Circumstances, on the other hand,
usually involve complex semantics, so it is better to consult the Semantics
Guide or the Generation Bank, where you will find examples of all the
circumstance types available in KPML.
- Once this has been done, we can include more specific information if we want.
In principle only M / PERSON is obligatory, but we can say exactly which word
will realise the participant with the keyword :LEX.
:DOMAIN (M / PERSON :LEX man :number plural)
:RANGE (C / QUALITY :LEX happy)
- If we don't include :LEX, the program will select any lexical item from its
lexicon that fits the semantics of the participant (PERSON, QUALITY, etc.).
Other things such as number (:number plural), determination, etc. can also be
specified. The best way to learn what else you can specify is to study the
Generation Bank, where you will find sentences exemplifying a good variety of
constructions.
- Once the specifications of the participants are finished, it is useful to say
something about the properties of the sentence.
- :LEX is used in the example to specify which verb will realise the process,
but each process type has an associated default choice.
- :TENSE informs the program about the tense of the sentence, although the
present tense is the default and need not be specified.
- :POLARITY determines whether the sentence is negative or affirmative, with
affirmative as the default choice.
- To learn how to express other sentence properties, consult the Semantics
Guide.
- Finally, you may also want to include circumstantials in your sentence, such
as time locatives, spatial locatives, etc. For instance, we can change the
philosophical message of the example above into something more mundane by
adding a time locative: "Men are not happy on Mondays." An example containing
the appropriate SPL would look like this:
(EXAMPLE
  :NAME A-G1
  :GENERATEDFORM "Men are not happy on Mondays."
  :GLOSS (:ENGLISH "Men are not happy on Mondays.")
  :TARGETFORM "Men are not happy on Mondays."
  :LOGICALFORM
    (L / PROPERTY-ASCRIPTION
       :DOMAIN (M / PERSON :LEX man :number plural)
       :RANGE (C / QUALITY :LEX happy)
       :TEMPORAL-LOCATING (D / ONE-OR-TWO-D-TIME
                             :LEX monday :number plural)
       :TENSE present
       :POLARITY negative
       :LEX be)
  :SET-NAME ADJECTIVAL-GROUP
)
- It's easy to see that circumstantials are specified following the same
principles as participants. You only need to know the keyword that introduces
them (:TEMPORAL-LOCATING) and the semantics (ONE-OR-TWO-D-TIME). You will find
a list of them along with their realisations in the Semantics Guide.
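- Because participants and circumstantials share the same shape (a keyword
followed by a bracketed "name / TYPE" term), a program can treat both in the
same way. The Python sketch below is an illustration under stated assumptions:
the tuple representation and the functions render_spl and add_circumstance are
hypothetical, the output is written on a single line, and we assume that
whitespace inside an SPL is not significant.

```python
def render_spl(term):
    """Render a (name, type, [(keyword, value), ...]) tuple as one-line SPL text.

    Hypothetical representation, not a KPML data structure.
    """
    name, type_name, features = term
    parts = [f"{name} / {type_name}"]
    for keyword, value in features:
        if isinstance(value, tuple):  # a nested participant or circumstantial
            value = render_spl(value)
        parts.append(f":{keyword} {value}")
    return "(" + " ".join(parts) + ")"


def add_circumstance(term, keyword, circumstance):
    """Return a copy of `term` with the circumstantial placed after the last
    nested term, mirroring the layout of the example above."""
    name, type_name, features = term
    nested = [i for i, (_, v) in enumerate(features) if isinstance(v, tuple)]
    position = nested[-1] + 1 if nested else 0
    new_features = features[:position] + [(keyword, circumstance)] + features[position:]
    return (name, type_name, new_features)


sentence = ("L", "PROPERTY-ASCRIPTION", [
    ("DOMAIN", ("M", "PERSON", [("LEX", "man"), ("number", "plural")])),
    ("RANGE", ("C", "QUALITY", [("LEX", "happy")])),
    ("TENSE", "present"),
    ("POLARITY", "negative"),
    ("LEX", "be"),
])
mondays = ("D", "ONE-OR-TWO-D-TIME", [("LEX", "monday"), ("number", "plural")])
extended = add_circumstance(sentence, "TEMPORAL-LOCATING", mondays)
print(render_spl(extended))
```

The original tuple is left unchanged, so the same base sentence can be reused
with different circumstantials.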
:SET-NAME
- SPLs are stored in files called Example Sets. These Example Sets must have a
name the program can recognise, and this name is given under :SET-NAME. In the
example presented above the sentence belongs to a set called ADJECTIVAL-GROUP.
Of course, you are free to name your sets as you please.