Black Box Generation with KPML4

(Last update: black box capabilities provided with KPML4; updated: June 2006. Socket capability, Dec 2007; some extra options, Jan 2016)


Download (English preloaded) [updated Dec 08] zipped (executable: Windows98, NT, 2000, XP): 13Mb
Download (Spanish preloaded) zipped (executable: Windows98, NT, 2000, XP): 12Mb
Download (language free; updated options: Jan 16) zipped (executable: Windows 10, Lispworks 5): 12Mb
Previous versions: 2002  

Overview and Purpose

The black box generator provides an example of a simple generator for English for Windows. It is called a black box generator because no interaction is allowed with the generator apart from the basic input-output cycle. The input consists of individual semantic specifications, and the generated string is produced as output. There is no debugging, no inspection of the generation process, and no interactive inspection of generation results. It should therefore only be used with fully debugged linguistic resources; otherwise errors can arise in generation and there is no possibility of inspecting their cause. It can be used either to give an indication of generation performance from a variety of input specifications, or as a generation server for some application that only needs to produce natural language output without worrying about how it is done. The basic input-output behaviour is managed via files as described below. This means it is trivial to set up an input-output generation demonstration loop from a pair of simple text editors or text editor buffers.

Instructions for installation and basic use

The blackbox generation image should be downloaded and placed somewhere on your computer; it does not matter where. Once started, the generator just sits and waits for a semantic input. When one comes, it generates a corresponding English sentence. It then waits for a further semantic input, and so on. The input and output are managed via files. Whenever the specified input file contains a semantic specification, it is generated. This input (and the file) is then discarded (i.e., the file is removed). The generated output is written to another file. Each newly generated sentence replaces the previous sentence in the output file. In order to stop the entire process, the single item ":stop" must be placed in the input file.

Below we will see that the behaviour of the black box generator can be controlled more finely by providing other special commands in the input file; this makes it almost usable for real generation in a number of simple contexts. If none of these control possibilities are used, then the default behaviour is that generation is in English.

Moreover, the default settings contained in the black box generator are that both the input file and the output file reside in the folder C:\tmp. The input file is called "spls-in.lisp"; the output file is called "kpml-out.txt". (For information on how to change this, see below.)

Finally, if you are using the Spanish blackbox rather than the English one, please make sure that you only create the SPL input file after starting the blackbox first; this is a minor bug already corrected in the English version. If you do not do this, then the first SPL will not be generated.

Semantic input

The semantic specifications that must be placed in the input file are bare Sentence Plan Language expressions (SPLs). This is the usual form of semantic input used by KPML. Numerous examples of SPL for a variety of languages are given in the example sets of the generation bank (but see here for the English examples).

Note: the generation bank contains examples, but the input for the blackbox generator is not in the form of examples (which would not in any case be sensible, since examples are prestored for debugging and maintenance of grammars). Instead it takes the form of the bare SPL; this is the information contained in the :logicalform slot of the example definitions. An example of a simple SPL expression is:

(s / sail :actor (p / ship :determiner some :number plural) :tense past)

which, if placed in the SPL input file, would make the black box generator produce the string "Some ships sailed." in the output file.
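A client will usually build such expressions programmatically rather than by hand. The following Python sketch renders a nested structure into the textual SPL form shown above. The function name render_spl and the tuple/dictionary representation are illustrative assumptions and are not part of KPML.

```python
def render_spl(var, concept, roles=None):
    """Render one SPL term as text: (var / concept :role value ...)."""
    parts = [f"({var} / {concept}"]
    for role, value in (roles or {}).items():
        if isinstance(value, tuple):
            # A nested term is given as (var, concept, roles) and rendered recursively.
            value = render_spl(*value)
        parts.append(f":{role} {value}")
    return " ".join(parts) + ")"


# Rebuild the example SPL from the text above.
spl = render_spl(
    "s", "sail",
    {
        "actor": ("p", "ship", {"determiner": "some", "number": "plural"}),
        "tense": "past",
    },
)
print(spl)
```

The printed string is the same SPL as in the example above and can be written to the SPL input file as it stands.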

When the blackbox is switched into HTML or XML mode (by giving an :html or :xml command, see below), then input SPLs may contain annotations involving markup. This markup is then produced in the places in the surface string corresponding to the indicated semantic entity. For example, an SPL input of the form:

(s / sail :actor (p / ship 
    :determiner some :number plural) :tense past
    :temporal-locating 
          (m / day :name Monday 
                   :annotation (:xml emph :style date))
)

will produce something like the string:

Some ships sailed on <EMPH STYLE="DATE">Monday</EMPH> .

To achieve XML tags and attributes with lower case or namespace specifications, it is necessary to use the Lisp escape mechanism for symbols; this encloses the tag or attribute name in vertical bars, e.g.

(:xml |lowercase-tag| :|complex-attribute:name| value)
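A client that builds annotations as strings can apply the vertical-bar escaping mechanically. The sketch below is one possible helper; the name xml_annotation and its preserve_case parameter are hypothetical. It only escapes the tag and attribute names, as described above, and passes attribute values through unchanged, since the text does not say how values are treated.

```python
def xml_annotation(tag, attributes=None, preserve_case=True):
    """Format the value of an :annotation slot for XML markup."""

    def sym(name):
        # Vertical bars stop the Lisp reader from changing the case of a
        # symbol and allow characters such as ':' inside the name.
        return f"|{name}|" if preserve_case else name

    parts = [":xml", sym(tag)]
    for name, value in (attributes or {}).items():
        parts.append(f":{sym(name)} {value}")
    return "(" + " ".join(parts) + ")"


# The two forms used in the text above.
plain = xml_annotation("emph", {"style": "date"}, preserve_case=False)
escaped = xml_annotation("lowercase-tag", {"complex-attribute:name": "value"})
print(plain)
print(escaped)
```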

Task for the client

Note that a client program using the black box generator as a generation server needs to do the following work:

  1. Create an appropriate SPL expression for the meaning to be generated
  2. Write the single SPL expression at the beginning of the SPL input file.
  3. Wait for and then read the contents of the generation output file.
  4. Either stop by writing the single symbol ":stop" (i.e., stop with a colon directly in front of it), or continue to the next SPL to be generated by returning to (1).
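
The steps above can be sketched as a small Python client. This is a minimal sketch under several assumptions that the text does not confirm: that the generator recreates the output file if the client deletes it, that the files are UTF-8 text, and that polling a few times a second is acceptable. The function names generate and stop are hypothetical; the file names are the defaults given earlier.

```python
import os
import time
from pathlib import Path

# Default file locations stated in this document.
SPL_IN = Path(r"C:\tmp\spls-in.lisp")
KPML_OUT = Path(r"C:\tmp\kpml-out.txt")


def generate(spl, spl_in=SPL_IN, kpml_out=KPML_OUT, timeout=30.0, interval=0.2):
    """Write one SPL to the input file and return the generated sentence."""
    spl_in, kpml_out = Path(spl_in), Path(kpml_out)
    # Remove any previous result so that a stale sentence is never returned.
    # Assumption: the generator creates the output file again when it writes.
    kpml_out.unlink(missing_ok=True)  # requires Python 3.8+
    # Write to a temporary file and rename it, so that the generator
    # never picks up a half-written SPL.
    tmp = spl_in.with_name(spl_in.name + ".tmp")
    tmp.write_text(spl + "\n", encoding="utf-8")
    os.replace(tmp, spl_in)
    # Poll until the output file exists and is non-empty, or give up.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if kpml_out.exists():
            text = kpml_out.read_text(encoding="utf-8").strip()
            if text:
                return text
        time.sleep(interval)
    raise TimeoutError(f"no output in {kpml_out} after {timeout} s")


def stop(spl_in=SPL_IN):
    """Ask the generator to shut down by writing the :stop command."""
    Path(spl_in).write_text(":stop\n", encoding="utf-8")
```

A call such as generate("(s / sail :actor (p / ship :determiner some :number plural) :tense past)") would then return whatever the generator wrote to the output file. Deleting the old output first is a design choice to avoid reading the previous sentence; comparing modification times would be an alternative.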

It is straightforward to create black box generators of this kind from KPML. The question as to what interaction with an application program makes the most sense will depend on individual applications. As an example of how black box generators of this kind can be created, the additional code used to create a simpler variant of the one that is downloadable is given here (with an extension of .txt so that it is not executed or swallowed by security systems).

More complex control of the black box generator

The introduction above is sufficient in order to generate the first SPLs and to see what kind of interaction is involved. For more serious use, slightly more control is useful. For example, we might want to update the linguistic resources with a more recently released set, or to change the language to generate with a different resource set, or to update the lexicon or domain information that is available to the generator. We have already seen that in fact two types of input can be written to the SPL input file: either an SPL expression as illustrated above, or a command (e.g., ":stop"). The full list of commands that are interpreted by the black box generator is contained in the following table; many of these are toggles, which means that they set a mode alternately ON and OFF on subsequent uses.

:configure This causes the black box generator to load a standard KPML configuration file. The places searched for a configuration file are the same as with the full KPML released system. The first place searched is the folder where the blackbox image is started from. For descriptions of other places where the configuration file can be found and what it may contain, see the main documentation on the starting page. The ability to load a configuration file in fact opens the door to quite extensive control of the black box generator's behaviour.
:domain Loads in all the domain model files associated with the currently loaded linguistic resources. Since the default and starting language is English, this command then causes any domain model files found at the appropriate place (see the configuration file descriptions in order to see how to define what the appropriate place is) to be loaded into the running black box generator. Note that in the default released blackbox, the only form accepted for domain knowledge is in the Loom knowledge representation language.
:html (or :xml) This toggles "html" mode. When this mode is ON, html annotations in the SPL input, specified with the :annotation keyword, are processed so that the corresponding elements in the generated strings are marked up with the specified HTML-like markup. There is no check that the tags specified are legal HTML and so any tags can be specified. This can also be used therefore to generate XML or VoiceXML or similar output, although the user is then responsible for embedding the generated result into a properly formed XML specification. The default value on starting the generator is OFF. There are some differences between the processing of the :html and :xml annotations in the SPL, however. The XML annotations enforce matching closing tags and allow a more comfortable passing across of attributes and their values. The HTML annotations can pass arbitrary strings for both the opening and closing tags and can create non-matched tag pairs.
:init Tells KPML to do a general initialization of the generation process, setting up the network connectivity, resetting macro definitions and defaults, etc. Generally not necessary, but provided as a 'push-start' for difficult cases.
:language Reports what the current language is.
:lexicon Loads in all the lexicon files associated with the currently loaded linguistic resources. Since the default and starting language is English, this command then causes any lexicon files found at the appropriate place (see the configuration file descriptions in order to see how to define what the appropriate place is) to be loaded into the running black box generator.
:mono This toggles "strictly monolingual" mode. When this mode is ON, only linguistic resources that contain absolutely no language conditionalization can be loaded. In general KPML can work with both monolingual and multilingual linguistic resources--the latter are grammars which cover more than one language. Turning this mode ON makes this functionality unavailable, but increases the speed of generation slightly. The default value on starting the generator is OFF. If you use this and generation fails, it is because the resources being used are not strictly monolingual; a workaround would be to back off and look for any sign of language conditionalization in the resources used.
:stop The black box generator is halted and closed down.
:text Converts to text generation mode, where all the SPL expressions given in the input file are generated (not just the first) and the generated result file then contains all of the corresponding sentences.
:verbose This toggles a running commentary on what the black box generator is doing. This can be useful for debugging, for checking that the generator is receiving the information that you thought it was, as well as generating the result that you thought. The default value on starting the generator is OFF.
:version Reports which version of KPML is running within the blackbox generator. Probably only useful when giving a bug report or comparing blackboxes to see which is most up to date.
:xml (or :html) See :html above.

In addition, new linguistic resources can be loaded (or reloaded) by giving the name of those resources as a single keyword. Thus, if the input file is given the single command:

:german

then it will try to load the linguistic resources for the language variety German from the appropriate place (see the configuration file descriptions in order to see how to define what the appropriate place is). Of course, if you do not have the necessary linguistic resources on your computer in the defined appropriate place, then this command will simply empty out the black box generator so that it cannot generate anything.

Control of the black box generator from the command line

The easiest way of starting the black box generator is simply by clicking twice on it once it has been downloaded. This starts it up with all of the default settings mentioned above. It is also possible, however, to give it some command line parameters, so that even its default start-up behaviour is different. This is particularly useful, for example, for changing the input and output files that the black box is going to use when looking for its input and writing its output subsequently.

To use command line parameters, you must start the black box generator from a command line--for example, from the DOS interaction window reachable under your Start menu. The parameters are then written following the black box generator's image name. Thus, if you have saved the black box generator with its default name--kpml4-bbox.exe--then it can be started in strictly monolingual and verbose tracing mode by issuing a command such as:

kpml4-bbox.exe :mono :verbose

The default input and output files can be changed by using the keywords :in and :out. Thus the following command line call will start up the black box generator as usual, but it will look for the input SPLs in the file "E:\my-input\spls.lisp" and place the outputs in the file "A:\floppy-output.txt".

kpml4-bbox.exe :in E:\my-input\spls.lisp :out A:\floppy-output.txt
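
An application that starts the black box itself can assemble the same command line as an argument list. The Python sketch below uses only the parameters introduced so far (:in, :out, :mono, :verbose); the function names bbox_args and launch are hypothetical, and the executable is assumed to be reachable under its default name.

```python
import subprocess


def bbox_args(exe="kpml4-bbox.exe", spl_in=None, out=None, mono=False, verbose=False):
    """Build the argument list for starting the black box generator."""
    args = [exe]
    if spl_in:
        args += [":in", str(spl_in)]
    if out:
        args += [":out", str(out)]
    if mono:
        args.append(":mono")
    if verbose:
        args.append(":verbose")
    return args


def launch(**options):
    """Start the generator in the background and return the process handle."""
    return subprocess.Popen(bbox_args(**options))


# Equivalent to the command line shown above.
example = bbox_args(spl_in=r"E:\my-input\spls.lisp", out=r"A:\floppy-output.txt")
print(" ".join(example))
```

Passing the arguments as a list rather than a single string leaves the quoting of paths to the subprocess module.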

Particularly useful, if you are developing and adding in new linguistic resources such as further lexical items, is to reconfigure the blackbox so that it expects the linguistic resources to be in a place that you specify. This can be done by loading a specified configuration file in which the *root-of-resources* line points to the application-specific resources. This must be a folder structured in the usual way that KPML expects for linguistic resources, e.g., LANGUAGE\Lexicons\... and so on.

Finally,

kpml4-bbox.exe :with-lexicon C:\my-big-lexicon.lexicon :port 10200

starts up the blackbox, pre-loads the user-defined lexicon additions in the specified file, and finally changes to a socket-based interaction across port 10200 on the machine where the black box is running. It is up to the user to provide a client that communicates with the blackbox across this socket.
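
Since the wire format of the socket interaction is not described here, any client is necessarily a guess. The following Python sketch shows one possible shape for such a client, assuming (without confirmation from this documentation) that each SPL is sent as a line of UTF-8 text, that the reply ends with a newline or with the connection closing, and that one connection is used per request. The function name generate_over_socket is hypothetical.

```python
import socket


def generate_over_socket(spl, host="localhost", port=10200, timeout=30.0):
    """Send one SPL (or the string ":stop") and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: the server expects newline-terminated text.
        conn.sendall((spl + "\n").encode("utf-8"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # the server closed the connection
            chunks.append(data)
            if data.endswith(b"\n"):
                break  # assumed end-of-reply marker
    return b"".join(chunks).decode("utf-8").strip()
```

If the real server frames its messages differently, only the send and receive loop needs to change; the port number matches the example command line above.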

Full list of the command line options

Remember that all of these command lines have to be written on a single line. The command line parameters can be given in any order. The full list is given in the following table.

:configure Requires that the black box generator load a configuration file before doing anything else; if it cannot find a configuration file, however, this command is ignored.
:confile This causes a configuration file to be loaded as with :configure, but the path immediately following :confile on the command line is used as the configuration file instead. This is useful if one wants to keep normal KPML configuration files around as well as the particular configuration file for the blackbox generator. The most typical use would be to specify an alternative root folder for linguistic resources, enabling extensions to the domain and lexicon to be added (with the following commands) without changing standard resource definitions.
:first Takes the following symbol as the first command to be performed by the black box generator (see below for examples).
:html (or :xml) This toggles "html" or "xml" mode. When this mode is ON, html annotations in the SPL input, specified with the :annotation keyword, are processed so that the corresponding elements in the generated strings are marked up with the specified HTML-like markup. There is no check that the tags specified are legal HTML and so any tags can be specified. This can also be used therefore to generate XML or VoiceXML or similar output, although the user is then responsible for embedding the generated result into a properly formed XML specification. The default value on starting the generator is OFF. See comments above for XML/HTML annotation.
:in Takes the following information as the name of the file to be used for semantic input and processing commands.
:load-language Takes the following as a name of a language variety and loads it ...
:mono This turns on "strictly monolingual" mode. When this mode is ON, only linguistic resources that contain absolutely no language conditionalization can be loaded. In general KPML can work with both monolingual and multilingual linguistic resources--the latter are grammars which cover more than one language. Turning this mode ON makes this functionality unavailable, but increases the speed of generation slightly.
:out Takes the following information as the name of the file to be used for the generated output.
:poling Takes the following information as an integer indicating how often the black box generator is going to look to see if there is a semantic specification for it to generate. The lowest value is 1, corresponding approximately to once a second.
:port Takes the following information as an integer identifying a port to be used for socket communication. All generation subsequently runs over a socket at that port rather than using the file-based input-output cycle. To close the server across the port, send the string ":stop". Otherwise, send SPLs as usual. Currently this option does not support further option setting after start-up. The option is also not available for the Spanish preloaded blackbox downloadable above, only for the English.

:resource-root Takes the following as the path to the folder where the language resources can be found for subsequent loading. Saves having to reset this explicitly within a configuration file and loading it in.
:text Converts to text generation mode, where all the SPL expressions given in the input file are generated (not just the first) and the generated result file then contains all of the corresponding sentences.
:verbose This turns on a running commentary on what the black box generator is doing. This can be useful for debugging, for checking that the generator is receiving the information that you thought it was, as well as generating the result that you thought.

:with-domain Takes the file name (pathname) given as value for this attribute and loads this in immediately as a domain model for subsequent generation. Note that this pathname should be an explicit pathname uniquely identifying the domain model on the machine where you are generating; it is not relative to any linguistic resources (cf. the :domain option when generating, which operates with respect to a given linguistic resource only).
:with-lexicon Takes the file name (pathname) given as value for this attribute and loads this in immediately as a lexicon (or other linguistic resource) file for subsequent generation. Note that this pathname should be an explicit pathname uniquely identifying the lexicon file on the machine where you are generating; it is not relative to any linguistic resources (cf. the :lexicon option when generating, which operates with respect to a given linguistic resource only).
:xml (or :html) See :html above.

The :first parameter is particularly useful for setting up the generator locally so that it is ready to generate with locally prepared resources. For example, the following command line call (which could then of course itself be placed in a bat file for instant execution) says that input semantic specifications are to be taken from the file "C:\input.lisp", output is to be generated in the file "C:\out.txt", and that before any of that happens the complete locally available linguistic resources for the language variety Greek are to be loaded. Furthermore, since the black box generator cannot know where the Greek resources will have been placed locally, it additionally reads the configuration file beforehand so as to know where local resources are stored. (This means that the configuration file should contain at least a line beginning "*root-of-resources*" that gives this information.)

kpml4-bbox.exe :in C:\input.lisp :out C:\out.txt :configure :first :greek

Alternatively, the following command line leaves the input and output files as their default values, and instead before generating anything just loads a locally available lexicon, presumably providing an extension of the default English lexicon that comes with the black box generator when downloaded.

kpml4-bbox.exe :first :lexicon

The following does the same, but allows you to specify (via the configuration file) exactly where on your installation the linguistic resources (including the lexicon) are located.

kpml4-bbox.exe :configure :first :lexicon

Whereas the above requires that the normal KPML configuration file is in one of the regular spots where KPML looks for such files (see the relevant documentation), the following allows you to specify an arbitrary file to use as the configuration file:

kpml4-bbox.exe :confile "C:\tmp\black-box-configuration" :lexicon

And the following allows control of where grammars are to be found and which one is to be loaded all from the command line. Starting the following with the empty blackbox would therefore have the same effect as that of the preloaded Spanish blackbox, as long as the folder given is actually where the resources are to be found of course.

kpml4-bbox-2016.exe :resource-root C:\Systems\KPML\Resources\ :first :spanish

 

Last update: 8th January 2016
