Computer Tools and Applications - Summer Semester 2000 - English Studies

Natural Language Generation: Intro

In this part of the course we move to a very different area of interaction between computers and the study of language: Natural Language Generation. Natural language generation, or NLG for short, has a long history of trying to understand better how texts are structured, and how that knowledge can be made precise enough to be represented as a computer program. A range of early NLG programs tried to produce texts in such areas as:

It became clear that all the traditional areas of language study have a role to play here. Not only must an NLG program be able to produce a wide range of grammatical sentences, but the text as a whole must be well structured, and its content must be chosen appropriately for the goals that the text is to achieve. For this reason, NLG typically starts from some notion of communicative goals. Given such goals (again represented in some computational form), the program must work out what content to select, how to organise that content into a text, and how to divide that content up so that it can be expressed as a sequence of sentences in some language.

Generation is therefore divided into two main parts:

Often these tasks are performed by two separate modules, or components, of a complete NLG system.
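To make the idea of two separate modules more concrete, here is a toy sketch in Python of a pipeline that goes from content to sentences. It is an illustration under stated assumptions, not a description of any particular NLG system: the `Message` record, the numeric `importance` scores standing in for a communicative goal, and the function names `plan_text`, `realise` and `generate` are all hypothetical. The first stage decides what to say (selecting and ordering content); the second decides how to say it, here with fixed sentence templates rather than a real grammar. The weather-report data is invented for the example.

```python
from dataclasses import dataclass


# Hypothetical representation of one piece of content the system could say.
@dataclass
class Message:
    subject: str
    predicate: str
    value: str
    importance: int  # assumed score: higher means more relevant to the goal


def plan_text(facts: list[Message], max_messages: int) -> list[Message]:
    """Text planning ("what to say"): select the most important facts
    and order them so that the most important one comes first."""
    ranked = sorted(facts, key=lambda m: m.importance, reverse=True)
    return ranked[:max_messages]


def realise(plan: list[Message]) -> str:
    """Realisation ("how to say it"): express each planned message as
    one English sentence using a fixed template (a toy stand-in for a grammar)."""
    return " ".join(f"The {m.subject} {m.predicate} {m.value}." for m in plan)


def generate(facts: list[Message], max_messages: int) -> str:
    """Run the two modules in sequence, as in a simple pipeline system."""
    return realise(plan_text(facts, max_messages))


if __name__ == "__main__":
    # Invented example content for a short weather report.
    forecast = [
        Message("morning", "will be", "dry", 1),
        Message("afternoon", "will bring", "heavy rain", 3),
        Message("temperature", "will reach", "18 degrees", 2),
    ]
    # Assumed communicative goal: a brief forecast of at most two sentences.
    print(generate(forecast, max_messages=2))
    # prints: The afternoon will bring heavy rain. The temperature will reach 18 degrees.
```

Keeping the stages apart means the planner could be changed (for example, to select more content for expert readers) without touching the realiser, and vice versa; real systems replace both toy functions with far richer components.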

We will take a brief look at both of these components, starting from some working examples. Early NLG programs were mostly curiosities: it was fun to see what computers could be made to say. NLG programs then became more serious attempts to understand why texts are structured as they are, and to make theories of text organisation and grammar precise. Most recently, there have been attempts to use NLG programs in practical applications: making available information that would otherwise only be represented in some difficult-to-understand computational form, or presenting information in a variety of forms so that it is appropriate for different kinds of readers. Some of these latter programs are available on the web, so we look at these first.

On-line Natural Language Generation on the Web

The overheads that I use in the introductory lab session are here; more can be found below.

There is an introductory chapter concerning NLG in the envelope for this course on my door (GW2:A3074). Dr. Ehud Reiter of the University of Aberdeen introduces NLG as follows:

"As more and more people in today's service-oriented and bureaucratised world spend more and more of their time on writing documents, there is a real need for automation in the document-generation process. Otherwise, 21st-century knowledge workers may find that they are only spending 10% of their time on using their knowledge to help people, and the other 90% on writing documents that explain their reasoning and findings. Even today, it is not unusual for a weather forecaster to spend more time writing up individual forecasts than actually thinking about the weather; or for a software developer to spend more time on writing documentation and reports than on actually developing software. Also, as expert systems and other AI (Artificial Intelligence) programs become more complex, they increasingly need better ways of communicating their reasoning and findings to human users.

Natural-language generation (NLG) systems use artificial intelligence techniques to automatically generate documents, reports, and other kinds of texts in English, French, Japanese, and other human languages. NLG is a very young field, but fielded systems are already starting to be used in areas such as weather reporting. Because the field is so new, an interested student can easily find both interesting research problems and useful application areas."

Ehud Reiter and Robert Dale have just produced a book introducing many of the notions of NLG, but it is not yet available in Europe. You can, however, download a long and very detailed set of overheads going over many of the issues here. These overheads are in Microsoft PowerPoint format; I will use some of them during the lab sessions.
