The GeM Corpus

The annotated corpus was the centerpiece of the GeM project. The annotation scheme is documented and illustrated below. Using the annotated corpus, the project attempted to provide empirically motivated accounts of the relation between multimodal genres and their realization in form. This work continues in several other projects that address different genres.

The corpus is represented as a set of XML files, so standard XML tools are needed to examine it or work with it. The DTDs and definitions of the annotations are given below.
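
As a first look at an XML file with standard tooling, a minimal sketch using Python's built-in xml.etree.ElementTree module is shown here. It only counts how often each element name occurs. The sample document, its element names (root, unit, note) and the helper count_tags are hypothetical stand-ins for illustration and are not taken from the GeM DTDs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for a corpus file; the element names here are
# assumptions for illustration, not the actual GeM annotation scheme.
SAMPLE = """<root>
  <unit id="u-01">First unit</unit>
  <unit id="u-02">Second unit</unit>
  <note>Not part of the real scheme</note>
</root>"""


def count_tags(xml_text: str) -> Counter:
    """Return how many times each element name occurs in the document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return Counter(elem.tag for elem in root.iter())


if __name__ == "__main__":
    for tag, n in count_tags(SAMPLE).most_common():
        print(tag, n)
```

To inspect a file on disk rather than a string, ET.parse(path).getroot() could replace ET.fromstring(xml_text). Note that ElementTree does not validate against a DTD, so checking a file against the DTDs would need a validating parser.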