List of documents in the GeM corpus, sources, and unanalysed original pages

Several partners in the GeM project agreed either to provide documents or to discuss their working methods and the GeM proposals. They include:

The Daily Telegraph
The Edinburgh International Conference Centre
The Guardian
Harper Collins
The Herald
Taylor and Francis Publishers
The Scotsman
JET Documentation Services
The Information Design Unit
Harcourt Health Sciences.

The texts of the corpus can be accessed below in their raw, unannotated and unanalysed form. The websites are extracts that follow a selected set of content themes rather than entire copies of the respective sites. Not all of the documents and pages available here have been annotated in full; for the analysed documents, see the annotated corpus pages.

The online Guardian. The corpus consists of a selection from the website for the issue.
The online Herald. The corpus consists of a selection from the website for the issue.
The online Telegraph. This is not currently available online.
Not yet online
Bird field guide pages drawn from a range of sources.
Instruction manuals drawn from a range of sources.