Computer Tools and Applications - Sommersemester 2000 - Anglistik

Working with 'corpora' with specialized markup

Course Home Page

 

There is an increasing number of ready-prepared corpora that carry varying amounts of information in their mark-up in order to make searching and investigation easier. Some of these are very extensive and correspondingly expensive. They also often use their own systems of mark-up, since they have taken many years to produce and standards for mark-up are still being developed. Some of the largest available corpora (with pointers to their home pages) are:

We will take a closer look at the Susanne Corpus and at the International Corpus of English for Great Britain (ICE-GB). For the latter, we will again work with a freely available sample version of the corpus and the tool provided with it. There are also some things you can do directly over the web: for example, the restricted concordancer for the Bank of English, the CobuildDirect Corpus Sampler. Here you can already see some of the value of mark-up, since part-of-speech tags can be used directly in a query; a full list of tags and information about query syntax is given here.
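To make the point about tags in queries concrete, here is a minimal Python sketch. It is not the CobuildDirect query language and not the format of any real corpus: it assumes a tiny invented text in a hypothetical "word_TAG" format, with made-up, simplified tag names (PRON, MODAL, NOUN and so on). The functions parse_tagged and concordance are likewise hypothetical. The sketch shows how a plain word search for "can" mixes the modal verb with the noun, while adding a tag restriction separates them.

```python
# Hypothetical sample in "word_TAG" format; the tags are simplified and
# invented for illustration, not taken from any real tagset or corpus.
SAMPLE = (
    "They_PRON can_MODAL open_VERB the_DET can_NOUN with_PREP a_DET "
    "knife_NOUN ._PUNCT A_DET can_NOUN of_PREP beans_NOUN fell_VERB ._PUNCT"
)


def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits at the LAST underscore, so the tag is what follows it
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def concordance(pairs, word, tag_prefix="", width=3):
    """Return simple KWIC lines for `word` whose tag starts with `tag_prefix`.

    An empty tag_prefix matches every tag, i.e. a plain word search.
    """
    lines = []
    for i, (w, t) in enumerate(pairs):
        if w.lower() == word.lower() and t.startswith(tag_prefix):
            left = " ".join(x for x, _ in pairs[max(0, i - width):i])
            right = " ".join(x for x, _ in pairs[i + 1:i + 1 + width])
            lines.append(f"{left:>25} [{w}] {right}")
    return lines


if __name__ == "__main__":
    tokens = parse_tagged(SAMPLE)
    print("All hits for 'can':")
    for line in concordance(tokens, "can"):
        print(line)
    print("Only 'can' tagged as a noun:")
    for line in concordance(tokens, "can", "NOUN"):
        print(line)
```

Run as a script, the first search returns all three occurrences of "can", while the tag-restricted search keeps only the two nouns. Real tagsets are far larger and each corpus has its own mark-up conventions, so in practice you would check the tag list and query syntax of the particular corpus or tool you are using.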

Further relevant information can be found on the Linguist List general information page:
Definitive Resource of Organizations, Programs and Centers in Linguistics

Happy hunting!

Step-by-step instructions for a lab session for the Susanne Corpus

Step-by-step instructions for a lab session for the International Corpus of English
