We look at three areas:
We assume a Windows PC environment throughout. To follow all the instructions in this section of the course, you should be able to download files from a website to a folder of your choice, and to unzip downloaded zip files so that the extracted files go to a folder of your choice. If you are not sure how to do this, ask!
Each of these areas calls for different tools, and the appropriate tool naturally also depends on what one wants to do with it.
In all cases, it is useful to bear in mind the following comment from Mike Scott, the developer of WordSmith Tools:
"Tools are needed in almost every human endeavour, from making pottery to predicting the weather. Computer tools are useful because they enable certain actions to be performed easily, and this facility means that it becomes possible to do more complex jobs. It becomes possible to gain insights because when you can try an idea out quickly and easily, you can experiment, and from experimentation comes insight. Also, re-casting a set of data in a new form enables the human being to spot patterns.
This is ironic. The computer is an awful device for recognising patterns. It is good at addition, sorting, etc. It has a memory but it does not know or understand anything, and for a computer to recognise printed characters, never mind reading hand-writing, is a major accomplishment.
Nevertheless, the computer is a good device for helping humans to spot patterns and trends. That is why it is important to see computer tools such as these in WordSmith Tools in their true light. A tool helps you to do your job, it doesn't do your job for you."(Mike Scott, The Wordsmith Manual, p12)
But while computer tools may help us do our job, they may also help us first realize what jobs are even possible. Sometimes the availability of a tool is itself an important factor in following up particular lines of questioning. Here are some questions asked in the description of the British National Corpus:
"What's the plural of corpus? In what social situations is wicked a term of approval? Why does it "sound wrong" to say The good weather set in on Thursday although The bad weather set in on Thursday is perfectly acceptable? If I can say I live a stone's throw away from here, can I also say I'm going a stone's throw away from here?
Large language corpora can help provide answers for these kinds of questions -- if only because they encourage linguists, lexicographers, and all who work with language to ask them. The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word ought to mean, but only experience can tell us what a word is used to mean. This is why dictionary publishers, grammar writers, language teachers, and developers of natural language processing software alike have been turning to corpus evidence as a means of extending and organizing that experience."
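As a small illustration of what "corpus evidence" looks like in practice, here is a minimal key-word-in-context (KWIC) concordance sketch in Python. It is not how WordSmith Tools or the BNC software work internally; the function name `kwic`, its parameters, and the toy sample text are all hypothetical, chosen only for illustration. Run over a real corpus, a listing like this would show every occurrence of a phrase such as "set in" together with its surrounding text, so you could check for yourself what kinds of subjects tend to come before it.

```python
import re
from typing import List


def kwic(text: str, phrase: str, width: int = 30, sort_right: bool = False) -> List[str]:
    """Return key-word-in-context (KWIC) lines for each match of `phrase` in `text`.

    Matching is case-insensitive and on whole words; any run of whitespace
    may separate the words of the phrase. Each line shows up to `width`
    characters of left context, the match in brackets, then up to `width`
    characters of right context.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return []  # an empty search phrase has no meaningful matches
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

    hits = []
    for m in pattern.finditer(text):
        # Slice the raw context, then collapse line breaks and extra spaces.
        left = " ".join(text[max(0, m.start() - width):m.start()].split())
        right = " ".join(text[m.end():m.end() + width].split())
        hits.append((left, m.group(0), right))

    if sort_right:
        # Sorting on the right context groups similar continuations together,
        # the same idea as re-sorting a concordance display.
        hits.sort(key=lambda h: h[2].lower())

    # Right-align the left context so all the matches line up in one column.
    return [f"{left:>{width}} [{match}] {right}" for left, match, right in hits]


if __name__ == "__main__":
    # Toy sample text, invented for illustration (not real corpus data).
    sample = ("The bad weather set in on Thursday. By evening a cold fog had set in. "
              "Rot can set in if the wood stays damp.")
    for line in kwic(sample, "set in", sort_right=True):
        print(line)
```

Lining the matches up in a single column and optionally re-sorting them is a simple case of the "re-casting a set of data in a new form" that Scott describes: the computer only counts, slices and sorts, and the human reader does the pattern-spotting.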
For some final ideas and (decreasingly!) gentle introductions to using a corpus, you can also look at:
There is also the "definitions game" from the makers of the COBUILD grammars and dictionaries: think you know all the words? Try it out!
Summary of uses that are commonly made of corpora, with some example questions