USING AND KEEPING LINGUISTIC DATA

Winter Semester 2004/5

Prof. Dr. John A. Bateman

Course Schedule

 

The folder for this course in the copy shop is number 87.

We wind up the course with a few quick glances at the future of XML applications...

For some of the presentations, topics and annotations developed, see here.

 

21.10.2004

Corpora and annotation - I

Reading: Leech & Fligelstone (1992); Biber, Conrad & Reppen (1998, Chap. 1)

28.10

Corpora and annotation - II

Activity: Constructing an XML file and checking validity, using XML editors, etc.
Reading: McEnery & Wilson (2001, Chap. 2)

4.11

Deciding on a topic for investigation

11.11

Annotation schemes for particular purposes

Reading: Selected online XML documentation (DTD examples)
Activity: Designing an annotation scheme

18.11

Presentation of annotation schemes and data - I

Activity: Designing an annotation scheme
(Materials for using DTDs for transcriptions)

25.11

Presentation of annotation schemes and data - II

Activity: Design of your own transcription system and specification as a DTD: first approximation.
You should send the resulting DTD to me by email by Monday, 29th November!

2.12

Tools for getting information out again: XPath


Activity: Working with XML files and writing XPath expressions to retrieve information. (Introductory tutorial for XPath)

9.12

Advanced topics in annotation: XSLT

Activity: Introduction to XSLT for presenting results

16.12

Advanced topics in annotation: stand-off annotation and automatic annotation

Activity: (i) Illustration of stand-off annotation (PowerPoint); (ii) online stemmers, taggers and parsers (exercises and materials)
Reading: Thompson & McKelvie (1997: xml + dtd); Durusau & O'Donnell (2001: local cache: PowerPoint), paper presented at Extreme Markup Languages 2001; selected sections from the CES documentation

Christmas break
Activity: data collection and annotation

6.1.2005

Progress reports and troubleshooting.

Activity: Working with multiple documents: turning a set of XML documents into a database of linguistic results; using XSLT transform programs for creating webpages, plain text and other documents

13.1

Collecting information together: XML databases

Activity: XSLT tricks and methods: variables

20.1

Using the technology for presenting results: using XSLT transformations and style sheets as access to databases

Activity: XSLT tricks and methods: variables for storing node sets; grouping information according to attributes rather than according to files.

27.1

Review of and comparison with traditional tools for linguistic data storage: converting legacy data

PROGRESS REPORTS: presentations

 

3.2

Practical working session: finding patterns

PROGRESS REPORTS

Corpus alignment: Example of practice: Multext
Materials: examples from XCES (the XML version of the corpus encoding standard) and parallel aligned corpora; gentle introduction to the Text Encoding Initiative (TEI)

11.2

Presentation of results

And review of methods and applications so far. Future developments.

Reading: selected sections from the DBXML documentation
Activity: (i) Adding XML documents to a database and using queries to retrieve information. [Example data set + Overheads] (ii) Where XML is turning up these days: RSS feeds, etc.