Basic XML Introduction Tutorial: Step 1 of 4


Making an XML file...

In this tutorial we will start by creating a very simple XML file and checking that it is correct. The same kinds of steps are used for all XML files, so this serves as an easy introduction.

To work with XML, it is best to use a dedicated XML editor. You will see why very quickly, because we will start without one: when we move to an XML editor, things get much simpler!

The task

The exercise that we will carry out is as follows. Take this short text extract (from text N01 of the Susanne corpus):

Dan Morgan told himself he would forget Ann Turner. He was well rid of her. He certainly didn't want a wife who was as fickle as Ann. If he had married her, he'd have been asking for trouble. But all of this was rationalization. Sometimes he woke up in the middle of the night thinking of Ann, and then could not get back to sleep. His plans and dreams had revolved around her so much and for so long that now he felt as if he had nothing. The easiest thing would be to sell out to Al Budd and leave the country, but there was a stubborn streak in him that wouldn't allow it. The best antidote for the bitterness and disappointment that poisoned him was hard work. He found that if he was tired enough at night, he went to sleep simply because he was too exhausted to stay awake. Each day he found himself thinking less often of Ann; each day the hurt was a little duller, a little less poignant. He had plenty of work to do. Because the summer was unusually dry and hot, the spring produced a smaller stream than in ordinary years. The grass in the meadows came fast, now that the warm weather was here.

You can save this text to a plain text file by right-clicking here and selecting a place to keep the file.

We now want to annotate this text so that its basic sentence structure is marked up. This means that we will be able to find some of the syntactic structure in the text easily. Later on we will consider how to annotate particular kinds of semantic and pragmatic information, but we start with syntactic structure because it is fairly simple yet sufficiently structured to raise some problems.

The task will have been successfully completed when you have annotated the plain text by turning it into an XML file that conforms to a particular structure defined for this tutorial.

The required structure

For this exercise, we want the following structure.

  • The entire text is to be tagged as a text, within <text> and </text> tags.
  • Each sentence is to be tagged as a sentence, within <s> and </s> tags.
  • Each clause is to be tagged as a clause, within <cl> and </cl> tags.
  • Clauses may have other clauses inside them.
  • Any clause may have a verbal group inside it, that is, a part of the clause that is concerned with the activity. The verbal groups are to be tagged inside <vb> and </vb> tags.

    Here is the first sentence of the text as an example:

    <s><cl>Dan Morgan <vb>told</vb> himself </cl><cl>he <vb>would forget</vb> Ann Turner.</cl></s>


    You can see that there may be places in the text where you are not sure of how to analyse the structure: that is less important for the present tutorial. Just make some sensible decision, and code that.
  • Note that verbal groups can only be inside clauses.
  • And note that clauses can only be inside sentences.
  • And, finally, that sentences can only be inside a text.

If you violate any of these rules, your analysis of the text no longer follows the structure set out here. One of the benefits of using XML is that we can check for this kind of mistake automatically.
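
As a rough illustration of what such an automatic check involves, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. It is not the checking method used in this tutorial. The table ALLOWED_PARENTS and the function find_violations are names invented for this sketch, and encoding the rules as "which parents may each tag have" is just one reading of the list above.

```python
import xml.etree.ElementTree as ET

# Allowed parent tags for each element, following the rules listed above.
# None stands for "no parent", i.e. the element is the document root.
ALLOWED_PARENTS = {
    "text": {None},
    "s": {"text"},
    "cl": {"s", "cl"},  # clauses may sit inside sentences or other clauses
    "vb": {"cl"},
}


def find_violations(xml_string):
    """Return messages for every element that breaks the nesting rules."""
    # ET.fromstring raises ET.ParseError if the input is not well-formed XML.
    root = ET.fromstring(xml_string)
    problems = []

    def visit(element, parent_tag):
        allowed = ALLOWED_PARENTS.get(element.tag)
        if allowed is None:
            problems.append(f"unknown tag <{element.tag}>")
        elif parent_tag not in allowed:
            if parent_tag is None:
                problems.append(f"<{element.tag}> cannot be the root element")
            else:
                problems.append(
                    f"<{element.tag}> is not allowed inside <{parent_tag}>"
                )
        for child in element:
            visit(child, element.tag)

    visit(root, None)
    return problems


# The example sentence from above, wrapped in <text>.
good = (
    "<text><s><cl>Dan Morgan <vb>told</vb> himself </cl>"
    "<cl>he <vb>would forget</vb> Ann Turner.</cl></s></text>"
)
# A deliberately broken variant: the verbal group is not inside a clause.
bad = "<text><s>Dan Morgan <vb>told</vb> himself</s></text>"

print(find_violations(good))
print(find_violations(bad))
```

With these inputs, the first call reports no problems and the second reports that <vb> is not allowed inside <s>. The sketch does not check where plain text may appear, since the rules above do not say.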

Method

Take your saved version of the text above and then:

  1. Make a copy of it in another file.
  2. Choose your favourite text editor and open the copied file for editing.
  3. Add the necessary annotations to the text by typing directly what is required, just as in the example annotation of the first sentence above.
  4. Add the following XML declaration as the first line of the file. It tells XML tools which XML version and character encoding the file uses:
    <?xml version="1.0" encoding="UTF-8"?>
  5. Save the edited file, which should now look like an XML file, as plain text with the extension ".xml".
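
The last two steps can also be sketched in Python, assuming the sentences have already been annotated by hand. This is only an illustration, not part of the tutorial's method: build_document, is_well_formed and save_document are hypothetical helper names, and any file name passed to save_document is the reader's own choice. The check only tests that tags are properly opened and closed (well-formedness), not that they follow the structure required above.

```python
import xml.etree.ElementTree as ET

# The XML declaration from step 4; it must be the very first thing in the file.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'


def build_document(annotated_sentences):
    """Wrap already-annotated sentences in <text> and prepend the declaration."""
    body = "\n".join(annotated_sentences)
    return f"{XML_DECLARATION}\n<text>\n{body}\n</text>\n"


def is_well_formed(document):
    """Return True if the document parses as XML, False otherwise."""
    try:
        # Parse the bytes so the declared encoding is the one actually used.
        ET.fromstring(document.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def save_document(document, path):
    """Write the document as plain UTF-8 text; path should end in '.xml'."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(document)


# The example annotation of the first sentence, as given in this tutorial.
first_sentence = (
    "<s><cl>Dan Morgan <vb>told</vb> himself </cl>"
    "<cl>he <vb>would forget</vb> Ann Turner.</cl></s>"
)
document = build_document([first_sentence])
print(is_well_formed(document))
```

Deleting a single closing tag from the document, for example one </vb>, is enough to make is_well_formed return False, which is the kind of slip that is easy to make when typing tags by hand.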
Hint: If you really have no idea what your edited file should look like, you can look at a simple example here.
Next step: Checking what you have created...