Basic XML Introduction Tutorial


Validating the XML file

Getting a well-formed XML file is just the start. It tells us little beyond the fact that the file is now ready to be processed by proper XML tools. The real work of making a proper XML file is to ensure that the file is not only well-formed but also valid.

Validation is the process of making sure that an XML file is not only syntactically correct (i.e., that it has tags with proper opening and closing brackets, etc.) but also has the proper structure. The proper structure is something that we, as users, define ourselves as appropriate for our data.

In this case, we want to make sure that we have the right sequence and nesting of the text, s, cl and vb tags that we set out in step 1. If we do not, then, for whatever reason, our syntactic analysis does not match our own definitions and needs to be fixed.
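As an illustration only, here is a small fragment with the kind of nesting this section assumes: sentences (s) inside the text, clauses (cl) inside sentences, and verbal groups (vb) inside clauses. The example sentence is invented, and the exact conventions are the ones set out in step 1, which may differ in detail (for example, in where punctuation goes).

```xml
<!-- Illustrative only: one text containing one sentence (s),
     made of two clauses (cl), each with a verbal group (vb). -->
<text>
  <s>
    <cl>The dog <vb>barked</vb></cl>
    <cl>and the cat <vb>ran away</vb>.</cl>
  </s>
</text>
```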

Defining the required structure: the Document Type Definition (DTD)

To tell an XML editor what kind of structure our analyses are meant to have, we have to provide a Document Type Definition (DTD) or a document schema. We will look at writing DTDs and schemas later on; for the moment, we simply want to see whether we can validate the XML files that we have produced.

For this part of the task, a DTD that defines the structure we set out as required in step 1 is already provided. You can download it here: right-click on the link and save the file to the same folder as your XML file.
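To give a feel for what a DTD contains, the following is a hypothetical sketch of a DTD for this kind of structure. It is not the contents of basic-syntax-0.dtd; it simply assumes a text made of sentences, sentences made of clauses, and clauses containing words and verbal groups. The real file may well be stricter or more permissive.

```xml
<!-- Hypothetical sketch only: NOT the contents of basic-syntax-0.dtd.
     Assumes the nesting text > s > cl > vb. -->

<!-- A text consists of one or more sentences. -->
<!ELEMENT text (s+)>

<!-- A sentence consists of one or more clauses. -->
<!ELEMENT s (cl+)>

<!-- A clause mixes ordinary words (#PCDATA) with verbal groups. -->
<!ELEMENT cl (#PCDATA | vb)*>

<!-- A verbal group contains only words. -->
<!ELEMENT vb (#PCDATA)>
```

Each <!ELEMENT> line names an element and says what it may contain: + means "one or more", * means "zero or more", | means "or", and #PCDATA stands for ordinary text.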

Then you need to associate the DTD with your XML file so that the XML editor knows that this DTD belongs with it. To do this, you add another special line to your XML file, again in your text editor. This line must be the second line of the file, immediately following the line beginning "<?xml...". The line required is as follows:

<!DOCTYPE text SYSTEM "basic-syntax-0.dtd">

The line should be on a single line by itself, without any breaks in it, and must be typed exactly as shown. The name in quotation marks is the name of the DTD file that you saved above. Again, an example of the very simple XML file that we provided as a hint in step 1, with the DTD line added, can be seen here if you really need to look.
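Put together, a complete minimal file might look like the sketch below. The XML declaration on the first line may differ slightly from the one in your own file, and the body is an invented example. Note that the name straight after DOCTYPE (here text) has to match the outermost element of the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text SYSTEM "basic-syntax-0.dtd">
<!-- The root element must be <text>, matching the DOCTYPE name. -->
<text>
  <s>
    <cl>This <vb>is</vb> a clause.</cl>
  </s>
</text>
```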

Loading a file with a specified DTD

When you now load the modified file into the XML editor, it will also check whether the structure of the document is compatible with the structure given in the DTD. This is much more powerful than simply checking whether the file is syntactically correct: it now tells us whether our verbal groups are properly positioned within clauses, whether the clauses are properly positioned within sentences, and so on. If you have made a mistake in the structure, the file will be reported as invalid at this stage, even though it is proper XML.

Be sure that the DTD file and the XML file are in the same folder. If they are not, the XML editor will not be able to find the DTD file that it has been told to look for, and it will report this in an error message.

All you need to do then is to put the DTD file in the proper place and try again.

Errors caused by not respecting the DTD

Consider the following example. Suppose our XML file had forgotten to specify a sentence element and had gone straight in with a clause, something like:

<text><cl>This is a clause.</cl></text>
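Assuming a DTD that requires every clause to sit inside a sentence, one way to repair the fragment above is to wrap the clause in an <s> element:

```xml
<!-- The clause is now inside the sentence element the DTD expects. -->
<text><s><cl>This is a clause.</cl></s></text>
```

Loaded with the DTD, this version should no longer trigger the complaint about a missing <s>.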

Although this is well-formed XML, loading it now produces an error message.

This tells us, as before, that there is a problem in the fourth line, but goes further and says that the structure of the file, i.e., of the tags that are given, does not match up with what the DTD was saying. The DTD wanted to find an opening <s>, but instead there was a <cl>, so the file is not valid according to the given DTD.

Fixing problems

All mismatches between the structure that you have given in the file and the DTD will be reported in this way. You then need to go back carefully to the original XML file in your text editor and look at it closely. You should then be able to find exactly what is wrong and fix it. Then open the file again in the XML editor and see whether you have created any further errors. Repeat this until the file is accepted without an error message being shown.

Successfully loaded?

Eventually your file will load into XML Notepad without any error messages.

This means that the XML is now validated. Not only do we have a proper piece of XML, but we also know that our syntactic analyses have exactly the form that we set out as required at the beginning. Since we can now guarantee this, it becomes much easier afterwards to work with the data. There will be no surprises about its structure.

Getting a file to validate is usually the first hurdle that we need to clear before we can start working with the data and asking research questions of it.

Next step

All done! Well done...