Some of the corpora described so far already use SGML, the Standard Generalized Markup Language. Some of their tools are also meant to be SGML-based; for example, the tool supplied for use with the large British National Corpus is called "SARA", which stands for SGML-Aware Retrieval Application. But most of these tools, although SGML-based, still rely on the particular kinds of tags and mark-up that are adopted for particular corpora.
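To make the last point concrete, here is a minimal sketch in Python of why a retrieval tool ends up tied to one corpus's tags. Everything in it is an assumption made for illustration: the two fragments, the element names (`w`, `tok`) and the attribute names (`pos`, `cat`) are invented and not taken from any real corpus, and Python's lenient `html.parser` merely stands in for a proper SGML parser.

```python
from html.parser import HTMLParser

# Two invented fragments encoding the same sentence with different
# (hypothetical) tag conventions; neither comes from a real corpus.
CORPUS_A = '<s><w pos="DET">the</w> <w pos="NOUN">cat</w> <w pos="VERB">sleeps</w></s>'
CORPUS_B = '<sent><tok cat="DET">the</tok> <tok cat="NOUN">cat</tok> <tok cat="VERB">sleeps</tok></sent>'


class WordCollector(HTMLParser):
    """Collect (word, part-of-speech) pairs from elements named `word_tag`."""

    def __init__(self, word_tag, pos_attr):
        super().__init__()
        self.word_tag = word_tag   # element that wraps one word
        self.pos_attr = pos_attr   # attribute holding the word-class label
        self.pairs = []
        self._inside = False       # True while inside a word element
        self._pos = None           # label of the current word element

    def handle_starttag(self, tag, attrs):
        if tag == self.word_tag:
            self._inside = True
            self._pos = dict(attrs).get(self.pos_attr)

    def handle_endtag(self, tag):
        if tag == self.word_tag:
            self._inside = False

    def handle_data(self, data):
        # Keep only text that sits directly inside a word element.
        if self._inside and data.strip():
            self.pairs.append((data.strip(), self._pos))


def extract(text, word_tag, pos_attr):
    """Return the (word, label) pairs found under the given tag convention."""
    parser = WordCollector(word_tag, pos_attr)
    parser.feed(text)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    print(extract(CORPUS_A, "w", "pos"))    # settings that match corpus A
    print(extract(CORPUS_B, "w", "pos"))    # same settings find nothing in corpus B
    print(extract(CORPUS_B, "tok", "cat"))  # corpus B needs its own settings
```

Both fragments are marked up in the same general way, yet the settings that work for the first find nothing in the second until the tool is told about the other corpus's element and attribute names. A tool with those names built in would only work for the corpus it was written for.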
Some online examples of SGML-marked-up corpora and tools are:
The re-usability of tools for a variety of corpora and purposes depends on the emergence of standards. Currently there is much discussion, and increasing acceptance, of a very long and complicated general coding scheme for all kinds of texts (novels, poems, plays, conversation, etc.). This coding scheme is the result of the Text Encoding Initiative. All tags and schemes that are compatible with the Text Encoding Initiative recommendations are called TEI-conformant or TEI-conforming. A quite detailed description (with examples) of a simple form of the TEI guidelines can be looked at here (note: this does not make very light reading! The best approach is probably just to look at the examples at first).
Some of the better-known 'tag sets' include:
Geoffrey Sampson's Oxford University Press book English for the Computer also gives an extensive account of working out a tagset for English (the one in fact used in the SUSANNE corpus).