There is an increasing number of ready-prepared corpora with varying amounts of information encoded in mark-up to make searching and investigation easier. Some of these are very extensive and are accordingly quite expensive. They also often have their own systems of mark-up, because they have taken many years to produce and standards for mark-up are still being developed. Some of the largest available corpora (with pointers to their homepages) are:
We will take a closer look at the Susanne Corpus and at the International Corpus of English for Great Britain; for the latter we will again work with a freely available sample version of the corpus and the tool provided for it. But there are also some things that you can do directly over the web, for example with the restricted concordancer for the Bank of English, the CobuildDirect Corpus Sampler. Here you can already see some of the value of mark-up, since part-of-speech tags can be used directly in a query; a full list of the tags and information about the query syntax is provided with the sampler.
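To make the value of part-of-speech mark-up concrete, here is a minimal sketch of a tag-aware concordance search. It assumes a hypothetical plain-text format in which every token is written as `word_TAG`, with Penn-Treebank-style tag labels chosen purely for illustration; this is not the CobuildDirect query syntax or the Bank of English tagset. The function `concordance` and the sample text `TAGGED` are likewise hypothetical.

```python
# Hypothetical tagged sample: each token is "word_TAG".
# Tags are illustrative (Penn-Treebank style), not the Bank of English tagset.
TAGGED = ("the_DT man_NN can_MD walk_VB she_PRP opened_VBD a_DT can_NN "
          "of_IN beans_NNS they_PRP can_MD swim_VB")


def concordance(tagged_text, word, tag=None, width=2):
    """Return key-word-in-context lines for `word`.

    If `tag` is given, only tokens carrying exactly that part-of-speech
    tag are matched; `width` is the number of context words on each side.
    """
    # Split each token on its last underscore into (word, tag).
    pairs = [tok.rsplit("_", 1) for tok in tagged_text.split()]
    words = [w for w, _ in pairs]
    lines = []
    for i, (w, t) in enumerate(pairs):
        if w.lower() == word.lower() and (tag is None or t == tag):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}")
    return lines


if __name__ == "__main__":
    # Without the tag, "can" the modal and "can" the noun are mixed together.
    for line in concordance(TAGGED, "can"):
        print(line)
    # Restricting the query to the noun tag separates the two uses.
    print(concordance(TAGGED, "can", tag="NN"))  # ['opened a [can] of beans']
```

A plain word search for "can" returns all three occurrences, while adding the tag `NN` returns only the noun use. This is the kind of distinction that unannotated text cannot support without manual checking of every hit.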
More relevant information may be found on the Linguist List general information page, Definitive Resource of Organizations, Programs and Centers in Linguistics.
Happy hunting!