It is noteworthy and problematic that there are no established evaluation criteria for generation systems other than whether a component works in some particular context of application, and even this is often difficult to establish definitively. Attempts to propose formal requirements for NLG components have generally been limited to accounts of syntax. However, the criteria proposed there (e.g., nondirectionality, declarativity, guaranteed termination; cf. [van Noord and Neumann: 1997]) currently appear to rate more highly systems that have undesirable development and application behaviors (e.g., longer processing times, limited syntactic coverage, lack of text type variability, poor control and realization of textual development), while rating less well systems that offer broader bases both for applications and for crucial NLG research in areas such as content selection, text planning, and text cohesion. Proposals for systematic evaluation of grammatical and other resource coverage [Bateman and Hartley: 2000, Robin: 1996], of overall system performance [Lester and Porter: 1997, Callaway and Lester: 2001], and of the applicability of systems to particular contexts [Coch: 1996, Pianesi, Pianta and Tovena: 1999] all need substantially more work. Experience from natural language understanding, parsing, and dialog systems will be invaluable here, as will continuing attempts at standardization.