Lexical semantics and knowledge representation in multilingual sentence generation
Manfred Stede
Doctoral dissertation, Department of Computer Science, University of Toronto
Technical report CSRI-347
May 1996
ABSTRACT
This thesis develops a new approach to automatic language generation that focuses on the need to produce a range of different paraphrases from the same input representation. One novelty of the system is that it solidly grounds representations of word meaning in a background knowledge base, which enables the production of paraphrases stemming from certain inferences, rather than from lexical relationships alone. The system is designed in such a way that the paraphrasing mechanism extends naturally to a multilingual generator; specifically, we will be concerned with producing English and German sentences.
The focus of the system is on lexical paraphrases, and one of the contributions of the thesis is in identifying, analyzing and extending relevant linguistic research so that it can be used to handle the problems of lexical semantics in a language generation system. The lexical entries are more complex than in previous generators, and they separate the various aspects of word meaning, so that different ways of paraphrasing can be systematically related to the different motivations for saying a sentence in a particular way. One result of accounting for lexical semantics in this fashion is a formalization of a number of verb alternations, for which a generative treatment is given.
While the actual choice of one paraphrase as the best-suited utterance in a given situation is not a focal point of the thesis, two dimensions of preferring a variant of a sentence are discussed: that of assigning salience to the different elements of the sentence, and that of connotational or stylistic features of the utterance. These dimensions are integrated into the system, and it can thus determine a preferred paraphrase from a set of alternatives.
To demonstrate the feasibility of the approach, the proposed generation architecture has been implemented as a prototype, along with a domain model that serves as the background knowledge base for specifying the input to the generator. A range of generated examples is presented to show the functionality of the system.