This directory contains files for Technical Report CSRI-399.

The files in this directory are the following:

  1) README        (3 KB - the ASCII file you are now reading)
  2) tr-399.ps     (1656 KB - PostScript)
  3) tr-399.ps.gz  (479 KB - PostScript compressed with the program "gzip")

If you have the UNIX "gunzip" program, get the file tr-399.ps.gz.
Remember to transfer the file in binary mode. After the transfer,
"gunzip" the file.

If you do not have the UNIX "gunzip" program, get the file tr-399.ps
in ASCII mode. After transferring the file, print it on a PostScript
printer.

If you have any questions or comments about this technical report,
please contact pedmonds@cs.toronto.edu or gh@cs.toronto.edu

---------------------------------------------------------------------------

SEMANTIC REPRESENTATIONS OF NEAR-SYNONYMS FOR AUTOMATIC LEXICAL CHOICE

by Philip Edmonds

ABSTRACT

We develop a new computational model for representing the fine-grained meanings of near-synonyms and the differences between them. We also develop a sophisticated lexical-choice process that can decide which of several near-synonyms is most appropriate in any particular context. This research has direct applications in machine translation and text generation, and also in intelligent electronic dictionaries and automated style-checking and document editing. We first identify the problems of representing near-synonyms in a computational lexicon and show that no previous model adequately accounts for near-synonymy. We then propose a preliminary theory to account for near-synonymy in which the meaning of a word arises out of a context-dependent combination of a context-independent core meaning and a set of explicit differences to its near-synonyms. That is, near-synonyms cluster together. After considering a statistical model and its weaknesses, we develop a clustered model of lexical knowledge, based on the conventional ontological model.
The model cuts off the ontology at a coarse grain, thus avoiding an awkward proliferation of language-dependent concepts in the ontology, and groups near-synonyms into subconceptual clusters that are linked to the ontology. A cluster acts as a formal usage note that differentiates near-synonyms in terms of fine-grained aspects of denotation, implication, expressed attitude, and style. The model is general enough to account for other types of variation, for instance, in collocational behaviour. We formalize various criteria for lexical choice as preferences to express certain concepts with varying indirectness, to express attitudes, and to establish certain styles. The lexical-choice process chooses the near-synonym that best satisfies the most preferences. The process uses an approximate-matching algorithm that determines how well the set of lexical distinctions of each near-synonym in a cluster matches a set of input preferences. We implemented the lexical-choice process in a prototype sentence-planning system. We evaluate the system to show that it can make the appropriate word choices when given a set of preferences.
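The abstract describes the matching step only at a high level. The following Python sketch is a simplified illustration under stated assumptions, not the report's algorithm: it assumes each lexical distinction and each input preference is a (kind, feature, value) triple with values on a 0.0 to 1.0 scale, scores an approximate match by how close the two values are, and picks the near-synonym whose distinctions give the highest weighted sum of preference satisfaction. All class names, function names, features, numeric values, and the GENERIC-ERROR cluster are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical representation: a distinction is one fine-grained way a
# near-synonym differs (denotation, implication, attitude, or style).
@dataclass(frozen=True)
class Distinction:
    kind: str      # e.g. "implication", "attitude", "style"
    feature: str   # e.g. "stupidity", "pejorative", "formality"
    value: float   # assumed degree on a 0.0 to 1.0 scale


# Hypothetical input preference: what the speaker wants to convey.
@dataclass(frozen=True)
class Preference:
    kind: str
    feature: str
    value: float         # desired degree on the same assumed scale
    weight: float = 1.0  # relative importance of this preference


# A cluster links near-synonyms to one coarse-grained ontology concept
# and records the distinctions of each member word.
@dataclass
class Cluster:
    core_concept: str
    distinctions: dict[str, list[Distinction]] = field(default_factory=dict)


def match_score(d: Distinction, p: Preference) -> float:
    """Approximate match: 1.0 for equal values, falling linearly to 0.0."""
    if d.kind != p.kind or d.feature != p.feature:
        return 0.0
    return max(0.0, 1.0 - abs(d.value - p.value))


def satisfaction(distinctions: list[Distinction], p: Preference) -> float:
    """Best match among a word's distinctions; 0.0 if none addresses p."""
    return max((match_score(d, p) for d in distinctions), default=0.0)


def choose(cluster: Cluster, preferences: list[Preference]) -> str:
    """Return the member word with the highest weighted satisfaction sum."""
    def score(word: str) -> float:
        ds = cluster.distinctions[word]
        return sum(p.weight * satisfaction(ds, p) for p in preferences)

    return max(cluster.distinctions, key=score)


# Hypothetical cluster with invented values, for illustration only.
error_cluster = Cluster(
    core_concept="GENERIC-ERROR",
    distinctions={
        "error": [Distinction("style", "formality", 0.7)],
        "mistake": [Distinction("style", "formality", 0.5),
                    Distinction("attitude", "pejorative", 0.2)],
        "blunder": [Distinction("implication", "stupidity", 0.9),
                    Distinction("attitude", "pejorative", 0.8),
                    Distinction("style", "formality", 0.3)],
    },
)

prefs = [Preference("attitude", "pejorative", 1.0),
         Preference("implication", "stupidity", 1.0)]
print(choose(error_cluster, prefs))  # blunder
```

With these invented values, a preference for a formal style alone, `[Preference("style", "formality", 0.8)]`, selects "error" instead. Summing per-preference satisfaction is just one simple way to read "the near-synonym that best satisfies the most preferences"; the sketch does not model the varying indirectness of expression that the abstract mentions.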