Class-Based Probability Estimation Using a Semantic Hierarchy

Clark, Stephen and Weir, David (2002) Class-Based Probability Estimation Using a Semantic Hierarchy. Computational Linguistics, 28 (2). pp. 187-206. ISSN 0891-2017

This article concerns the estimation of a particular kind of probability, namely, the probability of a noun sense appearing as a particular argument of a predicate. In order to overcome the accompanying sparse-data problem, the proposal here is to define the probabilities in terms of senses from a semantic hierarchy and exploit the fact that the senses can be grouped into classes consisting of semantically similar senses. There is a particular focus on the problem of how to determine a suitable class for a given sense, or, alternatively, how to determine a suitable level of generalization in the hierarchy. A procedure is developed that uses a chi-square test to determine a suitable level of generalization. In order to test the performance of the estimation method, a pseudo-disambiguation task is used, together with two alternative estimation methods. Each method uses a different generalization procedure; the first alternative uses the minimum description length principle, and the second uses Resnik's measure of selectional preference. In addition, the performance of our method is investigated using both the standard Pearson chi-square statistic and the log-likelihood chi-square statistic.
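The two statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout (one row per hypothetical child class, columns for counts observed with and without the predicate), the function names, and the example counts are all assumptions made for illustration. The sketch also assumes every row and column total is positive.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood_chi_square(table):
    """Log-likelihood statistic: 2 * sum of observed * ln(observed / expected).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


def degrees_of_freedom(table):
    """Degrees of freedom for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: each row is a child class; the columns are
# (occurrences with the predicate, occurrences without it).
counts = [[20, 80], [25, 75], [5, 95]]
x2 = pearson_chi_square(counts)
g2 = log_likelihood_chi_square(counts)
df = degrees_of_freedom(counts)
```

Under these assumptions, a statistic that is large relative to the chi-square distribution with `df` degrees of freedom would suggest that the child classes behave differently with respect to the predicate, so grouping them into one class may over-generalize. The two statistics tend to diverge most when counts are small, which is one reason to compare them.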

Item Type: Article
Additional Information: Originality: The paper presents a new method for establishing the selectional preferences of a verb, a central problem in computational linguistics. Rigour: The method is precisely formalised and evaluated in thorough experiments. Significance: The technique of finding the right level of generalisation in WordNet has been used by others for a few tasks (mainly related to acquiring selectional preferences) and has not yet been clearly outperformed by other methods (see, for example, Brockmann and Lapata (2003)). Impact: This paper and its preceding conference paper together have 59 citations in Google Scholar.
Schools and Departments: School of Engineering and Informatics > Informatics
Depositing User: David Weir
Date Deposited: 06 Feb 2012 21:22
Last Modified: 28 Mar 2012 13:14