Domain-specific sense distributions and predominant sense acquisition

Koeling, Rob; McCarthy, Diana; Carroll, John

File(s) not publicly available

presentation

posted on 2023-06-08, 09:20 authored by Rob Koeling, Diana McCarthy, John Carroll

Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems that back off to the predominant sense of a word when contextual clues are not strong enough. The domain of a document has a strong influence on the sense distribution of words, but it is not feasible to produce large manually annotated corpora for every domain of interest. In this paper we describe the construction of three sense-annotated corpora in different domains for a sample of English words. We apply an existing method for acquiring predominant sense information automatically from raw text and, for our sample, demonstrate that (1) acquiring such information automatically from a mixed-domain corpus is more accurate than deriving it from SemCor, and (2) acquiring it automatically from text in the same domain as the target domain performs best by a large margin. We also show that, for an all-words WSD task, this automatic method is best focussed on words that are salient to the domain and on words whose acquired predominant sense in that domain differs from the one acquired from a balanced corpus.

History

Publication status

Published

External DOI

https://doi.org/10.3115/1220575.1220628

Pages

8

Presentation Type

paper

Event name

Joint Human Language Technology and Empirical Methods in Natural Language Processing Conferences

Event location

Vancouver, Canada

Event type

conference

ISBN

1-932432-55-8

Department affiliated with

Informatics Publications

Notes

Association for Computational Linguistics

Full text available

No

Peer reviewed?

Yes

Legacy Posted Date

2012-02-06

Licence

Copyright not evaluated
