University of Sussex
File(s) not publicly available

Detecting a continuum of compositionality in phrasal verbs

presentation
posted on 2023-06-07, 20:09 authored by Diana McCarthy, Bill Keller, John Carroll
We investigate the use of an automatically acquired thesaurus for measures designed to indicate the compositionality of candidate multiword verbs, specifically English phrasal verbs identified automatically using a robust parser. We examine various measures using the nearest neighbours of the phrasal verb, and in some cases the neighbours of the simplex counterpart, and show that some of these correlate significantly with human rankings of compositionality on the test set. We also show that, whilst the compositionality judgements correlate with some statistics commonly used for extracting multiwords, the relationship is not as strong as that obtained using the automatically constructed thesaurus.
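As a rough illustration of the kind of measure the abstract describes, the sketch below scores each phrasal verb by how many thesaurus neighbours it shares with its simplex counterpart, then compares those scores with human compositionality ratings using a rank correlation. Everything in it is an assumption made for illustration: the function names, the overlap measure itself, the choice of Spearman's rho (this record does not say which statistic the authors used), and the toy neighbour lists and ratings, which are invented and are not taken from the paper or its dataset.

```python
from typing import List, Sequence


def neighbour_overlap(phrasal: Sequence[str], simplex: Sequence[str]) -> int:
    """Count thesaurus neighbours shared by a phrasal verb and its simplex verb."""
    return len(set(phrasal) & set(simplex))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented toy data: (phrasal-verb neighbours, simplex-verb neighbours, human rating).
    toy = {
        "eat up": (["consume", "devour", "swallow"], ["consume", "devour", "drink"], 8.0),
        "blow up": (["explode", "detonate", "destroy"], ["breathe", "puff", "destroy"], 4.0),
        "carry on": (["continue", "proceed", "resume"], ["transport", "bring", "hold"], 2.0),
    }
    overlaps = [neighbour_overlap(p, s) for p, s, _ in toy.values()]
    ratings = [rating for _, _, rating in toy.values()]
    print("overlap scores:", overlaps)
    print("Spearman rho vs. toy ratings:", spearman(overlaps, ratings))
```

Under these assumptions, a higher overlap is read as evidence that the phrasal verb behaves like its simplex verb, that is, that it sits nearer the compositional end of the continuum. The measures actually evaluated in the paper may differ.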

History

Publication status

  • Published

Page range

73-80

Pages

8

Presentation Type

  • paper

Event name

Workshop on Multi-Word Expressions: Analysis, Acquisition and Treatment (ACL 2003)

Event location

Sapporo, Japan

Event type

conference

Department affiliated with

  • Informatics Publications

Notes

Originality: Describes an original approach to determining the degree to which multi-word expressions (phrasal verbs) are compositional in meaning, based on an automatically acquired thesaurus. Proposes a continuum of compositionality.

Rigour: Evaluated on a novel dataset with human judgements of compositionality, showing a highly significant figure for inter-annotator agreement. Highly significant correlations were obtained between the human judgements and measures proposed in the paper.

Significance: The methodology and dataset have been taken up by other researchers, though to date several of the measures proposed have not been outperformed on this data. Other researchers have adapted the methodology to detect compositionality of other multiword constructions.

Impact: 38 Google Scholar citations (not counting two cites by co-authors). The dataset has been made publicly available and several international researchers have used it in subsequent experiments.

Outlet: Appeared in the first in a series of 4 workshops to date in the burgeoning field of multiword expressions. The workshop forms part of the ACL conference.

Full text available

  • No

Peer reviewed?

  • Yes

Legacy Posted Date

2012-02-06
