University of Sussex

File(s) not publicly available

Integrating character representations into Chinese word embedding

chapter
posted on 2023-06-09, 05:16 authored by Xingyuan Chen, Peng Jin, Diana Frances McCarthy, John Carroll
In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can reduce the problem of sparse data and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set – with just under four thousand in common usage – but a major problem with using characters is their ambiguity. To address this problem, we disambiguate the characters according to groupings in a semantic hierarchy. Coupling our character embeddings with word embeddings, we observe improved performance on the tasks of finding synonyms and rating word similarity compared to a model using word embeddings alone, especially for low-frequency words.
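The idea of coupling character embeddings with word embeddings can be illustrated with a minimal sketch. This record does not specify the chapter's actual model, so everything here is an assumption made for illustration: the function name compose_word_vector, the blending weight alpha, the use of a simple average over character vectors, the semantic group labels ("mind", "ability"), and the toy two-dimensional vectors. Each character is keyed by a (character, semantic group) pair to mimic disambiguation by groupings in a semantic hierarchy.

```python
from typing import Dict, List, Tuple

Vector = List[float]
CharSense = Tuple[str, str]  # (character, semantic group label); labels are hypothetical


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def compose_word_vector(
    word: str,
    char_senses: List[CharSense],
    word_vecs: Dict[str, Vector],
    char_vecs: Dict[CharSense, Vector],
    alpha: float = 0.5,
) -> Vector:
    """Blend a word vector with the mean of its sense-tagged character vectors.

    alpha weights the word vector and (1 - alpha) weights the character mean.
    This weighting scheme is an illustrative assumption, not the chapter's method.
    """
    w = word_vecs[word]
    # Keep only the disambiguated characters that have a vector.
    known = [char_vecs[key] for key in char_senses if key in char_vecs]
    if not known:
        # No character information available: fall back to the word vector alone.
        return list(w)
    c = average(known)
    return [alpha * wi + (1.0 - alpha) * ci for wi, ci in zip(w, c)]


# Toy data, invented for this example only.
word_vecs = {"智能": [1.0, 0.0]}
char_vecs = {("智", "mind"): [0.0, 1.0], ("能", "ability"): [0.0, 3.0]}

vec = compose_word_vector(
    "智能", [("智", "mind"), ("能", "ability")], word_vecs, char_vecs
)
fallback = compose_word_vector("智能", [("智", "unknown")], word_vecs, char_vecs)
```

In a sketch like this, a rare word with a poorly estimated word vector still inherits signal from its characters, which is one plausible reading of why the abstract reports the largest gains for low-frequency words.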

History

Publication status

  • Published

Publisher

Springer International Publishing

Volume

10085

Page range

335-349

Pages

15

Book title

Chinese lexical semantics: 17th workshop, CLSW 2016, Singapore, Singapore, May 20–22, 2016, revised selected papers

ISBN

9783319495071

Series

Lecture notes in computer science

Department affiliated with

  • Informatics Publications

Research groups affiliated with

  • Data Science Research Group Publications

Full text available

  • No

Peer reviewed?

  • Yes

Editors

Jingxia Lin, Xuri Tang, Minghui Dong

Legacy Posted Date

2017-02-22
