Embed More Ignore Less (EMIL): enriched representations for Arabic NLP

Younes, Ahmed and Weeds, Julie (2020) Embed More Ignore Less (EMIL): enriched representations for Arabic NLP. The Fifth Arabic Natural Language Processing Workshop (WANLP 2020), Online, 12th December 2020. Published in: Proceedings of the Fifth Arabic Natural Language Processing Workshop. 139-154. Association for Computational Linguistics

PDF - Accepted Version: restricted to SRO admin only. Available under License Creative Commons Attribution.
PDF - Published Version: available under License Creative Commons Attribution.

Abstract

Our research focuses on the improvements that neural networks can gain by exploiting language-specific characteristics in the form of embeddings. More specifically, we investigate the capability of neural techniques and embeddings to represent language-specific characteristics in two sequence labeling tasks: named entity recognition (NER) and part-of-speech (POS) tagging. In both tasks, our preprocessing enriches the Arabic representation by adding diacritics to undiacritized text. In POS tagging, we test the ability of a neural model to capture the syntactic characteristics encoded within these diacritics by incorporating an embedding layer for diacritics alongside embedding layers for words and characters. In NER, our architecture incorporates diacritic and POS embeddings alongside word and character embeddings. Our experiments are conducted on 7 datasets (4 NER and 3 POS). We show that embedding the information encoded in automatically acquired Arabic diacritics improves performance across all datasets on both tasks. Embedding the information in automatically assigned POS tags further improves performance on the NER task.
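
As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below shows one way diacritized Arabic text could be split into parallel sequences of base characters and diacritics, with toy per-character vectors then concatenated. The function names, the choice of the Unicode range U+064B to U+0652 as "diacritics", the empty-string placeholder, and the tiny lookup tables are all assumptions made for this example; a real model would use learned embedding layers.

```python
# Arabic tanwin, short vowels, shadda and sukun: U+064B..U+0652 (assumed set).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}
NO_DIACRITIC = ""  # hypothetical placeholder for a letter with no mark


def split_diacritics(word):
    """Return parallel lists: base characters and the marks attached to each."""
    bases, marks = [], []
    for ch in word:
        if ch in ARABIC_DIACRITICS and bases:
            # Attach the mark to the most recent base character.
            marks[-1] += ch
        else:
            # A base character (or a stray leading mark, kept as-is).
            bases.append(ch)
            marks.append(NO_DIACRITIC)
    return bases, marks


def concat_features(bases, marks, char_vecs, mark_vecs, char_dim, mark_dim):
    """Concatenate a character vector and a diacritic vector per position.

    Unknown keys fall back to zero vectors; the tables stand in for
    learned embedding layers and are purely illustrative.
    """
    features = []
    for base, mark in zip(bases, marks):
        c = char_vecs.get(base, [0.0] * char_dim)
        d = mark_vecs.get(mark, [0.0] * mark_dim)
        features.append(c + d)  # list concatenation, not addition
    return features


# Usage: the word kataba written with a fatha on each letter.
word = "\u0643\u064E\u062A\u064E\u0628\u064E"
bases, marks = split_diacritics(word)

toy_char_vecs = {"\u0643": [0.1, 0.2], "\u062A": [0.3, 0.4]}
toy_mark_vecs = {"\u064E": [1.0]}
features = concat_features(bases, marks, toy_char_vecs, toy_mark_vecs, 2, 1)
```

Keeping the diacritics in their own sequence, aligned with the base characters, is what lets a separate embedding be looked up for them and joined with the character representation at each position.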

Item Type: Conference Proceedings
Keywords: Arabic NLP, Named Entity Recognition, Sequence Labelling, Automatic Diacriticisation, Embeddings
Schools and Departments: School of Engineering and Informatics > Informatics
SWORD Depositor: Mx Elements Account
Depositing User: Mx Elements Account
Date Deposited: 02 Dec 2020 11:38
Last Modified: 08 Jan 2021 14:58
URI: http://sro.sussex.ac.uk/id/eprint/95439
