Induction of root and pattern lexicon for unsupervised morphological analysis of Arabic

Khaliq, Bilal and Carroll, John (2013) Induction of root and pattern lexicon for unsupervised morphological analysis of Arabic. In: 6th International Joint Conference on Natural Language Processing (IJCNLP), 14-18 October 2013, Nagoya, Japan.

This is the latest version of this item.

PDF available under a Creative Commons Attribution-NonCommercial-ShareAlike licence.

Abstract

We propose an unsupervised approach to learning non-concatenative morphology, which we apply to induce a lexicon of Arabic roots and pattern templates. The approach is based on the idea that roots and patterns may be revealed through mutually recursive scoring based on hypothesized pattern and root frequencies. After a further iterative refinement stage, morphological analysis with the induced lexicon achieves a root identification accuracy of over 94%. Our approach differs from previous work on unsupervised learning of Arabic morphology in that it is applicable to naturally-written, unvowelled text.
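
The abstract does not give the scoring functions, so the following is only a minimal sketch of the general idea of mutually recursive root and pattern scoring, not the authors' algorithm. It assumes triliteral roots, a tiny hand-picked word list in a Buckwalter-like transliteration, and a simple sum-and-normalise update; the names `candidates`, `induce` and `analyse` are hypothetical.

```python
from itertools import combinations


def candidates(word, root_len=3):
    """Yield every (root, pattern) split that keeps root_len letters, in order, as the root."""
    for idx in combinations(range(len(word)), root_len):
        root = "".join(word[i] for i in idx)
        # Root slots become "-"; the remaining letters form the pattern template.
        pattern = "".join("-" if i in idx else ch for i, ch in enumerate(word))
        yield root, pattern


def induce(words, iterations=10):
    """Score roots and patterns by mutual reinforcement over hypothesised splits."""
    pairs = {rp for w in words for rp in candidates(w)}
    root_score = {r: 1.0 for r, _ in pairs}
    pattern_score = {p: 1.0 for _, p in pairs}
    for _ in range(iterations):
        # A pattern is credible if it combines with credible roots ...
        new_pattern = dict.fromkeys(pattern_score, 0.0)
        for r, p in pairs:
            new_pattern[p] += root_score[r]
        top = max(new_pattern.values())
        pattern_score = {p: s / top for p, s in new_pattern.items()}
        # ... and a root is credible if it combines with credible patterns.
        new_root = dict.fromkeys(root_score, 0.0)
        for r, p in pairs:
            new_root[r] += pattern_score[p]
        top = max(new_root.values())
        root_score = {r: s / top for r, s in new_root.items()}
    return root_score, pattern_score


def analyse(word, root_score, pattern_score):
    """Return the (root, pattern) split of word with the highest combined score."""
    return max(
        candidates(word),
        key=lambda rp: root_score.get(rp[0], 0.0) * pattern_score.get(rp[1], 0.0),
    )


# Toy, hand-picked unvowelled word list (an assumption, not data from the paper).
WORDS = ["ktb", "kAtb", "mktwb", "drs", "dArs", "mdrws", "Elm", "EAlm", "mElwm"]
ROOT_SCORE, PATTERN_SCORE = induce(WORDS)
print(analyse("mktwb", ROOT_SCORE, PATTERN_SCORE))  # prints ('ktb', 'm--w-')
```

On this toy list the roots `ktb`, `drs` and `Elm` each combine with the same three templates, so their scores reinforce one another, while spurious splits such as `mkt` + `---wb` occur only once and decay towards zero. The refinement stage and the accuracy figure reported in the abstract are outside the scope of this sketch.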

Item Type: Conference or Workshop Item (Paper)
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
Depositing User: John Carroll
Date Deposited: 24 Apr 2015 08:39
Last Modified: 24 Apr 2015 08:39
URI: http://sro.sussex.ac.uk/id/eprint/53737
