Unsupervised induction of Arabic root and pattern lexicons using machine learning

Khaliq, Bilal and Carroll, John (2013) Unsupervised induction of Arabic root and pattern lexicons using machine learning. In: International Conference on Recent Advances in Natural Language Processing (RANLP), 7-13 September 2013, Hissar, Bulgaria.

This is the latest version of this item.

Available under License Creative Commons Attribution Non-commercial Share Alike.


Abstract

We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allow us to morphologically analyse words. Further enhancements are applied, adjusting for morpheme length and structure. The final root extraction accuracy is 87.2%. In contrast to previous work on unsupervised learning of Arabic morphology, our approach is applicable to naturally-written, unvowelled Arabic text.
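To make the idea of "possible roots or patterns" concrete, the following is a minimal sketch and not the authors' method: it contains no maximum entropy model and no clustering. It assumes Latin transliterations of unvowelled words, a fixed root length of three letters, and "-" as a placeholder for a root slot; the function names `candidate_analyses` and `shared_roots` are hypothetical. It enumerates every way of splitting a word into a candidate root and the pattern left behind, then groups words by the candidate roots they share.

```python
from itertools import combinations


def candidate_analyses(word, root_len=3):
    """List every (root, pattern) split of `word` with `root_len` root letters.

    The root is an ordered subsequence of the word's letters; the pattern is
    the word with each root letter replaced by the placeholder "-".
    """
    analyses = []
    for positions in combinations(range(len(word)), root_len):
        root = "".join(word[i] for i in positions)
        pattern = "".join(
            "-" if i in positions else ch for i, ch in enumerate(word)
        )
        analyses.append((root, pattern))
    return analyses


def shared_roots(words, root_len=3):
    """Map each candidate root to the set of words that could contain it."""
    index = {}
    for w in words:
        for root, _ in candidate_analyses(w, root_len):
            index.setdefault(root, set()).add(w)
    return index


if __name__ == "__main__":
    # Illustrative transliterated forms (assumed): ktAb, mktwb, kAtb
    words = ["ktAb", "mktwb", "kAtb"]
    for root, members in sorted(shared_roots(words).items()):
        if len(members) == len(words):
            print(root, sorted(members))
```

A candidate root that recurs across many words, such as `ktb` here, is the kind of evidence a lexicon induction procedure could exploit; how candidates are scored and clustered is left to the paper itself.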

Item Type: Conference or Workshop Item (Paper)
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
Depositing User: John Carroll
Date Deposited: 24 Apr 2015 08:41
Last Modified: 24 Apr 2015 08:41
URI: http://sro.sussex.ac.uk/id/eprint/53738
