R13-1045.pdf (229.68 kB)
Unsupervised induction of Arabic root and pattern lexicons using machine learning
presentation
posted on 2023-06-08, 20:35 authored by Bilal Khaliq, John Carroll
We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allows us to morphologically analyse words. Further enhancements are applied, adjusting for morpheme length and structure. Final root extraction accuracy of 87.2% is achieved. In contrast to previous work on unsupervised learning of Arabic morphology, our approach is applicable to naturally-written, unvowelled Arabic text.
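To make the idea of "possible roots or patterns" concrete, the sketch below enumerates every way an unvowelled word could be split into a root and a pattern template. It is only an illustration of root-and-pattern decomposition, not the maximum entropy model or the clustering procedure from the paper. The function name `candidate_analyses`, the fixed root length of three, the `-` slot marker, and the Buckwalter-style transliterations (`ktAb`, `kAtb`, `mktb`) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def candidate_analyses(word, root_len=3, slot="-"):
    """Enumerate every (root, pattern) split of `word`.

    Hypothetical helper: a root is any ordered subsequence of
    `root_len` letters, and the pattern is the word with those
    letters replaced by `slot`.
    """
    analyses = []
    for positions in combinations(range(len(word)), root_len):
        chosen = set(positions)
        root = "".join(word[i] for i in positions)
        pattern = "".join(
            slot if i in chosen else ch for i, ch in enumerate(word)
        )
        analyses.append((root, pattern))
    return analyses


# Toy word list in Buckwalter-style transliteration (an assumption):
# ktAb "book", kAtb "writer", mktb "office".
words = ["ktAb", "kAtb", "mktb"]

# Count how many words propose each candidate root.
root_votes = Counter(
    root for w in words for root, _ in set(candidate_analyses(w))
)
```

In this toy list, `ktb` is the only candidate root proposed by all three words, which hints at why grouping words by their shared candidates can surface a root lexicon. A real system would need to score the candidates rather than simply count them.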
Publication status
- Published
Page range
350-356
Presentation Type
- paper
Event name
International Conference Recent Advances in Natural Language Processing (RANLP)
Event location
Hissar, Bulgaria
Event type
conference
Event date
7-13 September 2013
Department affiliated with
- Informatics Publications
Full text available
- Yes
Peer reviewed?
- Yes
Legacy Posted Date
2015-04-24