ASOBEK: Twitter paraphrase identification with simple overlap features and SVMs

Eyecioglu, Asli and Keller, Bill (2015) ASOBEK: Twitter paraphrase identification with simple overlap features and SVMs. In: SemEval-2015: The 9th International Workshop on Semantic Evaluation: proceedings of SemEval-2015: June 4-5, 2015, Denver, Colorado, USA. Association for Computational Linguistics (ACL), Stroudsburg, PA, pp. 64-69. ISBN 9781941643402

Abstract

We present an approach to identifying Twitter paraphrases using simple lexical overlap features. The work is part of ongoing research into the applicability of knowledge-lean techniques to paraphrase identification. We utilize features based on the overlap of word and character n-grams and train a support vector machine (SVM). Our results demonstrate that character-level and word-level overlap features in combination can give performance comparable to methods employing more sophisticated NLP processing tools and external resources. We achieve the highest F-score for identifying paraphrases on the Twitter Paraphrase Corpus as part of SemEval-2015 Task 1.
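As a rough illustration of what overlap features over word and character n-grams can look like, the following is a minimal sketch and not the authors' implementation. The function names, the choice of set sizes (intersection, union, and each individual set) as features, and the n-gram orders are all assumptions made for this example; the abstract does not specify them.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams in the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def word_ngrams(text, n):
    """Return the set of word n-grams, using whitespace tokenization."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_features(a, b):
    """Assumed feature set: sizes of the intersection, the union, and each set."""
    return [len(a & b), len(a | b), len(a), len(b)]


def pair_features(s1, s2, char_n=2, word_n=1):
    """Concatenate character-level and word-level overlap features for a pair.

    char_n and word_n are hypothetical defaults chosen for illustration.
    """
    feats = overlap_features(char_ngrams(s1, char_n), char_ngrams(s2, char_n))
    feats += overlap_features(word_ngrams(s1, word_n), word_ngrams(s2, word_n))
    return feats


if __name__ == "__main__":
    print(pair_features("the cat sat", "the cat ran"))
```

Each sentence pair yields a short numeric vector; vectors of this kind, paired with paraphrase labels, could then be passed to any standard SVM implementation for training.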

Item Type: Book Section
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: P Language and Literature > P Philology. Linguistics > P0098 Computational linguistics. Natural language processing
Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
Depositing User: Bill Keller
Date Deposited: 23 Jun 2016 10:54
Last Modified: 23 Jun 2016 12:48
URI: http://sro.sussex.ac.uk/id/eprint/61685
