Semi-supervised training of a statistical parser from unlabeled partially-bracketed data

Watson, Rebecca, Briscoe, Ted and Carroll, John (2007) Semi-supervised training of a statistical parser from unlabeled partially-bracketed data. In: Tenth International Conference on Parsing Technologies, Prague, Czech Republic.

Full text not available from this repository.

Abstract

We compare the accuracy of a statistical parse ranking model trained from a fully-annotated portion of the Susanne treebank with one trained from unlabeled partially-bracketed sentences derived from this treebank and from the Penn Treebank. We demonstrate that confidence-based semi-supervised techniques similar to self-training outperform expectation maximization when both are constrained by partial bracketing. Both methods based on partially-bracketed training data outperform the fully supervised technique, and both can, in principle, be applied to any statistical parser whose output is consistent with such partial bracketing. We also explore tuning the model to a different domain and the effect of in-domain data in the semi-supervised training processes.
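The abstract describes both training methods as constrained by partial bracketing, that is, restricted to parses consistent with the brackets in the training data. One common way to make "consistent" precise is the non-crossing test used in crossing-bracket evaluation: a parse is consistent if none of its constituents partially overlaps any given bracket. The sketch below assumes that definition and represents constituents and brackets as hypothetical half-open `(start, end)` token spans; it is an illustration of the idea, not the authors' implementation.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed representation: (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if spans a and b overlap without either one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def consistent(parse_spans: Iterable[Span], brackets: Iterable[Span]) -> bool:
    """True if no constituent of the parse crosses any partial bracket."""
    brackets = list(brackets)  # reused for every constituent
    return not any(crosses(c, b) for c in parse_spans for b in brackets)


def filter_consistent(
    candidates: Sequence[List[Span]], brackets: List[Span]
) -> List[List[Span]]:
    """Keep only candidate parses (e.g. a hypothetical n-best list) that
    respect the partial bracketing; unbracketed regions are unconstrained."""
    return [spans for spans in candidates if consistent(spans, brackets)]


# "the old man saw a dog" (tokens 0..5) with one partial bracket: [the old man]
parse_a = [(0, 3), (3, 6), (4, 6), (0, 6)]  # keeps "the old man" intact
parse_b = [(0, 2), (2, 6), (4, 6), (0, 6)]  # "man saw a dog" crosses the bracket
print(filter_consistent([parse_a, parse_b], [(0, 3)]))
```

Only `parse_a` survives the filter. Because unbracketed material imposes no constraint, a check of this kind can sit in front of any parser that outputs constituent spans, which matches the abstract's point that the methods apply, in principle, to any such parser.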

Item Type: Conference or Workshop Item (Paper)
Schools and Departments: School of Engineering and Informatics > Informatics
Depositing User: John Carroll
Date Deposited: 06 Feb 2012 20:43
Last Modified: 13 Apr 2012 08:43
URI: http://sro.sussex.ac.uk/id/eprint/27753