Improved Learning for Hidden Markov Models Using Penalized Training

Keller, Bill and Lutz, Rudi (2002) Improved Learning for Hidden Markov Models Using Penalized Training. In: AICS 02: Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science, LIMERICK, IRELAND.

Full text not available from this repository.


In this paper we investigate the performance of penalized variants of the forward-backward algorithm for training Hidden Markov Models. Maximum likelihood estimation of model parameters can result in over-fitting and poor generalization ability. We discuss the use of priors to compute maximum a posteriori estimates and describe a number of experiments in which models are trained under different conditions. Our results show that MAP estimation can alleviate over-fitting and help learn better parameter estimates.
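The contrast between the maximum likelihood and MAP re-estimates can be illustrated with a minimal sketch of the standard MAP M-step for HMM emission probabilities under a Dirichlet prior. This is a textbook construction, not the authors' implementation; the toy expected counts and hyperparameter values are illustrative only.

```python
import numpy as np

def map_emission_update(expected_counts, alpha):
    """Penalized (MAP) M-step for HMM emission probabilities.

    expected_counts[j, k]: expected number of times state j emits symbol k,
    as produced by the forward-backward E-step. alpha[k] >= 1 are Dirichlet
    hyperparameters; alpha = ones reduces to the plain ML update.
    """
    num = expected_counts + (alpha - 1.0)        # add pseudo-counts from the prior
    denom = num.sum(axis=1, keepdims=True)       # normalize per state
    return num / denom

# Hypothetical E-step counts: 2 states over a 3-symbol alphabet.
counts = np.array([[8.0, 1.0, 0.0],
                   [2.0, 5.0, 3.0]])

ml = map_emission_update(counts, np.ones(3))          # ML: unseen symbol gets probability 0
map_est = map_emission_update(counts, np.full(3, 2.0))  # MAP: every probability stays positive

print(ml[0])
print(map_est[0])
```

With the uniform Dirichlet hyperparameters set above 1, the prior acts like additive smoothing of the expected counts, which is one way MAP estimation guards against the over-fitting described in the abstract.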

Item Type: Conference or Workshop Item (Paper)
Additional Information:
Originality: This was the first application within NLP of penalised training of Hidden Markov Models using Dirichlet priors over the emission probabilities of the model.
Rigour: The paper derived the necessary EM update rule incorporating the Dirichlet prior, and described empirical results comparing learning with this prior against several other priors recommended in the literature. The data consisted of the first 5000 POS-tagged sentences from the BNC corpus, split into training and test sets. All results were obtained using 10-fold cross-validation and were shown to be statistically significant.
Significance: The paper showed that the use of Dirichlet priors (with the Dirichlet distribution parameters set proportional to the normalised frequencies of the observation symbols in the training data) consistently enabled the learning of better-performing models. This result was robust across model sizes and variations in initial conditions. Additionally, the results cast doubt on Brand's claim that minimum-entropy priors give good results, suggesting the need for further work in this area. Since this paper was written, the use of Dirichlet priors (and more recently Dirichlet Process priors) has become widespread.
Outlet: This was a fully refereed (3 referees) international conference.
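The EM update rule referred to above, in its standard textbook form for the emission parameters under a Dirichlet prior, can be written as follows; this is a reconstruction of the general MAP M-step, not necessarily the exact formulation derived in the paper:

```latex
\hat{b}_j(k) \;=\;
  \frac{\mathbb{E}[n_{jk}] + \alpha_k - 1}
       {\sum_{k'=1}^{K}\bigl(\mathbb{E}[n_{jk'}] + \alpha_{k'} - 1\bigr)},
\qquad \alpha_k \ge 1,
```

where $\mathbb{E}[n_{jk}]$ is the expected count of state $j$ emitting symbol $k$, computed in the forward-backward E-step, and the $\alpha_k$ are the Dirichlet hyperparameters. Setting $\alpha_k = 1$ for all $k$ recovers the maximum likelihood update.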
Schools and Departments: School of Engineering and Informatics > Informatics
Depositing User: Bill Keller
Date Deposited: 06 Feb 2012 18:53
Last Modified: 12 Apr 2012 11:55