Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm.

Lutz, Rudi (1998) Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm. In: Unset, Smith G Steele NC and Albrecht RF (Eds) Artificial Neural Nets and Genetic Algorithms. Proceedings of the 3rd International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA-97) Norwich UK..

Full text not available from this repository.


A genetic algorithm for learning stochastic context-free grammars from finite language samples as described. Solutions to the inference problem are evolved by optimising the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare favourably with other approaches to stochastic grammatical inference.

Item Type: Conference or Workshop Item (Other)
Additional Information: Originality: This paper (and its earlier conference version) was one of the first papers to apply a genetic algorithm to the problem of inferring stochastic grammar from a corpus. It also used a novel prior over stochastic context free grammars in order to bias the search towrads simpler grammars. It is also the first to compare this technique to use of the Inside-Outside algorithm. It also made use of an entirely novel crossover operator within the GA, which resulted in faster discovery of good grammars. Rigour: the fitness function for the GA is developed within a Bayesian approach to grammar induction. Results for all language learning tasks were given as averages over 20 runs. Languages learnt were the standard ones for the area, plus 3-symbol palindromes which had previously been considered too hard. Significance: We have demonstrated that a genetic algorithm, making use of a prior over grammars can successfully learn SCFGs from corpora. Good results are obtained on a range of formal languages using a prior that is designed to prefer simpler grammars. Grammars found by the GA are both small and accurate in comparison to those produced by the Inside-Outside algorithm. Impact: This paper, and its earlier conference version, have been cited a total of 20 times (Google Scholar) Outlet: A leading journal in its area. Impact factor 1.822
Schools and Departments: School of Engineering and Informatics > Informatics
Depositing User: Rudi Lutz
Date Deposited: 06 Feb 2012 18:27
Last Modified: 31 May 2012 09:47
URI: http://sro.sussex.ac.uk/id/eprint/16429
📧 Request an update