Evolutionary Induction of Stochastic Context Free Grammars

Keller, Bill and Lutz, Rudi (2005) Evolutionary Induction of Stochastic Context Free Grammars. Pattern Recognition, 38 (9). pp. 1393-1406. ISSN 0031-3203

Full text not available from this repository.

Abstract

This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside¿outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well.

Item Type: Article
Additional Information: Originality: Proposes an original, evolutionary approach to the problem of inferring stochastic context-free grammars (SCFGs) from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a new cross-over operator and a fitness function incorporating a prior designed to bias learning towards structurally simpler grammars. The reported work was the first to apply a GA-based approach to learning SCFGs and novel in optimizing the parameters of a covering grammar for a given language sample. Rigour: The paper presents the results of an empirical evaluation in learning grammars for a range of bench-mark formal languages. Grammars learned using the GA-based approach were compared with those found using the inside-outside algorithm and were shown to be more compact whilst fitting the corpus data as well. Significance: Demonstrated the potential of GA-based optimisation for the identification of accurate, compact grammars and as an alternative to standard approaches based on expectation-maximisation. Impact: The paper was selected to appear in this journal special issue on the increasingly important topic in language learning. Total citations (Google scholar) for this journal paper and its preceding conference paper are 23.
Schools and Departments: School of Engineering and Informatics > Informatics
Depositing User: Bill Keller
Date Deposited: 06 Feb 2012 21:18
Last Modified: 06 Feb 2012 22:06
URI: http://sro.sussex.ac.uk/id/eprint/30656
📧 Request an update