File(s) not publicly available
Evolutionary induction of stochastic context free grammars
journal contribution
posted on 2023-06-08, 09:35 authored by Bill Keller, Rudi LutzThis paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a fitness function incorporating a prior over the space of possible grammars. Our choice of prior is designed to bias learning towards structurally simpler grammars. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. Full details are given of our genetic algorithm (GA) and of our fitness function for grammars. We present the results of a number of experiments in learning grammars for a range of formal languages. Finally we compare the grammars induced using the GA-based approach with those found using the inside¿outside algorithm. We find that our approach learns grammars that are both compact and fit the corpus data well.
History
Publication status
- Published
Journal
Pattern RecognitionISSN
0031-3203External DOI
Issue
9Volume
38Page range
1393-1406Pages
14.0Department affiliated with
- Informatics Publications
Notes
Originality: Proposes an original, evolutionary approach to the problem of inferring stochastic context-free grammars (SCFGs) from finite language samples. The approach employs a distributed, steady-state genetic algorithm, with a new cross-over operator and a fitness function incorporating a prior designed to bias learning towards structurally simpler grammars. The reported work was the first to apply a GA-based approach to learning SCFGs and novel in optimizing the parameters of a covering grammar for a given language sample. Rigour: The paper presents the results of an empirical evaluation in learning grammars for a range of bench-mark formal languages. Grammars learned using the GA-based approach were compared with those found using the inside-outside algorithm and were shown to be more compact whilst fitting the corpus data as well. Significance: Demonstrated the potential of GA-based optimisation for the identification of accurate, compact grammars and as an alternative to standard approaches based on expectation-maximisation. Impact: The paper was selected to appear in this journal special issue on the increasingly important topic in language learning. Total citations (Google scholar) for this journal paper and its preceding conference paper are 23.Full text available
- No
Peer reviewed?
- Yes
Editors
M BasuLegacy Posted Date
2012-02-06Usage metrics
Categories
No categories selectedKeywords
Licence
Exports
RefWorks
BibTeX
Ref. manager
Endnote
DataCite
NLM
DC