University of Sussex
Browse

File(s) not publicly available

Learning Stochastic Context-Free Grammars from Corpora Using a Genetic Algorithm.

presentation
posted on 2023-06-07, 20:03 authored by Rudi Lutz
A genetic algorithm for learning stochastic context-free grammars from finite language samples as described. Solutions to the inference problem are evolved by optimising the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare favourably with other approaches to stochastic grammatical inference.

History

Publication status

  • Published

Publisher

Smith G Steele NC and Albrecht RF (Eds) Artificial Neural Nets and Genetic Algorithms. Proceedings of the 3rd International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA-97) Norwich UK.

Page range

210-214

Presentation Type

  • other

Event location

Smith G Steele NC and Albrecht RF (Eds) Artificial Neural Nets and Genetic Algorithms. Proceedings of the 3rd International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA-97) Norwich UK.

Event type

conference

Department affiliated with

  • Informatics Publications

Notes

Originality: This paper (and its earlier conference version) was one of the first papers to apply a genetic algorithm to the problem of inferring stochastic grammar from a corpus. It also used a novel prior over stochastic context free grammars in order to bias the search towrads simpler grammars. It is also the first to compare this technique to use of the Inside-Outside algorithm. It also made use of an entirely novel crossover operator within the GA, which resulted in faster discovery of good grammars. Rigour: the fitness function for the GA is developed within a Bayesian approach to grammar induction. Results for all language learning tasks were given as averages over 20 runs. Languages learnt were the standard ones for the area, plus 3-symbol palindromes which had previously been considered too hard. Significance: We have demonstrated that a genetic algorithm, making use of a prior over grammars can successfully learn SCFGs from corpora. Good results are obtained on a range of formal languages using a prior that is designed to prefer simpler grammars. Grammars found by the GA are both small and accurate in comparison to those produced by the Inside-Outside algorithm. Impact: This paper, and its earlier conference version, have been cited a total of 20 times (Google Scholar) Outlet: A leading journal in its area. Impact factor 1.822

Full text available

  • No

Peer reviewed?

  • Yes

Legacy Posted Date

2012-02-06

Usage metrics

    University of Sussex (Publications)

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC