University of Sussex

File(s) under permanent embargo

Addressing a coverage gap in African Englishes: the tagged corpus of Cameroon Pidgin English

chapter
posted on 2023-06-09, 08:34 authored by Gabriel Ozón, Sarah Fitzgerald, Melanie Green
This paper illustrates the uses of a tagged pilot corpus of spoken Cameroon Pidgin English (CPE), which has recently been finalised (Ozón et al. 2017) and made available online (Green et al. 2016). The corpus consists of 240,000 words, with mark-up and part-of-speech tagging. The text categories and the proportions of monologue and dialogue are in line with those of the ICE project (Nelson 1996), making the CPE corpus directly comparable with existing corpora of post-colonial Englishes. The project necessitated the development of a designated tagset for CPE, which was employed to tag the corpus automatically with TreeTagger (Schmid 1994), achieving 94% accuracy. This tagged corpus offers an invaluable resource for the investigation of CPE, and is particularly useful for automatic retrieval of language phenomena above the level of the lexicon, for which a substantially larger corpus is required. The tagging in particular is instrumental in addressing issues of multifunctionality characteristic of pidgin/creole languages. For example, certain verbs (e.g. goe ‘go’, kam ‘come’, gif ‘give’ and teik ‘take’) can function independently as lexical verbs and can also participate in serial verb constructions (SVCs) in CPE. The tagged corpus distinguishes between the different uses of these verbs, allowing automatic retrieval with a simple search. We introduce the dataset and present some case studies illustrating its potential uses, in order to highlight the usefulness of such freely accessible resources for research on African languages.

History

Publication status

  • Published

File Version

  • Accepted version

Journal

International Journal of Corpus Linguistics

ISSN

1384-6655

Publisher

John Benjamins Publishing

Page range

144-164

Pages

403

Book title

Corpus Linguistics and African Englishes

Place of publication

Amsterdam

ISBN

9789027202192

Department affiliated with

  • English Publications

Full text available

  • No

Peer reviewed?

  • Yes

Editors

Bassey E Antia, Ulrike Gut, Alexandra U Esimaje

Legacy Posted Date

2017-11-02

First Compliant Deposit (FCD) Date

2017-11-02
