A spoken corpus of Cameroon Pidgin English

Ozon, Gabriel, Ayafor, Miriam, Green, Melanie and FitzGerald, Sarah (2015) A spoken corpus of Cameroon Pidgin English. World Englishes. ISSN 0883-2919 (Accepted)

Cameroon Pidgin English (CPE) is an expanded pidgin/creole spoken in some form by an estimated 50% of Cameroon’s 22,000,000 population (Lewis et al. 2014), primarily in the Anglophone west regions, but also in urban centres throughout the country. As a primarily spoken language, CPE has no standardised orthography, but enjoys a vigorous oral tradition, not least through its presence in the broadcast media. However, it resists close documentation due to its stigmatised status in the face of French and English, the prestige languages of Cameroon, where it also co-exists with an estimated 280 indigenous languages (Lewis et al. 2014). Most publications on English in Cameroon to date have focused either on sociolinguistic aspects, with little close grammatical detail (e.g., De Féral 1989, Schröder 2003, Simo Bobda and Wolf 2003), or on Cameroon Standard English (Mbangwana and Sala 2009, Wolf 2001), or on both.

We report on the construction of a 240,000-word pilot corpus of transcribed spoken CPE dialogues and monologues, with partial POS-tagging, glossing and translations. The proportions of monologue and dialogue are guided by the methodology of the International Corpus of English project, making our corpus immediately comparable with existing corpora of post-colonial varieties of English. The project is funded by a British Academy/Leverhulme small grant (ref. SG140663).

Besides operational and other expected difficulties in design (balance, representativeness) and compilation (collection, transcription, annotation), a corpus of a non-standard spoken variety poses certain challenges of its own. For example, despite its widespread use, CPE lacks a standard written form: an appropriate and properly motivated spelling system has to be developed prior to the transcription stage. Additionally, the intricacy of the language ecology in Cameroon makes identifying criteria for representativeness a challenge: although the project targets native speakers, there is considerable lectal variation as a consequence of the complex multilingual environment.

While a larger corpus is essential for investigating lexis, we illustrate how recurring grammatical patterns can still be investigated in a small corpus, despite the absence of POS-tagging. Even at a preliminary stage, ‘raw’ language data can (a) chart the distribution of certain known linguistic events, and (b) uncover evidence of new events.
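As a rough illustration of what charting known events in untagged data can look like, the following Python sketch builds a simple keyword-in-context (KWIC) listing and counts the word that immediately follows each target verb. It is not the authors' procedure. The toy data are the two utterances cited as examples (1) and (2) in this abstract, the target forms mek and gif are the only verb spellings attested here, and the tokenisation rule and the function names (`tokenize`, `kwic`, `right_collocates`) are assumptions made for the example.

```python
import re
from collections import Counter

# Toy data: the two utterances cited as examples (1) and (2) in this abstract.
UTTERANCES = [
    "no bi man di mek babisita, de wuman di mek babisita",
    "dem don kam lait lam gif wi",
]

# Target forms attested in the abstract; a real study would list all five verbs.
TARGETS = {"mek", "gif"}


def tokenize(utterance):
    """Lowercase and keep runs of a-z only (a crude, assumed tokenisation)."""
    return re.findall(r"[a-z]+", utterance.lower())


def kwic(utterances, targets, window=2):
    """Return (left context, keyword, right context) for every target hit."""
    hits = []
    for utterance in utterances:
        tokens = tokenize(utterance)
        for i, tok in enumerate(tokens):
            if tok in targets:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, tok, right))
    return hits


def right_collocates(hits):
    """Count the word immediately following each keyword."""
    counts = Counter()
    for _, keyword, right in hits:
        if right:
            counts[(keyword, right.split()[0])] += 1
    return counts


hits = kwic(UTTERANCES, TARGETS)
for left, keyword, right in hits:
    print(f"{left:>15} [{keyword}] {right}")
print(right_collocates(hits))
```

No tagging is needed for a listing of this kind: recurring frames such as mek followed by a nominal surface from string matching alone, and the concordance lines can then be inspected by hand for candidate LVCs and SVCs.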

We present a case study focusing on five high-frequency verbs in CPE, based on a small (100,000-word) ‘pre-pilot’ corpus consisting of (i) spoken CPE (Ayafor, Green and Ozón, in prep. (a)), (ii) existing published sources (Ayisi and Longinotto 2005; Bellama et al. 2006; Todd 1979), and (iii) elicited examples.

Focusing on the verbs ‘make’, ‘do’, ‘give’, ‘get’ and ‘take’, we find evidence for a productive light verb strategy (Butt 2010) for the relexification of predicates (Wichmann and Wohlgemuth 2008), and observe that this small set of frequently occurring verbs participates both in light verb constructions (LVCs) (1) and in serial verb constructions (SVCs) (2). We also find evidence for (a) the grammaticalisation of mek ‘make’ as a marker of deontic modality, (b) a preference for the double object construction over the dative construction for gif ‘give’ ditransitives (contra Schröder 2013), and (c) the existence of benefactive gif ‘give’ SVCs in CPE (also contra Schröder 2013).

(1) no bi man di mek babisita, de wuman di mek babisita
‘It’s not the man who babysits, the woman babysits.’

(2) dem don kam lait lam gif wi
‘They came and lit lamps for us.’

