NIE Spoken Corpus of English in Asia (NIESCEA)

About the Corpus

The NIE Spoken Corpus of English in Asia (NIESCEA) is the output of a formal spoken corpus project entitled ‘RS 9/12 LEL: Building an NIE Spoken Corpus of English in Asia (NIESCEA)’, funded by the Research Support for Senior Academic Administrator (RS-SAA) Grant, National Institute of Education (NIE), Nanyang Technological University, Singapore. The Principal Investigator (PI) of this project is A/P Low Ee Ling, Head, Office of Strategic Planning and Academic Quality and Associate Professor of English Language and Literature, NIE.

The NIESCEA project seeks to build a first formal spoken corpus of English in Asia to be used for phonetic/phonological and grammatical research on Asian Englishes. It uses a carefully designed set of test materials (see the ‘Corpus Information’ section) for the systematic investigation of both the segmental and suprasegmental aspects of phonetics and phonology, as well as grammatical features, of the varieties in question. In selecting countries for data collection, we adopted the Kachruvian model, which distinguishes between the Inner Circle (countries such as the UK and the USA where English is a native language), the Outer Circle (countries such as Singapore and India where English has official status and important intranational uses) and the Expanding Circle (countries such as China and Japan where English is learnt as a foreign language). The speech data of this corpus were collected from four Outer Circle countries, namely Singapore, India, Brunei and the Philippines, and four Expanding Circle countries, namely China, Japan, Thailand and Vietnam. The corpus involves a total of eighty informants (ten per country), all of whom are undergraduates or above.


The main objectives of the corpus are as follows:

i)   To build a first formal spoken corpus of English in Asia, i.e. NIESCEA, that can be used to investigate the phonetic/phonological and grammatical features of Asian Englishes.

ii)   To advance knowledge about English in Asia by providing scholars from both Asia and other parts of the world with access to speech data of Asian Englishes.

Corpus Information

The Informants

All informants from whom the speech data were collected for this corpus are undergraduates or postgraduates, aged between 18 and 35. They were all born in their home countries and received at least their basic education (i.e. from kindergarten to senior high school) there. Besides recording the informants reading the test materials, we also collected their bio-data and linguistic background.

The Data

The Test Materials

The test materials used to collect the corpus data comprise the following: 

i)   Three Short Passages

The three short passages in the data consist of ‘The North Wind and the Sun’ (The NWS Passage), ‘The Boy Who Cried Wolf’ (The Wolf Passage), and ‘Arthur the Rat’ (The Rat Passage).

The NWS Passage is a translation of the fable ‘The North Wind and the Sun’ and has been recommended by the International Phonetic Association (IPA) (see IPA, 1999, pp. 39-44) for the phonetic analysis of English. Deterding (2006) provides a good discussion of this passage. Click here to download the orthographic version of The NWS Passage. The transcribed versions of The NWS Passage are available in IPA (1999).

The Wolf Passage was developed by Deterding (2006), where a more detailed discussion of the passage can be found. It can be used for both auditory and acoustic analysis of segmentals (i.e. vowels and consonants) and suprasegmentals (e.g. rhythm and stress) of varieties of English, and it has been frequently used in publications and postgraduate theses since it was developed. Click here to download the orthographic version of The Wolf Passage.

The Rat Passage is a story designed to elicit all significant pronunciation variants in varieties of English and it allows comparison of sounds either within a variety or cross-varietally. This passage was used to elicit pronunciation data in the Harvard DARE (Dictionary of American Regional English) project (see DARE, 2013) between 1965 and 1970. Click here to download the orthographic version of The Rat Passage.

ii)   Four Sets of Sentences

These four sets of sentences were developed by Low (1998) to test suprasegmental features such as stress and rhythm. For details about how these sets of sentences were used to test the suprasegmental features of a variety of English (i.e. Singapore English), refer to Low (1998).

Sentence Set 1

This sentence set consists of a list of six sentences which are randomised with four filler sentences. The set includes sentences containing lexical items that appear to be stressed differently in different varieties of English. For example, Deterding (1994) observed that demonstratives and pronouns receive more prominence in Singapore English. Thus, the test materials include sentences that contain demonstratives and pronouns. Part of the materials was designed to fulfil a dual function, namely, to test the variability between successive vowels as well as the presence or absence of deaccenting in varieties of English. These sentences contain lexical items which are repeated at the end of the sentence. Click here to download Sentence Set 1.
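The variability between successive vowels mentioned above can be quantified in several ways. The description here does not name a specific measure, so the following is only a minimal sketch that assumes a normalised pairwise variability index computed over vowel durations. The function name `npvi` and the duration values are hypothetical and are not taken from the corpus.

```python
def npvi(durations):
    """Normalised pairwise variability index over successive durations.

    For each pair of neighbouring durations, the absolute difference is
    divided by the mean of the pair; the results are averaged and
    multiplied by 100. Assumes all durations are positive.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    total = 0.0
    for a, b in zip(durations, durations[1:]):
        total += abs(a - b) / ((a + b) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical vowel durations in milliseconds (illustrative only).
even_timing = [100, 100, 100, 100]
alternating = [150, 60, 140, 55]
print(npvi(even_timing), npvi(alternating))
```

Under this measure, a sequence of equal durations scores zero, and a sequence that alternates between long and short vowels scores higher.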

Sentence Set 2

The second set consists of a list of ten sentences, which may be divided into two subsets. Five sentences contain full and reduced vowels as they would potentially be realised in Inner Circle varieties of English, e.g. British English and American English. The other five sentences contain only full vowels. The reason for having only full vowels in one subset and both full and reduced vowels in the other is to allow a comparison between the realisation of full vowels in different varieties of English, e.g. between Singapore English and British English. Note that the consonantal context surrounding the vowels in each corresponding pair of sentences is similar. Click here to download Sentence Set 2.

Sentence Set 3

This sentence set was designed to elicit speech data for comparing polysyllabic test items in different positions in the intonational phrase, other than just before a boundary. In order to ensure that the polysyllabic test items are stressed initially in Inner Circle varieties such as British English and American English, ten monosyllabic test items and their morphologically derived forms were selected. In these items, stress is likely to be fixed on the root morpheme. Materials were selected according to the principles governing stress placement in British or American English, where certain suffixes are known to be stress-neutral and do not shift the location of stress when added to a root morpheme. For example, the test item ‘man’ was selected along with its morphologically derived forms ‘manful’ and ‘manfully’, since the suffixes added to the root morpheme ‘man’ are stress-neutral and do not affect the stress on ‘man’. All test items, along with their morphologically derived forms, were then inserted into phrase-final and phrase-medial positions in controlled materials. Click here to download Sentence Set 3.

Sentence Set 4

Sentence Set 4 consists of 20 sentences. Ten of them contain compound words and the other ten contain noun phrases formed by replacing the first element of each compound word with an adjective.

This sentence set was designed to compare the assignment of compound and phrasal stress in Inner Circle varieties and other varieties of English, e.g. between British English and Singapore English. A list of ten compound words which receive initial stress in Inner Circle varieties such as British or American English was chosen to form part of the corpus. In general, stress placement in compound words in British or American English varies according to the categories of the words they are composed of. Compounds made up of two nouns normally have stress on their first element (Roach, 1991, pp. 99-100), such as DRESScode and SCHOOLyard. In addition, according to Fudge (1984, p. 2), the lexically stressed syllable is not always assigned more prominence than other syllables unless it coincides with the nucleus of the utterance. Taking this into consideration, the compounds were placed in sentence-final position, where they can be expected to carry nuclear stress when the sentence is produced out of context.

Also included in this sentence set are noun phrases created by replacing the first element of each of the ten selected compound words with an adjective. For example, the compound ‘armchair’ was replaced by the noun phrase ‘old chair’, consisting of an adjective and a noun. Likewise, the compound ‘schoolyard’ was replaced by the noun phrase ‘cool yard’. All test items were placed in similar carrier sentences, such as ‘It resembled an old chair’, and are always in phrase-final position. Click here to download Sentence Set 4.

iii)   Vowels in Citation Form

Citation form refers to the form of a word that occurs when it is cited or pronounced in isolation (Ladefoged & Johnson, 2015). In order to test vowel quality and vowel duration in a formal context, we selected 33 monosyllabic sample words containing 11 monophthongs (i.e. 3 sample words for each monophthong) and embedded them in the carrier sentence ‘Please say ____ again’. In selecting the monosyllabic sample words, we purposely avoided words in which the approximants /w/, /j/ or /r/ precede or follow the vowel. Some of these sample words form minimal pairs. Note, however, that the members of these minimal pairs are randomised rather than appearing together. Click here to download Vowels in Citation Form.
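The construction described above (sample words grouped by vowel, embedded in a carrier sentence, then presented in random order) can be sketched as follows. This is an illustration only: the function name `build_prompts`, the vowel labels and the sample words are hypothetical and are not the actual test items, which are in the downloadable test materials.

```python
import random

# Carrier sentence from the test materials, with a slot for the sample word.
CARRIER = "Please say {} again"


def build_prompts(words_by_vowel, seed=0):
    """Embed every sample word in the carrier sentence, in shuffled order.

    words_by_vowel maps a vowel label to its list of sample words.
    A fixed seed makes the shuffled order reproducible.
    """
    words = [w for group in words_by_vowel.values() for w in group]
    rng = random.Random(seed)
    rng.shuffle(words)  # keeps minimal-pair members from being grouped by vowel
    return [CARRIER.format(w) for w in words]


# Hypothetical sample words for two vowels (not the real NIESCEA items).
sample = {
    "vowel_A": ["heed", "bead", "seat"],
    "vowel_B": ["hid", "bid", "sit"],
}
for prompt in build_prompts(sample):
    print(prompt)
```

With the full design of 11 monophthongs and 3 words each, the same function would return 33 prompts.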

iv)   The Interview

In this 5-minute interview, informants were asked to talk about their most memorable holidays. The speech data collected from the interview can be used to investigate phonetic/phonological features (both segmental and suprasegmental) as well as grammatical features of Asian varieties of English. The interview also provides researchers with a good sample of informal speech. The transcriptions of the interviews will be sent together with the audio recordings upon request.

The Recordings

All informants were recorded in a quiet room or in a sound lab. All recordings were saved in uncompressed .wav format to ensure high sound quality and for ease of importing into Praat. The length of each recording (for the whole set of test materials) ranges from approximately 15 to 20 minutes.
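Because the recordings are plain .wav files, their basic properties can be checked before analysis without specialised tools. The sketch below uses Python's standard `wave` module to compute the length of a recording. It assumes the files are uncompressed PCM, which is what that module reads; the function name and the file name in the usage note are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM .wav file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration is the number of audio frames divided by the frame rate.
        return wf.getnframes() / wf.getframerate()
```

For example, `wav_duration_seconds("informant_01.wav") / 60` would give the length in minutes of a hypothetical file named `informant_01.wav`, which can then be compared with the approximate 15 to 20 minute range stated above.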


We would like to express our heartfelt gratitude to:

National Institute of Education, Singapore for the Research Support for Senior Academic Administrator (RS-SAA) Grant which has made the NIESCEA possible.

Dr David Deterding and Ms Ishamina Athirah, University of Brunei Darussalam for their assistance in collecting the Brunei English data.

Dr Pornapit Darasawang and Dr Natjiree Jaturapitakkul, King Mongkut’s University of Technology Thonburi, Thailand for their assistance in collecting the Thai English data.

Dr Michiko Nakano and Ms Emi Tomita, Waseda University, Japan for their help with the collection of Japanese English data.

Ms Ava Patricia Cabiguin Avila and Dr Ruanni Tupas for their assistance in contacting Philippine participants.

Ms Huynh Thi Canh Dien and Ms Nguyen Ngoc Anh Thu for their assistance in contacting Vietnamese participants.

Friends, colleagues and informants who helped out in the data collection process.

Last but not least, all the informants for their support and commitment.


The audio recordings of the NIESCEA corpus will be made available to researchers and teachers of English worldwide after its launch at the 18th English in South-East Asia Conference (ESEA-18), to be held on 16-17 November 2015 at the University of Brunei Darussalam. The test materials (in PDF format) will be available for direct download from this website. In order to download the audio recordings, researchers/teachers will be required to fill out the Licence Agreement & Request Form (please download the form here) and submit it to the email address provided in ‘Contact Us’. They will then be given a link from which they can download the audio files.

