About the Corpus
The NIE Spoken Corpus of English in Asia (NIESCEA) is the output of a formal spoken corpus project entitled ‘RS 9/12 LEL: Building an NIE Spoken Corpus of English in Asia (NIESCEA)’. The project was funded by the Research Support for Senior Academic Administrator (RS-SAA) Grant, National Institute of Education (NIE), Nanyang Technological University, Singapore. The Principal Investigator (PI) of this project is A/P Low Ee Ling, Head, Office of Strategic Planning and Academic Quality, and Associate Professor of English Language and Literature. The corpus seeks to build the first formal spoken corpus of English in Asia for use in phonetic/phonological and grammatical research on Asian Englishes.
It uses a carefully designed set of test materials (see the ‘Corpus Information’ section) for the systematic investigation of both the segmental and suprasegmental aspects of phonetics and phonology, as well as the grammatical features of the varieties in question. In selecting countries for data collection, we adopted the Kachruvian model. This model distinguishes between three circles:
- the Inner Circle, covering countries such as the UK and the USA where English is a native language;
- the Outer Circle, covering countries such as Singapore and India where English has official status and important intranational uses;
- the Expanding Circle, covering countries such as China and Japan where English is learnt as a foreign language.

The speech data of this corpus were collected from four Outer Circle countries, namely Singapore, India, Brunei and the Philippines, and from four Expanding Circle countries, namely China, Japan, Thailand and Vietnam. The corpus involves a total of eighty informants (ten per country). All of them are university undergraduates or postgraduates.
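The sampling design described above can be written down as a small data structure. The Python sketch below is illustrative only. The names `DESIGN` and `informants_by_circle` are hypothetical and are not part of any NIESCEA distribution. The sketch confirms that eight countries with ten informants each give 80 informants, split evenly between the Outer and Expanding Circles. No Inner Circle country is sampled.

```python
from collections import Counter

# Hypothetical encoding of the sampling design. The countries and circle
# assignments come from the text; the layout and names are illustrative.
INFORMANTS_PER_COUNTRY = 10

DESIGN = {
    "Singapore": "Outer",
    "India": "Outer",
    "Brunei": "Outer",
    "Philippines": "Outer",
    "China": "Expanding",
    "Japan": "Expanding",
    "Thailand": "Expanding",
    "Vietnam": "Expanding",
}


def informants_by_circle(design, per_country=INFORMANTS_PER_COUNTRY):
    """Return the number of informants in each Kachruvian circle."""
    counts = Counter()
    for circle in design.values():
        counts[circle] += per_country
    return dict(counts)


if __name__ == "__main__":
    by_circle = informants_by_circle(DESIGN)
    print(by_circle)                # {'Outer': 40, 'Expanding': 40}
    print(sum(by_circle.values()))  # 80
```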
The main objectives of the corpus are as follows:
i) To build the first formal spoken corpus of English in Asia, i.e. NIESCEA, for use in investigating the phonetic/phonological and grammatical features of Asian Englishes.
ii) To advance knowledge about English in Asia by giving scholars in Asia and other parts of the world access to speech data of Asian Englishes.
The speech data for this corpus were collected from undergraduates and postgraduates aged between 18 and 35. All informants were born in their home countries and received at least their basic education there (i.e. from kindergarten to senior high school). Besides recording the informants reading the test materials, we also collected their bio-data and details of their linguistic background.
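The selection criteria above can be expressed as a simple screening check on bio-data. This is a minimal Python sketch under stated assumptions. The record fields (`age`, `level`, `home_country`, `birth_country`, `schooling_country`) are hypothetical, and NIESCEA's actual bio-data form may record this information differently.

```python
from dataclasses import dataclass


@dataclass
class BioData:
    # Hypothetical fields; not the actual NIESCEA bio-data form.
    age: int
    level: str              # "undergraduate" or "postgraduate"
    home_country: str
    birth_country: str
    schooling_country: str  # where kindergarten to senior high school took place


def is_eligible(p: BioData) -> bool:
    """Apply the stated criteria: aged 18-35, a university student, and both
    born and given a basic education in the home country."""
    return (
        18 <= p.age <= 35
        and p.level in {"undergraduate", "postgraduate"}
        and p.birth_country == p.home_country
        and p.schooling_country == p.home_country
    )


if __name__ == "__main__":
    print(is_eligible(BioData(22, "undergraduate", "Japan", "Japan", "Japan")))  # True
    print(is_eligible(BioData(40, "postgraduate", "India", "India", "India")))   # False
```

The single `schooling_country` field is a simplification. A real form might record each stage of schooling separately.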
The Test Materials
The test materials used to collect the corpus data have four components: three short passages, four sets of sentences, vowels in citation form and an interview.
i) Three Short Passages
The three short passages are ‘The North Wind and the Sun’ (the NWS Passage), ‘The Boy Who Cried Wolf’ (the Wolf Passage) and ‘Arthur the Rat’ (the Rat Passage).
The NWS Passage is an English version of the fable ‘The North Wind and the Sun’. The International Phonetic Association (IPA) has recommended it for the phonetic analysis of English (see IPA, 1999, pp. 39-44). Deterding (2006) provides a good discussion of this passage. An orthographic version of the NWS Passage is available for download. The transcribed versions of the NWS Passage are available in IPA
The Wolf Passage was developed by Deterding (2006), which contains a more detailed discussion of the passage. The passage can be used for both auditory and acoustic analysis of segmentals (i.e. vowels and consonants) and suprasegmentals (e.g. rhythm and stress) in varieties of English. Since its development, it has been used frequently in publications and postgraduate theses. An orthographic version of the Wolf Passage is available for download.
The Rat Passage is a story designed to elicit all significant pronunciation variants in varieties of English. It allows sounds to be compared either within a variety or across varieties. The passage was used to elicit pronunciation data in the DARE (Dictionary of American Regional English) project between 1965 and 1970 (see DARE, 2013). An orthographic version of the Rat Passage is available for download.
ii) Four Sets of Sentences
These four sets of sentences were developed by Low (1998) to test suprasegmental features such as stress and rhythm. Low (1998) also explains how these sentence sets were used to test the suprasegmental features of one variety of English, namely Singapore English.
Sentence Set 1 consists of six test sentences, which are randomised together with four filler sentences. The set includes sentences containing lexical items that appear to be stressed differently in different varieties of English. For example, Deterding (1994) observed that demonstratives and pronouns receive more prominence in Singapore English, so the test materials include sentences that contain demonstratives and pronouns. Part of the materials was designed to fulfil a dual function. It tests both the variability between successive vowels and the presence or absence of deaccenting in varieties of English. These sentences contain lexical items that are repeated at the end of the sentence. Sentence Set 1 is available for download.
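The text does not name a measure for "the variability between successive vowels". One widely used option, adopted here as an assumption, is the normalised Pairwise Variability Index (nPVI) computed over successive vowel durations. Each adjacent pair contributes the absolute difference in duration divided by the mean duration of the pair. The result is averaged over all pairs and multiplied by 100. The durations in the usage example are hypothetical values in milliseconds.

```python
def npvi(durations):
    """Normalised Pairwise Variability Index over successive vowel durations.

    nPVI = 100 * mean over k of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)
    Values near 0 mean successive vowels have similar durations. Higher
    values mean stronger alternation, e.g. full vowels next to reduced ones.
    """
    if len(durations) < 2:
        raise ValueError("need at least two vowel durations")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    full_and_reduced = [120, 45, 130, 50]  # hypothetical durations (ms)
    full_only = [110, 105, 120, 115]       # hypothetical durations (ms)
    print(round(npvi(full_and_reduced), 1), round(npvi(full_only), 1))
    # prints: 92.3 7.4
```

Because each difference is normalised by the local mean, differences in overall speaking rate between informants have less effect on the measure.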
Sentence Set 2 consists of ten sentences, which fall into two groups of five. The first five contain full and reduced vowels as they would potentially be realised in Inner Circle varieties of English, e.g. British English and American English. The other five contain only full vowels. Having one group with only full vowels and one with both full and reduced vowels makes it possible to compare how full vowels are realised in different varieties of English, e.g. in Singapore English and in British English. Note that the consonantal context surrounding the vowels in each corresponding pair of sentences is similar. Sentence Set 2 is available for download.
Sentence Set 3 was designed to elicit speech data for comparing polysyllabic test items in different positions in the intonational phrase, not only just before a boundary. To ensure that the polysyllabic test items are stressed initially in Inner Circle varieties such as British English and American English, ten monosyllabic test items and their morphologically derived forms were selected. In these items, stress is likely to stay on the root morpheme.

The materials were selected according to the principles governing stress placement in British and American English. Under these principles, certain suffixes are stress-neutral and do not shift the location of stress when added to a root morpheme. For example, the test item man was selected along with its derived forms manful and manfully. The suffixes added to the root morpheme man are stress-neutral and do not affect the stress on man. All test items and their derived forms were then inserted into phrase-final and phrase-medial positions in controlled materials. Sentence Set 3 is available for download.
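The construction of Sentence Set 3 can be sketched in two steps. First, stress-neutral suffixes are attached to a monosyllabic root. Second, each resulting form is placed in a phrase-medial and a phrase-final frame. In this Python sketch, the two carrier frames in `FRAMES` are hypothetical and do not reproduce the actual Sentence Set 3 sentences. The helper names `derive` and `build_items` are also illustrative. Only the example man, manful, manfully comes from the text.

```python
# HYPOTHETICAL carrier frames; the real Sentence Set 3 sentences are in the
# downloadable materials.
FRAMES = {
    "phrase-medial": "The word {} was read aloud twice.",
    "phrase-final": "She read aloud the word {}.",
}


def derive(root, suffixes):
    """Attach suffixes cumulatively, e.g. man -> manful -> manfully.

    Uses plain string concatenation, so roots whose spelling changes under
    suffixation would need extra handling.
    """
    forms = [root]
    for suffix in suffixes:
        forms.append(forms[-1] + suffix)
    return forms


def build_items(root, suffixes):
    """Return (position, form, sentence) for every form in every frame."""
    return [
        (position, form, frame.format(form))
        for form in derive(root, suffixes)
        for position, frame in FRAMES.items()
    ]


if __name__ == "__main__":
    for position, form, sentence in build_items("man", ["ful", "ly"]):
        print(f"{form:9} {position:14} {sentence}")
```

Each root yields three forms in two positions, which gives six sentences per root.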
Sentence Set 4 consists of 20 sentences. Ten of them contain compound words. The other ten contain noun phrases formed by replacing the first element of each compound word with an adjective.
This sentence set was designed to compare how compound and phrasal stress are assigned in Inner Circle varieties and in other varieties of English, e.g. in British English and in Singapore English. Ten compound words that receive initial stress in Inner Circle varieties such as British or American English were chosen for the corpus. In general, stress placement in compound words in British or American English varies according to the word classes of their constituents. Compounds made up of two nouns normally have stress on their first element (Roach, 1991, pp. 99-100), as in DRESScode and SCHOOLyard.

The test items were also placed in sentence-final position. According to Fudge (1984, p. 2), the lexically stressed syllable is not always given more prominence than other syllables unless it coincides with the nucleus of the utterance. In sentence-final position, the compounds can be expected to carry nuclear stress when the sentence is produced out of context.
This sentence set also includes noun phrases created by replacing the first element of each of the ten compound words with an adjective. For example, the compound armchair was replaced by the noun phrase old chair, which consists of an adjective and a noun. Likewise, the compound schoolyard was replaced by the noun phrase cool yard. All test items were placed in similar carrier sentences, such as ‘It resembled an old chair’, so that they always occur in phrase-final position. Sentence Set 4 is available for download.
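The pairing of compounds with matched noun phrases can be sketched as follows. The text lists only two pairs, armchair/old chair and schoolyard/cool yard, so `PAIRS` holds only those two. The other eight pairs are in the downloadable materials. The text says only that the carrier sentences are "similar", so using the single frame "It resembled ___." for every item is an assumption. The a/an rule is a crude spelling heuristic.

```python
# Only the two pairs quoted in the text; the remaining eight are not listed.
PAIRS = [
    ("armchair", "old chair"),
    ("schoolyard", "cool yard"),
]


def article(phrase):
    """Choose 'a' or 'an' from the first letter (ignores cases like 'hour')."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def carrier(phrase):
    """Place the test item last so that it is in phrase-final position."""
    return f"It resembled {article(phrase)} {phrase}."


def sentence_pairs(pairs):
    """Return (compound sentence, noun-phrase sentence) for each pair."""
    return [(carrier(c), carrier(p)) for c, p in pairs]


if __name__ == "__main__":
    for compound_sentence, phrase_sentence in sentence_pairs(PAIRS):
        print(compound_sentence, "|", phrase_sentence)
    # It resembled an armchair. | It resembled an old chair.
    # It resembled a schoolyard. | It resembled a cool yard.
```

Keeping the frame constant within each pair means the compound and the noun phrase differ only in the test item itself.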
iii) Vowels in Citation Form
Citation form refers to the form a word takes when it is cited or pronounced in isolation (Ladefoged & Johnson, 2015). To test vowel quality and vowel duration in a formal context, we selected 33 monosyllabic sample words covering 11 monophthongs, with three sample words for each monophthong. Each word was embedded in the carrier sentence ‘Please say ____ again’. When selecting the sample words, we deliberately avoided words in which the approximants /w/, /j/ or /r/ precede or follow the vowel. Some of these sample words form minimal pairs. However, the minimal pairs are randomised rather than presented together. Vowels in Citation Form is available for download.
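The three selection steps described above are: exclude vowels next to /w/, /j/ or /r/, embed each word in the carrier sentence, and keep minimal pairs apart. These steps can be sketched in Python. The five entries in `SAMPLE` are hypothetical, non-rhotic broad transcriptions chosen for illustration. They are not the NIESCEA word list, which has 33 items. Re-shuffling until no two adjacent items form a minimal pair is one possible reading of "randomised rather than appearing together", not a documented procedure.

```python
import random

APPROXIMANTS = {"w", "j", "r"}
CARRIER = "Please say {} again."

# HYPOTHETICAL items: word -> (broad phonemic transcription, target vowel).
SAMPLE = {
    "heed": (["h", "iː", "d"], "iː"),
    "hid": (["h", "ɪ", "d"], "ɪ"),
    "bean": (["b", "iː", "n"], "iː"),
    "bin": (["b", "ɪ", "n"], "ɪ"),
    "read": (["r", "iː", "d"], "iː"),  # /r/ precedes the vowel: excluded
}


def touches_approximant(phonemes, vowel):
    """True if the segment before or after the target vowel is /w/, /j/ or /r/."""
    i = phonemes.index(vowel)
    neighbours = phonemes[max(i - 1, 0):i] + phonemes[i + 1:i + 2]
    return any(p in APPROXIMANTS for p in neighbours)


def is_minimal_pair(a, b):
    """Same number of segments, differing in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_prompts(sample, seed=0, max_tries=1000):
    """Filter, shuffle and embed the words so that no minimal pair is adjacent."""
    words = [w for w, (ph, v) in sample.items() if not touches_approximant(ph, v)]
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(words)
        if not any(is_minimal_pair(sample[a][0], sample[b][0])
                   for a, b in zip(words, words[1:])):
            return [CARRIER.format(w) for w in words]
    raise RuntimeError("no order without adjacent minimal pairs was found")


if __name__ == "__main__":
    for prompt in build_prompts(SAMPLE):
        print(prompt)
```

A fixed `seed` makes the reading order reproducible across informants. Whether the corpus used one fixed order is not stated.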
iv) The Interview
In this 5-minute interview, informants were asked to talk about their most memorable holiday. The speech data from the interview can be used to investigate phonetic/phonological features, both segmental and suprasegmental, as well as grammatical features of Asian varieties of English. The interview also gives researchers a good sample of informal speech. Transcriptions of the interviews will be sent together with the audio recordings upon request.
The informants were recorded in a quiet room or in a sound lab. All recordings were saved in uncompressed WAV format to ensure high sound quality and easy import into Praat. Each recording of the whole set of test materials lasts approximately 15 to 20 minutes.
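Once the recordings are downloaded, their format and length can be checked before they are imported into Praat. This minimal sketch uses Python's standard-library `wave` module, which reads uncompressed PCM WAV files. The function names and the default 15-20 minute range check are illustrative. The text gives the length only as approximate, so a looser tolerance may be appropriate.

```python
import wave


def wav_summary(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_min": frames / rate / 60,
        }


def in_expected_range(summary, low_min=15.0, high_min=20.0):
    """Check the duration against an expected range, given in minutes."""
    return low_min <= summary["duration_min"] <= high_min
```

Usage would be something like `in_expected_range(wav_summary("informant_recording.wav"))`, where the file name is hypothetical. Actual NIESCEA file names are not given here.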
We would like to express our
heartfelt gratitude to:
National Institute of Education, Singapore, for the Research Support for Senior Academic Administrator (RS-SAA) Grant, which has made the NIESCEA possible.
Dr David Deterding and Ms Ishamina Athirah, University of Brunei Darussalam, for their assistance in collecting the Brunei English data.
Dr Pornapit Darasawang and Dr Natjiree Jaturapitakkul, King Mongkut’s University of Technology Thonburi, Thailand, for their assistance in collecting the Thai English data.
Dr Michiko Nakano and Ms Emi Tomita, Waseda University, Japan, for their help with the collection of the Japanese English data.
Ms Ava Patricia Cabiguin Avila
and Dr Ruanni Tupas for their assistance in contacting Philippine participants.
Ms Huynh Thi Canh Dien and Ms
Nguyen Ngoc Anh Thu for their assistance in contacting Vietnamese participants.
Friends, colleagues and
informants who helped out in the data collection process.
Last but not least, all the
informants for their support and commitment.
DARE. (2013). Dictionary of American Regional English. Cambridge, MA: Harvard University Press. (Also available digitally at http://www.daredictionary.com/page/resources).
Deterding, D. (1994). The intonation of Singapore English. Journal of the International Phonetic Association, 24(2), 61-72.
Deterding, D. (2006). The North Wind versus a Wolf: Short texts for the description and measurement of English pronunciation. Journal of the International Phonetic Association, 36, 187-196.
Fudge, E. C. (1984). English word stress. London: Allen and Unwin.
IPA. (1999). Handbook of the International Phonetic Association. Cambridge: Cambridge University Press.
Ladefoged, P., & Johnson, K. (2015). A course in phonetics (7th ed.). Cengage Learning.
Low, E. L. (1998). Prosodic prominence in Singapore English (Unpublished doctoral dissertation). University of Cambridge, Cambridge, UK.
Roach, P. (1991). English phonetics and phonology: A practical course (2nd ed.). Cambridge: Cambridge University Press.
The recordings of the NIESCEA corpus will be made available to researchers and teachers of English worldwide after the corpus is launched at the 18th English in South-East Asia Conference (ESEA-18). The conference will be held on 16-17 November 2015 at the University of Brunei Darussalam. The test materials (in PDF format) will be available for direct download from this website. To download the audio recordings, researchers and teachers will need to complete the Licence Agreement & Request Form, which can be downloaded from this website. The completed form should be sent to the email address provided in ‘Contact Us’. Applicants will then be given a link for downloading the audio files.
Associate Professor Low Ee Ling, Office
of Strategic Planning and Academic Quality, National Institute of Education, 1
Nanyang Walk, Singapore 637616.
Please send your Licence Agreement & Request Form to Dr Low Ee Ling at:
For enquiries, please email Mr Ao Ran at:
How to cite the Corpus?
Low, E. L. (2015). The NIE Spoken
Corpus of English in Asia (NIESCEA). Singapore: National Institute of Education,
Nanyang Technological University.