Teaching and Learning English Pronunciation by Generating the Vocal Tract Shapes from the Frequency Domain Information

Project Number
AFD 05/15 SL

Project Duration
February 2016 - February 2018

Status
In-Progress

Abstract
Traditional methods of teaching pronunciation often entail learners trying to imitate sounds produced by the teacher by listening to the teacher and watching the teacher’s lips. More recently, learners have also been able to listen to audio samples and watch vocal tract animations. Learning is primarily done through drill and practice, with no feedback available to help the student understand whether the pronunciation is right or wrong. Frequency-domain information, such as spectrograms or formant frequencies, has been used widely in human voice synthesis, and this work has demonstrated the close relation between frequency-domain information and vocal tract shapes. However, researchers and software developers have so far not succeeded in translating such complex data into feedback that is usable for learning. The problem lies primarily in the limited ability of technology to process such massive and complex data, and in applying that technology in ways that are meaningful for education. This project will use state-of-the-art advances in machine learning to overcome these issues. It proposes to generate vocal tract shapes from filtered frequency-domain information. Lui (2013) demonstrated that the frequency-domain information needed to differentiate among intonations was usually sparse, suggesting that a small percentage of relevant information was sufficient. Using similar research strategies, we propose to develop machine-learning algorithms that generate sparse spectrographic representations of human speech sounds and render them visually as vocal tract shapes. These real-time visual representations will be used in an interactive app as feedback to help learners improve their pronunciation. The app will be used primarily by students to extend learning outside regular curriculum time, and also by teachers for in-class activities.

Funding Source
MOE

Related Projects