Research Activities and Project Links
Short Summaries of Recent Projects
In order to support effective learning and provide specific, useful pronunciation feedback, Computer-Aided Language Learning (CALL) systems for pronunciation correction must be able to detect pronunciation errors and accurately identify and describe the underlying articulation errors. Doing so requires estimating articulator trajectory patterns from users' acoustic data. Because of the difficulty of acoustic-to-articulatory inversion and the complexity of inter-speaker differences in articulator patterns, this capability is not yet well developed. Current systems are limited in the specificity of the corrective feedback they provide, often offering only a "good versus bad" match to the target pronunciation and, at best, only the general category of pronunciation error. This project, which has received initial funding from the NSF through the EAGER program, aims to address these limitations by collecting a matched corpus of acoustic and five-degree-of-freedom electromagnetic articulography (EMA) data from both native American English (L1) speakers and native Mandarin Chinese speakers of English as a second language (L2). The resulting corpus has the potential to support a variety of research efforts, including pronunciation variation modeling, acoustic-to-articulatory inversion, L2-L1 speaker comparisons, pronunciation error detection, and corrective feedback for accent modification.
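To make the inversion task concrete, the following is a minimal sketch of frame-level acoustic-to-articulatory inversion posed as a regularized linear regression from acoustic feature frames to EMA sensor coordinates. All array names, dimensions, and the synthetic data are illustrative assumptions, not the project's actual models or corpus; practical systems typically use nonlinear (e.g., neural network) mappings.

```python
# Sketch: acoustic-to-articulatory inversion as ridge regression.
# Dimensions and data are hypothetical stand-ins for a matched corpus.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 1000 frames of 13-dim acoustic features
# (e.g., MFCCs) paired with 12-dim EMA targets (x/y for 6 sensors).
X = rng.standard_normal((1000, 13))   # acoustic features, one row per frame
Y = rng.standard_normal((1000, 12))   # matched articulator coordinates

# Closed-form ridge solution: W = (X^T X + lam I)^(-1) X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def invert(acoustic_frames: np.ndarray) -> np.ndarray:
    """Map acoustic feature frames to estimated articulator trajectories."""
    return acoustic_frames @ W

est = invert(X[:5])
print(est.shape)  # (5, 12): estimated EMA coordinates for 5 frames
```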
Over the past few decades, researchers have made substantial progress in methods for evaluating and enhancing perceived speech quality in noisy environments. There has not been comparable progress on speech intelligibility. It has recently been shown that while many different speech enhancement approaches yield statistically significant improvements in perceived signal quality, none yield statistically significant improvements in intelligibility in more than one noise environment; current enhancement methods simply do not improve intelligibility in any substantial way. It can be argued that the use of quality rather than intelligibility as the primary evaluation metric has led to misguided research directions, with incremental gains in quality coming at the expense of intelligibility. We are working to address this issue by developing more accurate objective metrics for estimating speech intelligibility and, in parallel, by developing enhancement methods that draw on our understanding of perception and intelligibility to improve signal intelligibility more effectively.
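As a simplified illustration of what an objective intelligibility metric can look like, the sketch below compares short-time energy envelopes of clean and processed signals via correlation, in the spirit of correlation-based measures such as STOI. The frame sizes, signals, and the metric itself are illustrative assumptions, not the metrics under development in this project.

```python
# Sketch: a correlation-based objective intelligibility measure.
import numpy as np

def short_time_envelope(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Frame the signal and return per-frame RMS energy."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))

def envelope_correlation(clean: np.ndarray, processed: np.ndarray) -> float:
    """Correlation of short-time envelopes; higher suggests better intelligibility."""
    e1 = short_time_envelope(clean)
    e2 = short_time_envelope(processed)
    n = min(len(e1), len(e2))
    e1, e2 = e1[:n] - e1[:n].mean(), e2[:n] - e2[:n].mean()
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12))

# Toy usage: additive noise degrades envelope correlation with the clean signal.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 16000)
clean = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)  # modulated tone
noisy = clean + 0.8 * rng.standard_normal(len(t))
print(envelope_correlation(clean, noisy))
```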
The fundamental goal of this research project is to develop a broadly usable framework for pattern analysis and classification of animal vocalizations by integrating successful models and ideas from speech processing and recognition into bioacoustics. Tasks include automatic vocalization classification and labeling, individual identification, call type classification, behavioral-vocalization correlation, language acquisition, and seismic infrasonic communication. Species targeted for study include domestic and agricultural animals, marine mammals, and several endangered species, in collaboration with researchers at a number of other institutions.
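One classic pattern carried over from speech processing is the Gaussian mixture model (GMM) classifier: fit one GMM per class and label a new vocalization by maximum likelihood. The sketch below shows this for call-type classification; the feature frames are synthetic stand-ins (real systems would use spectral features such as MFCCs), and the class names and dimensions are hypothetical.

```python
# Sketch: GMM-based call-type classification, one model per call type.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Hypothetical training frames for two call types (rows = frames, cols = features).
train = {
    "call_A": rng.normal(loc=0.0, scale=1.0, size=(500, 12)),
    "call_B": rng.normal(loc=2.0, scale=1.0, size=(500, 12)),
}

# Fit one GMM per call type on that type's feature frames.
models = {
    name: GaussianMixture(n_components=4, random_state=0).fit(frames)
    for name, frames in train.items()
}

def classify(frames: np.ndarray) -> str:
    """Label a vocalization by the call-type GMM with highest avg log-likelihood."""
    return max(models, key=lambda name: models[name].score(frames))

test_clip = rng.normal(loc=2.0, scale=1.0, size=(80, 12))  # resembles call_B
print(classify(test_clip))  # expected: "call_B"
```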
One outcome of the Dolittle project was the realization that accurate individual identification is possible across a wide range of animal species, and that this could substantially improve methods for tasks such as acoustic censusing, which is important for many vocal species that are difficult to census visually. This has led to several continuing projects and current proposal efforts to develop acoustic censusing methods based on speech processing and speaker identification technology.
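One simple way such a census can work, sketched below, is to represent each detected call as a feature vector, cluster the vectors so that each cluster approximates one individual, and count the clusters. The simulated "voices", feature dimensions, and distance threshold are all illustrative assumptions, not the methods proposed in these efforts.

```python
# Sketch: acoustic censusing by clustering per-call feature vectors.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)

# Simulate calls from 3 individuals, each with a distinct "voice" centroid.
centroids = rng.standard_normal((3, 10)) * 5.0
calls = np.vstack([c + 0.3 * rng.standard_normal((20, 10)) for c in centroids])

# Agglomerative clustering; cut the dendrogram at a fixed distance threshold.
Z = linkage(calls, method="average")
labels = fcluster(Z, t=4.0, criterion="distance")

print("estimated individuals:", len(set(labels)))  # ideally 3
```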
This research project applies state-of-the-art time-series modeling techniques to the problem of characterizing speech signals. These techniques combine state-space embedding methods with learning algorithms to create highly accurate nonlinear models of a system's state. The time-delay embedding technique, taken from dynamical systems theory, is used to reconstruct the state spaces of speech waveforms, which are then characterized statistically and used to differentiate individual phonemes for isolated and continuous speech recognition.
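The sketch below illustrates the core step: a time-delay (Takens) embedding of a waveform frame, followed by a simple statistical characterization of the reconstructed state space. The embedding dimension, delay, and the mean/covariance features are illustrative choices, not the project's specific parameters or classifiers.

```python
# Sketch: time-delay embedding of a speech frame and simple attractor statistics.
import numpy as np

def delay_embed(x: np.ndarray, dim: int = 3, tau: int = 8) -> np.ndarray:
    """Reconstruct a state space: rows are [x[n], x[n+tau], ..., x[n+(dim-1)tau]]."""
    n_points = len(x) - (dim - 1) * tau
    return np.stack([x[i : i + n_points] for i in range(0, dim * tau, tau)], axis=1)

# Toy "speech" frame: a noisy sum of two harmonics.
rng = np.random.default_rng(4)
n = np.arange(1024)
x = (np.sin(2 * np.pi * n / 64)
     + 0.5 * np.sin(2 * np.pi * n / 23)
     + 0.05 * rng.standard_normal(1024))

pts = delay_embed(x)              # reconstructed trajectory in R^3
mu = pts.mean(axis=0)             # statistical features of the attractor,
cov = np.cov(pts, rowvar=False)   # usable as inputs to a phoneme classifier
print(pts.shape, mu.shape, cov.shape)  # (1008, 3) (3,) (3, 3)
```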