Generalized Perceptual Linear Prediction (gPLP)
Patrick Clemins
Project Goal
This project presents a feature extraction model for calculating a set of perceptually relevant features for animal vocalizations, based on a generalization of perceptual linear prediction (PLP). The PLP model, popular in human speech processing, incorporates perceptual information such as frequency warping and equal-loudness normalization into the feature extraction process. Since similar perceptual information is available for a number of animal species, it can be incorporated into the PLP model to extract perceptually relevant features for these species. To illustrate, qualitative comparisons are made between the species-specific model and the original PLP model using a set of vocalizations collected from captive African elephants and wild beluga whales. The models that incorporate species-specific perceptual information outperform the original human-based models in both visualization and classification tasks.
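As a rough illustration of how the perceptual stages can be made species-specific, the sketch below treats the frequency-warping function and the equal-loudness curve as interchangeable parameters. It is not the project's implementation. The Bark warping and equal-loudness formulas are the commonly cited human approximations from the original PLP formulation. The Greenwood-style warping, the function names, and the simplified triangular filter bank are assumptions made for this example, and the loudness weighting is applied per frequency bin rather than per band for brevity.

```python
import math

def bark_warp(f_hz):
    """Human Bark-scale approximation used in the original PLP."""
    return 6.0 * math.asinh(f_hz / 600.0)

def greenwood_warp(f_hz, A, a, k):
    """Greenwood-style warping (assumed here as one possible species-specific
    choice): the inverse of f = A * (10**(a * x) - k)."""
    return math.log10(f_hz / A + k) / a

def equal_loudness_human(f_hz):
    """Human equal-loudness approximation from the original PLP."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def perceptual_spectrum(power, freqs, warp, loudness, n_bands):
    """Hypothetical helper: integrate a power spectrum into n_bands bands.

    Bands are triangular and equally spaced on the warped axis. Each bin is
    weighted by the loudness curve, and each band output is cube-root
    compressed. freqs must be in ascending order.
    """
    warped = [warp(f) for f in freqs]
    lo, hi = warped[0], warped[-1]
    step = (hi - lo) / (n_bands + 1)
    bands = []
    for b in range(1, n_bands + 1):
        center = lo + b * step
        total = 0.0
        for p, f, w in zip(power, freqs, warped):
            weight = max(0.0, 1.0 - abs(w - center) / step)
            total += weight * loudness(f) * p
        bands.append(total ** (1.0 / 3.0))
    return bands

# Usage: a flat spectrum from 0 to 4 kHz, processed with two warpings.
freqs = [i * 31.25 for i in range(129)]
power = [1.0] * len(freqs)
human = perceptual_spectrum(power, freqs, bark_warp, equal_loudness_human, 12)
# Commonly cited human Greenwood constants; another species would need its own.
gw = lambda f: greenwood_warp(f, A=165.4, a=2.1, k=0.88)
other = perceptual_spectrum(power, freqs, gw, equal_loudness_human, 12)
```

Swapping in a different `warp` or `loudness` callable is the only change needed to model another species in this sketch. The constants for a given species would have to come from published hearing data and are not supplied here.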
Block Diagram
Examples
The plots in the top row are traditional FFT-based spectrograms. The plots in the bottom row are perceptual spectrograms, which show the effect of the gPLP extraction model on the spectrum. Note the effects of the non-linear frequency warping and the equal-loudness curve.
Top row: Elephant Rumble - FFT, Beluga Whistle - FFT
Bottom row: Elephant Rumble - gPLP, Beluga Whistle - gPLP