Audio Processing for Bob
This package is part of the signal-processing and machine learning toolbox Bob. It contains basic audio processing utilities. The following cepstral-based features are currently available: features computed with rectangular (RFCC), mel-scaled triangular (MFCC) [Davis1980], inverted mel-scaled triangular (IMFCC), and linear triangular (LFCC) filters [Furui1981]; spectral flux-based features (SSFC) [Scheirer1997]; and subband centroid frequency features (SCFC) [Le2011]. We plan to update the existing features and add more in the near future.
Please note that the implementation of the MFCC and LFCC features has changed compared to an earlier version of the package, because we corrected the pre-emphasis and DCT computations. The delta and delta-delta computations were also changed slightly.
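To make the pipeline behind the MFCC and LFCC features more concrete, here is a minimal NumPy sketch of a generic cepstral front end: pre-emphasis, framing with a Hamming window, power spectrum, triangular filter bank, logarithm, and DCT. It is not the implementation used in this package and does not use its API. The function names (`triangular_filterbank`, `cepstral_features`), the default parameter values, the mel-scale formula, and the unnormalised DCT-II are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(n_filters, n_fft, rate, f_min, f_max, mel=True):
    """Triangular filters spaced on the mel scale (MFCC-like) or linearly (LFCC-like)."""
    if mel:
        edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
    else:
        edges = np.linspace(f_min, f_max, n_filters + 2)
    freqs = np.arange(n_fft // 2 + 1) * rate / n_fft  # centre frequency of each FFT bin
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def cepstral_features(signal, rate, win_ms=20.0, shift_ms=10.0,
                      n_filters=24, n_ceps=19, pre_emphasis=0.95, mel=True):
    """Return an array of shape (n_frames, n_ceps) of cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]; the first sample is kept as is.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # Split into overlapping frames and apply a Hamming window.
    win_len = int(rate * win_ms / 1000.0)
    shift = int(rate * shift_ms / 1000.0)
    n_frames = 1 + (len(emphasized) - win_len) // shift
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win_len)

    # Power spectrum, zero-padded to the next power of two.
    n_fft = 1 << (win_len - 1).bit_length()
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2

    # Log filter-bank energies (small offset avoids log(0)).
    bank = triangular_filterbank(n_filters, n_fft, rate, 0.0, rate / 2.0, mel)
    log_energy = np.log(power @ bank.T + 1e-10)

    # Unnormalised DCT-II over the filter axis, keeping coefficients 1..n_ceps.
    k = np.arange(1, n_ceps + 1)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energy @ dct.T
```

Calling `cepstral_features(x, 8000)` on a one-second signal sampled at 8 kHz yields one row of 19 coefficients per 10 ms frame; passing `mel=False` switches to linearly spaced filters. Delta and delta-delta features would be derived from these rows by differencing over neighbouring frames, which this sketch leaves out.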
First, follow Bob’s installation instructions. Then, to install this package, run:
$ conda install bob.ap
For questions, or to report issues with this software package, contact our development mailing list.
[Davis1980] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”, in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, vol. 28, no. 4, pp. 357-366.
[Furui1981] S. Furui, “Cepstral analysis technique for automatic speaker verification”, in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, vol. 29, no. 2, pp. 254-272.
[Scheirer1997] E. Scheirer and M. Slaney, “Construction and evaluation of a robust multifeature speech/music discriminator”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, vol. 2, pp. 1331-1334.
[Le2011] P. N. Le, E. Ambikairajah, J. Epps, V. Sethu, and E. H. C. Choi, “Investigation of Spectral Centroid Features for Cognitive Load Classification”, in Speech Communication, April 2011, vol. 53, no. 4, pp. 540-551.