Audio processing


This section takes a closer look at some simple and some more complex audio processing utilities of Bob. Currently, only the cepstral extraction module is available. We plan to add more features in the near future.

Simple audio processing

Below are three examples showing how to read a wave file and how to compute Linear Frequency Cepstral Coefficients (LFCC) and Mel Frequency Cepstral Coefficients (MFCC).

Reading audio files

The usual native formats can be read with module. Other wave formats can be read with other Python modules such as pysox. An example wave file can be found at bob/ap/test/data/sample.wav.

>>> import 
>>> rate, signal = 
>>> print(rate)
>>> print(signal)
[  28   72   58 ..., -301   89  230]

In the above example, the sampling rate of the audio signal is 8 kHz and the signal array is of type int16.

The duration of the signal (in seconds) can be computed directly. The rate is converted to float so that the division does not truncate to an integer:

>>> print(len(signal) / float(rate))
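
The read-then-measure steps above can be sketched without any third-party dependency. This is a minimal illustration only: it uses Python's standard wave module (an assumption, not necessarily the module intended above), and the helpers write_tone and read_wav are hypothetical names introduced for this example. It writes a short synthetic 8 kHz, 16-bit mono tone, reads it back, and computes its duration.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, rate=8000, seconds=0.5, freq=440.0):
    """Write a mono 16-bit sine tone so the example has a file to read."""
    n_samples = int(rate * seconds)
    samples = [int(10000 * math.sin(2 * math.pi * freq * n / rate))
               for n in range(n_samples)]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample, i.e. int16
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n_samples, *samples))


def read_wav(path):
    """Return (rate, samples) for a mono 16-bit PCM wave file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n_samples = w.getnframes()
        raw = w.readframes(n_samples)
    samples = list(struct.unpack("<%dh" % n_samples, raw))
    return rate, samples


path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_tone(path)
rate, signal = read_wav(path)
duration = len(signal) / float(rate)  # float division, in seconds
print(rate, len(signal), duration)
```

The explicit float() mirrors the duration computation above: with integer operands, older Python versions would truncate the result to a whole number of seconds.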

LFCC and MFCC Extraction

The LFCC and MFCC coefficients can be extracted from an audio signal by using bob.ap.Ceps(). To do so, several parameters can be specified by the user. Typically, these are set in a configuration file. The following values are the default ones:

>>> win_length_ms = 20 # The window length of the cepstral analysis in milliseconds
>>> win_shift_ms = 10 # The window shift of the cepstral analysis in milliseconds
>>> n_filters = 24 # The number of filter bands
>>> n_ceps = 19 # The number of cepstral coefficients
>>> f_min = 0. # The minimal frequency of the filter bank
>>> f_max = 4000. # The maximal frequency of the filter bank
>>> delta_win = 2 # The integer delta value used for computing the first and second order derivatives
>>> pre_emphasis_coef = 0.97 # The coefficient used for the pre-emphasis
>>> dct_norm = True # Whether the cepstral coefficients are multiplied by a normalization factor
>>> mel_scale = True # True extracts features on a Mel scale (MFCC), False on a linear scale (LFCC)
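
To make the window parameters concrete, the following sketch converts win_length_ms and win_shift_ms into sample counts and estimates how many analysis frames a signal yields. The function frame_geometry is a hypothetical helper, and the frame-count formula is a common convention (only complete windows are kept); the exact rule used inside bob.ap.Ceps() is not stated here and may differ.

```python
def frame_geometry(rate, win_length_ms, win_shift_ms, n_samples):
    """Return (window length, window shift, frame count), lengths in samples.

    Assumes only complete windows are analysed, which is a common
    convention but not necessarily the one used by bob.ap.Ceps.
    """
    win_length = int(rate * win_length_ms / 1000)
    win_shift = int(rate * win_shift_ms / 1000)
    if n_samples < win_length:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - win_length) // win_shift
    return win_length, win_shift, n_frames


# One second of audio at 8 kHz with the default 20 ms window and 10 ms shift.
win_length, win_shift, n_frames = frame_geometry(8000, 20, 10, 8000)
print(win_length, win_shift, n_frames)  # 160 80 99
```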

Once the parameters are set, bob.ap.Ceps() can be called as follows:

>>> import numpy
>>> import bob.ap
>>> c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min, f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
>>> signal = numpy.asarray(signal, dtype='float64') # the input vector must be of float type
>>> mfcc = c(signal)
>>> print(len(mfcc)) # number of frames
>>> print(len(mfcc[0])) # number of coefficients per frame
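
As an illustration of what the pre_emphasis_coef parameter controls, the sketch below applies the textbook first-order pre-emphasis filter y[n] = x[n] - a * x[n-1]. The function pre_emphasis is a hypothetical helper, not part of Bob, and the handling of the first sample (left unchanged here) is an assumption; implementations differ on that point.

```python
def pre_emphasis(samples, coef=0.97):
    """Apply y[n] = x[n] - coef * x[n-1]; the first sample is kept as is."""
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - coef * samples[n - 1])
    return out


# A constant signal is almost cancelled: the filter attenuates slow
# variations and boosts fast ones.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
ramp = pre_emphasis([2.0, 4.0, 6.0], coef=0.5)
print(flat)
print(ramp)
```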

LFCCs can be computed instead of MFCCs by setting mel_scale to False:

>>> c.mel_scale = False
>>> lfcc = c(signal)
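
The difference between the two scales can be illustrated by where the filter centres fall. The sketch below uses one widely used Mel formula, mel = 2595 * log10(1 + f / 700); whether bob.ap.Ceps() uses this exact variant is an assumption, and centre_frequencies is a hypothetical helper written for this example.

```python
import math


def hz_to_mel(f):
    """Convert Hz to Mel with a common formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def centre_frequencies(n_filters, f_min, f_max, mel_scale):
    """Return filter centres in Hz, evenly spaced on the chosen scale."""
    if mel_scale:
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
        step = (hi - lo) / (n_filters + 1)
        return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + step * (i + 1) for i in range(n_filters)]


mel_centres = centre_frequencies(24, 0.0, 4000.0, mel_scale=True)
lin_centres = centre_frequencies(24, 0.0, 4000.0, mel_scale=False)
print(lin_centres[:3])  # evenly spaced in Hz
print([round(f, 1) for f in mel_centres[:3]])  # denser at low frequencies
```

On the linear scale the centres are equally spaced in Hz, while on the Mel scale they are packed more tightly at low frequencies and spread out towards f_max.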

The user can also choose to extract the energy, which is typically used for Voice Activity Detection (VAD). Please check spkRecLib or FaceRecLib for more details about VAD.

>>> c.with_energy = True
>>> lfcc_e = c(signal)
>>> print(len(lfcc_e))
>>> print(len(lfcc_e[0]))
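
To show why a per-frame energy is useful for VAD, here is a deliberately simple sketch that computes the log energy of each frame and marks frames above a midpoint threshold as active. It is a generic illustration, not the energy definition or VAD method used by Bob, spkRecLib or FaceRecLib; frame_log_energy and the threshold rule are assumptions made for this example.

```python
import math


def frame_log_energy(samples, win_length, win_shift, floor=1e-10):
    """Return the natural log of the summed squared samples of each frame."""
    energies = []
    for start in range(0, len(samples) - win_length + 1, win_shift):
        frame = samples[start:start + win_length]
        energy = sum(x * x for x in frame)
        energies.append(math.log(max(energy, floor)))  # floor avoids log(0)
    return energies


# 20 ms of silence followed by 20 ms of a 440 Hz tone, sampled at 8 kHz.
silence = [0.0] * 160
tone = [1000.0 * math.sin(2 * math.pi * 440 * n / 8000.0) for n in range(160)]
energies = frame_log_energy(silence + tone, win_length=160, win_shift=80)

# Crude decision rule: halfway between the quietest and loudest frame.
threshold = (min(energies) + max(energies)) / 2.0
active = [e > threshold for e in energies]
print(active)
```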

It is also possible to compute first and second derivatives for those features:

>>> c.with_delta = True
>>> c.with_delta_delta = True
>>> lfcc_e_d_dd = c(signal)
>>> print(len(lfcc_e_d_dd))
>>> print(len(lfcc_e_d_dd[0]))
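
The effect of these options on the feature size, and the role of delta_win, can be sketched as follows. Both functions are hypothetical helpers: feature_dimension assumes that the energy adds one coefficient and that each derivative order appends a block of the same size, and delta uses a common regression formula with edge frames repeated at the boundaries. The exact computation inside bob.ap.Ceps() may differ.

```python
def feature_dimension(n_ceps, with_energy, with_delta, with_delta_delta):
    """Assumed layout: [static | delta | delta-delta], energy counted once per block."""
    base = n_ceps + (1 if with_energy else 0)
    blocks = 1 + (1 if with_delta else 0) + (1 if with_delta_delta else 0)
    return base * blocks


def delta(frames, win=2):
    """Regression-based derivative over +/- win frames (edges are repeated)."""
    denom = 2.0 * sum(k * k for k in range(1, win + 1))
    n = len(frames)
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, win + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


print(feature_dimension(19, True, True, True))

# A coefficient that grows by 1 per frame has a derivative of 1 away from the edges.
ramp = [[float(t)] for t in range(6)]
first = delta(ramp, win=2)
second = delta(first, win=2)  # second derivative = delta of the delta
print(first[2], first[3])
```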
