This section gives a deeper insight into some simple and some more complex audio processing utilities of Bob. Currently, only the cepstral extraction module is available. We plan to update it and add more features in the near future.
The examples below show how to read a wave file and how to compute Linear Frequency Cepstral Coefficients (LFCC) and Mel Frequency Cepstral Coefficients (MFCC).
The usual native wave formats can be read with the scipy.io.wavfile module. Other formats can be read with other Python modules, such as pysox. An example wave file can be found at bob/ap/test/data/sample.wav
>>> import scipy.io.wavfile
>>> rate, signal = scipy.io.wavfile.read(str(wave_path))
>>> print(rate)
8000
>>> print(signal)
[ 28 72 58 ..., -301 89 230]
In the above example, the sampling rate of the audio signal is 8 kHz and the signal array is of type int16.
The duration of the signal (in whole seconds) can be computed directly from the number of samples and the sampling rate:
>>> print(len(signal) // rate)
2
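When scipy is not available, the same quantities can be read with the `wave` module of the Python standard library. The following is a minimal sketch, not part of Bob: the helper name `wav_duration_seconds` is hypothetical, and the file is a synthetic 2-second, 8 kHz recording built in memory rather than the sample.wav mentioned above.

```python
import io
import struct
import wave


def wav_duration_seconds(wav_file):
    """Return (rate, n_samples, duration in seconds) of a PCM wave file."""
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        n_samples = w.getnframes()
    return rate, n_samples, n_samples / float(rate)


# Build a synthetic 16-bit mono file in memory (illustrative data only).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 2 bytes per sample, i.e. int16
    w.setframerate(8000)   # 8 kHz
    w.writeframes(struct.pack("<16000h", *([0] * 16000)))
buffer.seek(0)

rate, n_samples, duration = wav_duration_seconds(buffer)
print(rate, n_samples, duration)
```

Dividing by `float(rate)` keeps the fractional part of the duration, which integer division would drop.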
The LFCC and MFCC coefficients can be extracted from an audio signal by using bob.ap.Ceps(). To do so, several parameters can be specified by the user. Typically, these are set in a configuration file. The following values are the default ones:
>>> win_length_ms = 20 # The window length of the cepstral analysis in milliseconds
>>> win_shift_ms = 10 # The window shift of the cepstral analysis in milliseconds
>>> n_filters = 24 # The number of filter bands
>>> n_ceps = 19 # The number of cepstral coefficients
>>> f_min = 0. # The minimal frequency of the filter bank
>>> f_max = 4000. # The maximal frequency of the filter bank
>>> delta_win = 2 # The integer delta value used for computing the first and second order derivatives
>>> pre_emphasis_coef = 0.97 # The coefficient used for the pre-emphasis
>>> dct_norm = True # Whether the cepstral coefficients are multiplied by a normalization factor
>>> mel_scale = True # Whether cepstral features are extracted on a Mel (MFCC, True) or linear (LFCC, False) scale
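To make the role of pre_emphasis_coef more concrete, here is a minimal sketch of one common pre-emphasis formulation, y[n] = x[n] - coef * x[n-1]. It is an assumption for illustration only: the function name `pre_emphasis` is hypothetical, and the way Bob handles the first sample, or whether it filters frame by frame, may differ.

```python
def pre_emphasis(signal, coef=0.97):
    """Apply y[n] = x[n] - coef * x[n-1], keeping the first sample unchanged."""
    if len(signal) == 0:
        return []
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        out.append(signal[n] - coef * signal[n - 1])
    return out


# A coefficient of 0.5 is used here only to keep the arithmetic exact.
emphasized = pre_emphasis([1.0, 1.0, 1.0, 2.0], coef=0.5)
print(emphasized)  # [1.0, 0.5, 0.5, 1.5]
```

Constant stretches of the signal are attenuated while the jump at the end is preserved, which is the intended high-frequency boost.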
Once the parameters are specified, bob.ap.Ceps() can be called as follows:
>>> import numpy
>>> import bob.ap
>>> c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min, f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
>>> signal = numpy.asarray(signal, dtype=numpy.float64) # the input vector must be of type **float**
>>> mfcc = c(signal) # one row per frame, one column per coefficient
>>> print(len(mfcc))
199
>>> print(len(mfcc[0]))
19
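The number of frames follows from the window length and shift. The sketch below assumes the usual framing rule without padding and a signal of exactly 16000 samples (2 seconds at 8 kHz); both are assumptions, and `frame_count` is a hypothetical helper rather than a Bob function. Under these assumptions it reproduces the 199 frames reported above.

```python
def frame_count(n_samples, rate, win_length_ms, win_shift_ms):
    """Number of full analysis windows that fit in the signal (no padding)."""
    win_length = rate * win_length_ms // 1000   # window length in samples
    win_shift = rate * win_shift_ms // 1000     # window shift in samples
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // win_shift


# 20 ms windows (160 samples) shifted by 10 ms (80 samples) at 8 kHz.
n_frames = frame_count(16000, 8000, 20, 10)
print(n_frames)  # 199
```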
LFCCs can be computed instead of MFCCs by setting mel_scale to False:
>>> c.mel_scale = False
>>> lfcc = c(signal)
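The difference between the two scales lies in how the filter bands are spaced between f_min and f_max. The following sketch uses a widely used Hz-to-Mel conversion, 2595 * log10(1 + f / 700); this formula and the helper names are assumptions for illustration and are not taken from Bob's implementation.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def band_edges(f_min, f_max, n_filters, mel_scale=True):
    """Return n_filters + 2 edge frequencies in Hz, evenly spaced on the chosen scale."""
    if mel_scale:
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    else:
        lo, hi = f_min, f_max
    step = (hi - lo) / (n_filters + 1)
    points = [lo + i * step for i in range(n_filters + 2)]
    return [mel_to_hz(p) for p in points] if mel_scale else points


linear_edges = band_edges(0.0, 4000.0, 24, mel_scale=False)
mel_edges = band_edges(0.0, 4000.0, 24, mel_scale=True)
print(linear_edges[:3])
print(mel_edges[:3])
```

On the linear scale the edges are 160 Hz apart, whereas on the Mel scale they are packed more densely at low frequencies and spread out towards f_max.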
The user can also choose to extract the energy. This is typically used for Voice Activity Detection (VAD). Please check spkRecLib or FaceRecLib for more details about VAD.
>>> c.with_energy = True
>>> lfcc_e = c(signal)
>>> print(len(lfcc_e))
199
>>> print(len(lfcc_e[0]))
20
It is also possible to compute the first and second derivatives of these features:
>>> c.with_delta = True
>>> c.with_delta_delta = True
>>> lfcc_e_d_dd = c(signal)
>>> print(len(lfcc_e_d_dd))
199
>>> print(len(lfcc_e_d_dd[0]))
60
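The feature dimensions reported in the examples (19, 20 and 60) can be checked with simple arithmetic: the energy adds one coefficient per frame, and each derivative order adds one more block of the same size as the static features. The helper below is a hypothetical sketch of that bookkeeping, not a Bob function, and it assumes that second derivatives are only requested together with first derivatives.

```python
def feature_dimension(n_ceps, with_energy=False, with_delta=False,
                      with_delta_delta=False):
    """Per-frame feature size for the given options."""
    static = n_ceps + (1 if with_energy else 0)
    # One block for the static features, plus one per derivative order.
    blocks = 1 + (1 if with_delta else 0) + (1 if with_delta_delta else 0)
    return static * blocks


print(feature_dimension(19))                                # 19
print(feature_dimension(19, with_energy=True))              # 20
print(feature_dimension(19, with_energy=True,
                        with_delta=True,
                        with_delta_delta=True))             # 60
```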