Bob 2.0 extraction of cepstral features (MFCC or LFCC) from audio

This algorithm is a **legacy** one. The API has changed since its implementation. New versions and forks will need to be updated.

This algorithm is **splittable**.

Endpoint Name | Data Format | Nature |
---|---|---|
speech | system/array_1d_floats/1 | Input |
vad | system/array_1d_integers/1 | Input |
features | system/array_2d_floats/1 | Output |

Parameters allow users to change the configuration of an algorithm when scheduling an experiment.

Name | Description | Type | Default | Range/Choices |
---|---|---|---|---|
f_max | Maximum frequency (Hz) of the range used in bandpass filtering | float64 | 8000.0 | |
delta_win | Window size used in delta and delta-delta computation | uint32 | 2 | |
withDelta | Compute deltas (with window size specified by delta_win) | bool | True | |
pre_emphasis_coef | Pre-emphasis coefficient | float64 | 0.95 | |
win_shift_ms | The length (ms) of the overlap between neighboring windows. Typically half the window length. | float64 | 10.0 | |
win_length_ms | The length (ms) of the sliding processing window, typically about 20 ms | float64 | 20.0 | |
dct_norm | Use normalized DCT | bool | False | |
normalizeFeatures | Normalize computed cepstral features (subtract the mean and divide by the standard deviation) | bool | True | |
filter_frames | Filter frames of computed cepstral features based on the VAD labels: either trim silence from the head and tail, keep only speech, or keep only silence. | string | trim_silence | trim_silence, silence_only, speech_only |
rate | Sampling rate (Hz) of the speech signal | float64 | 16000.0 | |
n_filters | Number of filter bands | uint32 | 24 | |
f_min | Minimum frequency (Hz) of the range used in bandpass filtering | float64 | 0.0 | |
withDeltaDelta | Compute delta-deltas (with window size specified by delta_win) | bool | True | |
withEnergy | Use the power of the FFT magnitude; otherwise use just the absolute value of the magnitude | bool | True | |
mel_scale | Set to true to use Mel-scaled triangular filters; otherwise a linear scale is used | bool | True | |
n_ceps | Number of cepstral coefficients | uint32 | 19 | |
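To make the timing parameters concrete, the following minimal sketch converts `win_length_ms`, `win_shift_ms`, and `rate` into sample counts and estimates how many frames a signal yields. It is not the algorithm's actual code: the function name `frame_layout` is hypothetical, `win_shift_ms` is assumed to be the hop between consecutive window starts (with the defaults this equals the 10 ms overlap), and the frame count uses a common convention that drops any incomplete trailing window.

```python
def frame_layout(n_samples, rate=16000.0, win_length_ms=20.0, win_shift_ms=10.0):
    """Return (window length, window shift, frame count), all in samples.

    Hypothetical helper: assumes win_shift_ms is the hop between window
    starts and that incomplete trailing windows are discarded.
    """
    win_length = int(rate * win_length_ms / 1000.0)
    win_shift = int(rate * win_shift_ms / 1000.0)
    if n_samples < win_length:
        # Signal is shorter than one window: no complete frame fits.
        return win_length, win_shift, 0
    n_frames = 1 + (n_samples - win_length) // win_shift
    return win_length, win_shift, n_frames


# One second of audio at the default 16 kHz sampling rate.
print(frame_layout(16000))  # (320, 160, 99)
```

With the defaults, each window covers 320 samples and consecutive windows start 160 samples apart, so one second of audio produces 99 frames under these assumptions.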

Extract cepstral features (MFCC or LFCC) from audio
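As a rough illustration of the post-processing controlled by `filter_frames` and `normalizeFeatures` in the parameter table, the sketch below selects frames by VAD label and then normalizes each coefficient. It is a plain-Python approximation, not the algorithm's implementation: the function names are hypothetical, it assumes one VAD label per feature frame with 1 meaning speech and 0 meaning silence, and it assumes the population standard deviation is used.

```python
from statistics import mean, pstdev


def filter_frames(features, vad, mode="trim_silence"):
    """Select feature frames by per-frame VAD labels (assumed 1 = speech)."""
    if len(features) != len(vad):
        raise ValueError("features and vad must have the same number of frames")
    if mode == "speech_only":
        return [f for f, v in zip(features, vad) if v == 1]
    if mode == "silence_only":
        return [f for f, v in zip(features, vad) if v == 0]
    if mode == "trim_silence":
        # Keep everything between the first and last speech frame.
        speech = [i for i, v in enumerate(vad) if v == 1]
        if not speech:
            return []
        return features[speech[0]:speech[-1] + 1]
    raise ValueError("unknown mode: %r" % mode)


def normalize(features):
    """Subtract the per-coefficient mean and divide by the per-coefficient std."""
    columns = list(zip(*features))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 for constant coefficients to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in features]


frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = [0, 1, 0, 1, 0]
print(filter_frames(frames, labels, "trim_silence"))  # [[1.0], [2.0], [3.0]]
```

Note that `trim_silence` keeps silence frames that fall between the first and last speech frame, whereas `speech_only` removes them.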

Name | Databases/Protocols | Analyzers |
---|---|---|
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_lbp_hist_ratios_lr-fusion_lr-pa_aligned | avspoof/2@physicalaccess_verify_train, avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verification_spoof, avspoof/2@physicalaccess_verify_train_spoof, avspoof/2@physicalaccess_antispoofing | pkorshunov/spoof-score-fusion-roc_hist/1 |
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_gmm-fusion_lr-pa | avspoof/2@physicalaccess_verify_train, avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verification_spoof, avspoof/2@physicalaccess_verify_train_spoof, avspoof/2@physicalaccess_antispoofing | pkorshunov/spoof-score-fusion-roc_hist/1 |
pkorshunov/pkorshunov/speech-pad-simple/1/speech-pad_gmm-pa | avspoof/2@physicalaccess_antispoofing | pkorshunov/simple_antispoofing_analyzer/4 |
pkorshunov/pkorshunov/isv-speaker-verification-spoof/1/isv-speaker-verification-spoof-pa | avspoof/2@physicalaccess_verification_spoof, avspoof/2@physicalaccess_verification | pkorshunov/eerhter_postperf_iso_spoof/1 |
pkorshunov/pkorshunov/isv-speaker-verification/1/isv-speaker-verification-licit | avspoof/2@physicalaccess_verification | pkorshunov/eerhter_postperf_iso/1 |

