Spoof detection using time-delay shallow neural network and feature switching
Front-end features
In the recent literature on the ASV-spoof-2015 and ASV-spoof-2017 challenges, numerous works have focused on the feature extraction module. A good feature is expected to capture the spectral variations between spoofed and bonafide trials. Most of these works use a simple GMM classifier at the back-end. Initially, as in many neural network systems, spectrograms of the utterances were given as input to the x-vector architecture. However, the system performed worse with spectrogram input. Hence, in this work, a set of features was chosen to build spoof detection systems with both GMM and x-vector classifiers.
Constant-Q cepstral coefficients (CQCC) are known to be a robust feature for detecting logical access attacks. Linear frequency cepstral coefficients (LFCC) have been shown to perform better on reverberant speech. CQCC and LFCC are provided as the baseline features for the ASV-spoof-2019 challenge. Along with CQCC and LFCC, we explore two other cepstral features, namely Mel frequency cepstral coefficients (MFCC) and inverse Mel frequency cepstral coefficients (IMFCC). Findings from ASV-spoof-2017 showed that cepstral features are too coarse to detect the imperceptible spectral differences in spoofed utterances. Hence, along with these cepstral coefficients, we also explore filterbank energies as features. Three filterbank energies were chosen for this purpose: linear frequency filterbank energy (LFBE), Mel frequency filterbank energy (MFBE), and inverse Mel frequency filterbank energy (IMFBE).
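
The relationship between the filterbank energies and the cepstral features can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the frame length, hop, number of filters, number of cepstral coefficients, and the mirrored-Mel construction of the inverse Mel scale are all assumptions chosen for illustration. CQCC is not covered, since it requires a constant-Q transform.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(scale, n_filters, sample_rate):
    """Return n_filters + 2 band edges in Hz for the chosen frequency scale."""
    nyquist = sample_rate / 2.0
    if scale == "linear":
        return np.linspace(0.0, nyquist, n_filters + 2)
    mel_edges = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    if scale == "mel":
        return mel_edges
    if scale == "inverse_mel":
        # Assumption: mirror the Mel edges so that narrow filters sit at high frequencies.
        return nyquist - mel_edges[::-1]
    raise ValueError(f"unknown scale: {scale}")

def triangular_filterbank(scale, n_filters, n_fft, sample_rate):
    """Build an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    edges = filter_edges(scale, n_filters, sample_rate)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_filterbank_energies(signal, scale, sample_rate=16000,
                            n_fft=512, hop=160, n_filters=20):
    """Frame the signal and return log filterbank energies (LFBE, MFBE or IMFBE)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = triangular_filterbank(scale, n_filters, n_fft, sample_rate)
    return np.log(power @ bank.T + 1e-10)  # small floor avoids log(0)

def cepstral_coefficients(log_energies, n_ceps=13):
    """Apply an unnormalised DCT-II across filters (LFCC, MFCC or IMFCC)."""
    n_filters = log_energies.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energies @ dct_basis.T

# Illustrative input: one second of white noise at 16 kHz (not real speech).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

lfbe = log_filterbank_energies(signal, "linear")
mfbe = log_filterbank_energies(signal, "mel")
imfbe = log_filterbank_energies(signal, "inverse_mel")
lfcc, mfcc, imfcc = (cepstral_coefficients(e) for e in (lfbe, mfbe, imfbe))
```

In this sketch the cepstral features are a truncated DCT of the log filterbank energies, which discards fine spectral detail. That is one way to read the remark that cepstral features may be too coarse, and it is why the filterbank energies themselves are also worth trying as inputs.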
Columns AA to CC give the type of physical attack condition.

| System | Metric | Overall performance | AA | AB | AC | BA | BB | BC | CA | CB | CC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CQCC | EER | 11.04 | 25.28 | 6.16 | 2.13 | 21.87 | 5.26 | 1.61 | 21.10 | 4.70 | 1.79 |
| CQCC | t-DCF | 0.25 | 0.50 | 0.18 | 0.05 | 0.47 | 0.15 | 0.05 | 0.50 | 0.14 | 0.05 |
| LFCC | EER | 13.54 | 32.48 | 4.40 | 3.95 | 24.59 | 4.29 | 3.20 | 21.63 | 3.92 | 3.06 |
| LFCC | t-DCF | 0.30 | 0.74 | 0.13 | 0.11 | 0.60 | 0.13 | 0.09 | 0.55 | 0.12 | 0.09 |
| Primary | EER | 11.28 | 27.57 | 7.86 | 3.50 | 20.48 | 6.36 | 2.78 | 18.44 | 5.74 | 3.07 |
| Primary | t-DCF | 0.28 | 0.52 | 0.22 | 0.09 | 0.46 | 0.19 | 0.08 | 0.46 | 0.17 | 0.09 |
| Contrastive-1 | EER | 9.33 | 23.94 | 4.42 | 2.60 | 18.98 | 4.19 | 2.25 | 16.12 | 4.09 | 2.34 |
| Contrastive-1 | t-DCF | 0.23 | 0.48 | 0.13 | 0.07 | 0.42 | 0.13 | 0.06 | 0.40 | 0.13 | 0.07 |
| Contrastive-2 | EER | 11.34 | 27.39 | 8.71 | 4.22 | 19.96 | 7.53 | 3.43 | 17.88 | 6.86 | 3.58 |
| Contrastive-2 | t-DCF | 0.31 | 0.54 | 0.26 | 0.11 | 0.47 | 0.23 | 0.09 | 0.46 | 0.21 | 0.10 |
| Single | EER | 12.10 | 17.87 | 8.88 | 5.76 | 19.53 | 9.56 | 5.93 | 21.87 | 10.33 | 6.60 |
| Single | t-DCF | 0.31 | 0.42 | 0.25 | 0.16 | 0.48 | 0.28 | 0.17 | 0.51 | 0.30 | 0.19 |
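
For readers unfamiliar with the EER metric reported in the table, the following is a minimal sketch of how an equal error rate can be approximated from detection scores. It assumes that a higher score means "more likely bonafide" and uses a simple threshold sweep over the observed scores; the function name and the toy scores are hypothetical, and this is not the official challenge scoring tool. The t-DCF metric is not covered here.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted as bonafide when its score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss rate: bonafide trials rejected at this threshold.
        miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False alarm rate: spoofed trials accepted at this threshold.
        false_alarm = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(false_alarm - miss)
        if gap < best_gap:
            best_gap, eer = gap, (false_alarm + miss) / 2
    return eer

# Toy scores for illustration only (not taken from the experiments).
bonafide = [2.1, 1.7, 0.9, 1.3, -0.2]
spoof = [-1.5, -0.8, 0.1, 1.0, -2.0]
eer = equal_error_rate(bonafide, spoof)
print(f"EER = {100 * eer:.2f}%")
```

The sweep recomputes both error rates at every threshold, so its cost is quadratic in the number of trials. For large evaluation sets, sorting the scores once and accumulating the error counts would be the more practical choice.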