Leveraging characteristics of the output distribution for identifying adversarial audio examples

This project is maintained by confUnknown

Adversarial example demo

This supplementary material contains a selection of the benign, adversarial, and noisy audio samples used in our paper.

For each sample, we report the word error rate (WER) or the character error rate (CER) as an accuracy metric and the segmental signal-to-noise ratio (SNRseg) as a measure of the noise level. WER and CER are given in percent and SNRseg in dB. An SNRseg above 0 dB indicates that the signal is stronger than the noise. The samples are drawn from the Librispeech, Common Voice, and Aishell corpora.
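As a rough illustration of how these metrics are commonly computed, here is a minimal sketch in plain Python. It is not the evaluation code used for the paper. The function names, the frame length of 320 samples (20 ms at an assumed 16 kHz sampling rate), and the choice to skip frames with zero signal or zero noise energy are all assumptions made for this example. The paper's exact SNRseg definition may differ.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, ignoring spaces (assumption)."""
    ref_chars = reference.replace(" ", "")
    hyp_chars = hypothesis.replace(" ", "")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def snr_seg(clean, perturbed, frame_len=320):
    """Mean per-frame SNR in dB; the noise is (perturbed - clean).

    frame_len=320 is an assumed value, not taken from the paper.
    Frames with zero signal or zero noise energy are skipped (assumption).
    """
    ratios = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        x = clean[start:start + frame_len]
        y = perturbed[start:start + frame_len]
        signal_energy = sum(a * a for a in x)
        noise_energy = sum((b - a) ** 2 for a, b in zip(x, y))
        if signal_energy > 0 and noise_energy > 0:
            ratios.append(10.0 * math.log10(signal_energy / noise_energy))
    return sum(ratios) / len(ratios) if ratios else math.nan
```

With these definitions, five wrong words in an eight-word reference give a WER of 62.50, and one wrong character in a twelve-character reference gives a CER of 8.33. A perturbation whose per-frame energy exceeds that of the original signal yields a negative SNRseg.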

Librispeech - English

Sample 1
Benign transcription:       THEN HE LOOKED DOWN THE LAGOON WAS DRY
Adversarial transcription:  PEARL WAS A BORN OUTCAST OF THE INFANTILE WORLD

  [benign: WER=0.00],               [noisy: WER=62.50, SNRseg=-4.79]

[C&W adversarial: WER=0.00, SNRseg=24.50],   [psychoacoustic: WER=0.00, SNRseg=25.36]

[adaptive adversarial: WER=0.00, SNRseg=-0.60]

Sample 2
Benign transcription:       HOW JOLLY IT WAS BEING YOUNG HILDA
Adversarial transcription:  THERE WAS A GRIM SMILE OF AMUSEMENT ON HIS SHREWD FACE

  [benign: WER=0.00],               [noisy: WER=0.00, SNRseg=6.03]

[C&W adversarial: WER=0.00, SNRseg=22.04],   [psychoacoustic: WER=0.00, SNRseg=22.95]

[adaptive adversarial: WER=0.00, SNRseg=-5.87]

Common Voice v.6 - German

Sample 1
Benign transcription:       DAS HAT SCHON MEINE URGROßMUTTER GESAGT
Adversarial transcription:  NEU DELHI IST DIE HAUPTSTADT VON INDIEN

  [benign: WER=0.00],               [noisy: WER=0.00, SNRseg=-20.24]

[C&W adversarial: WER=0.00, SNRseg=9.05],   [psychoacoustic: WER=0.00, SNRseg=11.72]

[adaptive adversarial: WER=0.00, SNRseg=-25.41]

Sample 2
Benign transcription:       ICH GLAUBE ES AUCH NICHT
Adversarial transcription:  WAS SOLLS ICH BIN BEREIT

  [benign: WER=0.00],               [noisy: WER=20.00, SNRseg=-17.16]

[C&W adversarial: WER=0.00, SNRseg=8.86],   [psychoacoustic: WER=0.00, SNRseg=11.23]

[adaptive adversarial: WER=0.00, SNRseg=-12.88]

Common Voice v.6 - Italian

Sample 1
Benign transcription:       BERNARDO FALQUI PES
Adversarial transcription:  SEMPRE CARO MI FU QUEST'ERMO COLLE

  [benign: WER=66.67],               [noisy: WER=66.67, SNRseg=-3.39]

[C&W adversarial: WER=0.00, SNRseg=10.36],   [psychoacoustic: WER=0.00, SNRseg=13.67]

[adaptive adversarial: WER=0.00, SNRseg=-12.54]

Sample 2
Benign transcription:       PER ESSA È CONSIGLIATA UNA DURATA DI DUE SETTIMANE
Adversarial transcription:  LA LORO PRESENZA ERA PARTICOLARMENTE APPREZZATA DAL PASSEGGERI

  [benign: WER=0.00],               [noisy: WER=55.56, SNRseg=-11.31]

[C&W adversarial: WER=0.00, SNRseg=16.05],   [psychoacoustic: WER=0.00, SNRseg=18.96]

[adaptive adversarial: WER=0.00, SNRseg=-6.73]

Aishell - Mandarin

Sample 1
Benign transcription:       有 黑客 在 网络 上 兜售 车主 信
Adversarial transcription:  顶级 田径 赛事 再次 落户 鸟巢

  [benign: CER=0.00],               [noisy: CER=0.00, SNRseg=6.95]

[C&W adversarial: CER=0.00, SNRseg=21.39],   [psychoacoustic: CER=0.00, SNRseg=25.20]

[adaptive adversarial: CER=0.00, SNRseg=-5.87]

Sample 2
Benign transcription:       加强 合作社 辅导 员 队伍 建设
Adversarial transcription:  发行 利率 也有 较大 幅度 上升

  [benign: CER=8.33],               [noisy: CER=25.00, SNRseg=1.05]

[C&W adversarial: CER=0.00, SNRseg=22.48],   [psychoacoustic: CER=0.00, SNRseg=25.82]

[adaptive adversarial: CER=0.00, SNRseg=-2.87]