This project is maintained by confUnknown
This supplementary material contains a selection of the benign, adversarial, and noisy audio samples used in our paper.
For each sample, we report the word error rate (WER) or the character error rate (CER), in percent, as the accuracy metric, and the segmental signal-to-noise ratio (SNRseg), in dB, as the noise-level metric. An SNRseg above 0 dB indicates that the signal is stronger than the noise. The samples are drawn from the LibriSpeech, Common Voice, and AISHELL corpora.
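As a rough illustration of how these metrics can be computed, the following is a minimal sketch and not necessarily the exact procedure used in the paper. It assumes WER and CER are the Levenshtein distance between reference and hypothesis, normalized by the reference length and expressed in percent, and that SNRseg is the mean per-frame SNR in dB without clamping. The function names, the `frame_len` default, and the choice to strip spaces before computing CER are all assumptions made for this example.

```python
import numpy as np


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (tokens are whitespace-separated words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def snr_seg(clean, perturbed, frame_len=256, eps=1e-10):
    """Segmental SNR in dB: mean of per-frame SNRs over full frames.

    frame_len is an assumed value; eps avoids division by zero and log(0).
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(perturbed, dtype=np.float64) - clean
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    snrs = []
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        n = noise[m * frame_len:(m + 1) * frame_len]
        snrs.append(10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(n ** 2) + eps)))
    return float(np.mean(snrs))
```

Under these assumptions, a hypothesis with five wrong words against an eight-word reference yields a WER of 62.50, and a perturbation whose per-frame energy equals that of the clean signal yields an SNRseg of 0 dB.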
Benign transcription: THEN HE LOOKED DOWN THE LAGOON WAS DRY
Adversarial transcription: PEARL WAS A BORN OUTCAST OF THE INFANTILE WORLD
[benign: WER=0.00], [noisy: WER=62.50, SNRseg=-4.79]
[C&W adversarial: WER=0.00, SNRseg=24.50], [psychoacoustic: WER=0.00, SNRseg=25.36]
[adaptive adversarial: WER=0.00, SNRseg=-0.60]
Benign transcription: HOW JOLLY IT WAS BEING YOUNG HILDA
Adversarial transcription: THERE WAS A GRIM SMILE OF AMUSEMENT ON HIS SHREWD FACE
[benign: WER=0.00], [noisy: WER=0.00, SNRseg=6.03]
[C&W adversarial: WER=0.00, SNRseg=22.04], [psychoacoustic: WER=0.00, SNRseg=22.95]
[adaptive adversarial: WER=0.00, SNRseg=-5.87]
Benign transcription: DAS HAT SCHON MEINE URGROßMUTTER GESAGT
Adversarial transcription: NEU DELHI IST DIE HAUPTSTADT VON INDIEN
[benign: WER=0.00], [noisy: WER=0.00, SNRseg=-20.24]
[C&W adversarial: WER=0.00, SNRseg=9.05], [psychoacoustic: WER=0.00, SNRseg=11.72]
[adaptive adversarial: WER=0.00, SNRseg=-25.41]
Benign transcription: ICH GLAUBE ES AUCH NICHT
Adversarial transcription: WAS SOLLS ICH BIN BEREIT
[benign: WER=0.00], [noisy: WER=20.00, SNRseg=-17.16]
[C&W adversarial: WER=0.00, SNRseg=8.86], [psychoacoustic: WER=0.00, SNRseg=11.23]
[adaptive adversarial: WER=0.00, SNRseg=-12.88]
Benign transcription: BERNARDO FALQUI PES
Adversarial transcription: SEMPRE CARO MI FU QUEST'ERMO COLLE
[benign: WER=66.67], [noisy: WER=66.67, SNRseg=-3.39]
[C&W adversarial: WER=0.00, SNRseg=10.36], [psychoacoustic: WER=0.00, SNRseg=13.67]
[adaptive adversarial: WER=0.00, SNRseg=-12.54]
Benign transcription: PER ESSA È CONSIGLIATA UNA DURATA DI DUE SETTIMANE
Adversarial transcription: LA LORO PRESENZA ERA PARTICOLARMENTE APPREZZATA DAL PASSEGGERI
[benign: WER=0.00], [noisy: WER=55.56, SNRseg=-11.31]
[C&W adversarial: WER=0.00, SNRseg=16.05], [psychoacoustic: WER=0.00, SNRseg=18.96]
[adaptive adversarial: WER=0.00, SNRseg=-6.73]
Benign transcription: 有 黑客 在 网络 上 兜售 车主 信
Adversarial transcription: 顶级 田径 赛事 再次 落户 鸟巢
[benign: CER=0.00], [noisy: CER=0.00, SNRseg=6.95]
[C&W adversarial: CER=0.00, SNRseg=21.39], [psychoacoustic: CER=0.00, SNRseg=25.20]
[adaptive adversarial: CER=0.00, SNRseg=-5.87]
Benign transcription: 加强 合作社 辅导 员 队伍 建设
Adversarial transcription: 发行 利率 也有 较大 幅度 上升
[benign: CER=8.33], [noisy: CER=25.00, SNRseg=1.05]
[C&W adversarial: CER=0.00, SNRseg=22.48], [psychoacoustic: CER=0.00, SNRseg=25.82]
[adaptive adversarial: CER=0.00, SNRseg=-2.87]