Fixes evaluation instructions and updates WER scores

by andreagasparini - opened Jul 25, 2022

base: refs/heads/main ← from: refs/pr/2

andreagasparini

Jul 25, 2022 • edited Jul 25, 2022

Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the model card, but I got a TypeError. The map_to_pred function stores the transcriptions in the batch wrapped in lists instead of as plain strings (e.g. ["transcription example"] instead of "transcription example").

TypeError: expected string or bytes-like object

After fixing the error I recomputed the WER and updated the scores without approximating them. I think the same should be done for other wav2vec2-based models (e.g. facebook/wav2vec2-large-960h-lv60).
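For anyone who wants to see the failure mode in isolation, here is a minimal sketch that runs without the model or the dataset. The names `fake_batch_decode`, `map_to_pred_buggy`, `map_to_pred_fixed` and `normalize` are hypothetical stand-ins, not the model card's code, and the assumption that the metric applies a regex to each hypothesis is a guess, since the full stack trace is not shown here.

```python
import re


def fake_batch_decode(predicted_ids):
    # Hypothetical stand-in for a batch decode step: it returns a list
    # with one string per input sequence, never a bare string.
    return ["transcription example" for _ in predicted_ids]


def map_to_pred_buggy(batch):
    # Stores the whole list, so the row holds ["..."] instead of "...".
    batch["transcription"] = fake_batch_decode([batch["ids"]])
    return batch


def map_to_pred_fixed(batch):
    # Unwraps the single decoded string before storing it.
    batch["transcription"] = fake_batch_decode([batch["ids"]])[0]
    return batch


def normalize(hypothesis):
    # Assumed metric-side step: collapse whitespace with a regex.
    return re.sub(r"\s+", " ", hypothesis).strip()


row = {"ids": [1, 2, 3]}

# The fixed version hands a plain string to the regex.
fixed = normalize(map_to_pred_fixed(dict(row))["transcription"])

# The buggy version hands a list to the regex, which raises TypeError.
try:
    normalize(map_to_pred_buggy(dict(row))["transcription"])
    buggy_error = None
except TypeError as err:
    buggy_error = err

print(fixed)
print(type(buggy_error).__name__)
```

Under these assumptions, taking the first element of the decoded list is enough because each call handles a single example; a batched version would need to keep the list and let the mapping step split it per row.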

Fixes evaluation instructions and updates WER scores (d02ba4fe)

Fixes typo (25ca1d9d)

lysandre

Jul 28, 2022

cc @sanchit-gandhi

sanchit-gandhi

Jul 28, 2022

Thanks for the catch @andreagasparini ! I'll run the updated script to verify the results. If they match we can merge 💪 I'll also look into updating the other W2V2-based models that share this example script bug.

sanchit-gandhi

Jul 28, 2022

Thanks for the bug fix - I can verify that the script works and that I get the same results. I would advocate for keeping the change in the evaluation script (fixing the TypeError in L113) but discarding the ones that update the WER metrics (L27, 41, 116). The reason is that the Wav2Vec2 paper and "official" results are quoted to 1 decimal place (1.9/3.9), and the convention in speech literature is to quote WER results to 1 decimal place (WERs of 1.9/3.9 vs 1.86/3.88), so I'd keep the results to 1 d.p. Note that by quoting to 1 d.p., we leave at most a 0.05% (absolute) uncertainty in our WER metrics, which is tiny for all intents and purposes!

Hope that makes sense! Let me know if you have any questions!
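The rounding argument above can be checked with a few lines of arithmetic. This is only a sketch: `quote_wer` is a hypothetical helper, the half-up rounding mode is an assumption, and the two inputs are the 2 d.p. scores mentioned in this thread.

```python
from decimal import Decimal, ROUND_HALF_UP


def quote_wer(wer_percent, places="0.1"):
    # Round a WER (given in percent, as a string) to the requested precision.
    return Decimal(wer_percent).quantize(Decimal(places), rounding=ROUND_HALF_UP)


for exact in ("1.86", "3.88"):
    quoted = quote_wer(exact)
    # The gap between the exact and the quoted value never exceeds 0.05.
    print(exact, "->", quoted, "| rounding error:", abs(Decimal(exact) - quoted))
```

Decimal is used instead of float so that the half-way cases are not affected by binary floating-point representation.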

andreagasparini

Aug 9, 2022 • edited Aug 9, 2022

Hi @sanchit-gandhi , I agree with your reasons for keeping the results to 1 decimal place, but at the same time it seems that on the Speech Bench almost all the other models do not follow the same rounding convention (they seem to be quoted to 2 d.p.).

Should we change all the others or just this one? That's my dilemma 😂

sanchit-gandhi

Aug 12, 2022

I see your point! I'd be in favour of quoting to 1 d.p. on the Speech Bench. We can open this as a discussion!

sanchit-gandhi

Aug 12, 2022

Speech bench discussion: https://huggingface.co/spaces/huggingface/hf-speech-bench/discussions/1
