Edwin Rijgersberg
committed on
Commit: 93d00ba
Parent(s): 0837ce3
Fix mixup of `<pad>` and `<s>` tokens in vocab
This model outputs many `<s>` tokens, including in the middle of words. You can observe this by running the model locally or by using the widget on this page.
Swapping the vocab ids of `<s>` and `<pad>` seems to fix it.
Other GroNLP models also seem to be affected, for example https://huggingface.co/GroNLP/wav2vec2-dutch-large-ft-cgn
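The swap described above can be sketched in a few lines of Python, which may help when checking other affected vocab files. This is only an illustration: `swap_token_ids` is a hypothetical helper, and the `before` dict is an assumed excerpt of the pre-fix vocab, inferred from the commit message rather than copied from the original file.

```python
import json


def swap_token_ids(vocab: dict, token_a: str, token_b: str) -> dict:
    """Return a copy of `vocab` with the ids of two tokens exchanged."""
    swapped = dict(vocab)
    swapped[token_a], swapped[token_b] = vocab[token_b], vocab[token_a]
    return swapped


# Assumed excerpt of the vocab before the fix (inferred, not the real file).
before = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "A": 5}

after = swap_token_ids(before, "<s>", "<pad>")

# Serialize the same way a vocab.json would be written.
print(json.dumps(after, ensure_ascii=False))
```

The helper returns a new dict instead of mutating its input, so the original mapping stays available for comparison.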
- vocab.json +1 -1
vocab.json
CHANGED
@@ -1 +1 @@
-{"<
+{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "A": 5, "B": 6, "C": 7, "D": 8, "E": 9, "F": 10, "G": 11, "H": 12, "I": 13, "J": 14, "K": 15, "L": 16, "M": 17, "N": 18, "O": 19, "P": 20, "Q": 21, "R": 22, "S": 23, "T": 24, "U": 25, "V": 26, "W": 27, "X": 28, "Y": 29, "Z": 30, "È": 31, "É": 32, "Ë": 33, "?": 34, "'": 35, "-": 36}
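A plausible explanation for the symptom is that the model emits id 0 as its CTC blank, so a vocab that maps id 0 to `<s>` turns every blank into a visible `<s>`. The sketch below illustrates this with a simplified greedy CTC decoder. It is a stand-in and not the real tokenizer: `greedy_ctc_decode`, the frame ids, the `old_vocab` mapping, and the assumption that the blank id is 0 are all hypothetical.

```python
from itertools import groupby


def greedy_ctc_decode(ids, vocab, blank_token="<pad>"):
    """Collapse repeated ids, drop the blank token, map '|' to a space."""
    id_to_token = {i: t for t, i in vocab.items()}
    collapsed = [k for k, _ in groupby(ids)]  # merge consecutive duplicates
    tokens = [id_to_token[i] for i in collapsed]
    return "".join(" " if t == "|" else t for t in tokens if t != blank_token)


# Excerpt of the vocab after the fix (ids taken from the new vocab.json).
fixed_vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4,
               "H": 12, "I": 13, "O": 19}

# Assumed pre-fix mapping: same ids, but `<s>` and `<pad>` exchanged.
old_vocab = dict(fixed_vocab, **{"<s>": 0, "<pad>": 1})

# Hypothetical per-frame argmax output, with id 0 used as the blank.
frame_ids = [12, 12, 0, 19, 0, 0, 13]

print(greedy_ctc_decode(frame_ids, fixed_vocab))  # HOI
print(greedy_ctc_decode(frame_ids, old_vocab))    # H<s>O<s>I
```

With the fixed vocab the blanks are dropped as padding. With the assumed old mapping they survive decoding as `<s>`, which matches the behaviour reported in the commit message.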