Commit 383378f
Parent: e7c2018
Allow single quotes "'" and hyphens "-"
Remove the single quote `'` (id 6) and the hyphen `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. The official Whisper repo accepts them as valid generated tokens:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
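To illustrate why the entry matters, here is a minimal sketch of what suppression does to next-token scores, assuming the usual approach of setting suppressed logits to negative infinity. The `suppress` helper, the 16-id toy vocabulary, and the score values are hypothetical and exist only for this example; this is not the library's implementation.

```python
import math

def suppress(logits, suppress_tokens):
    """Return a copy of `logits` with every suppressed id set to -inf."""
    masked = list(logits)
    for token_id in suppress_tokens:
        masked[token_id] = -math.inf
    return masked

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical scores over a toy vocabulary: id 6 (') is the model's top choice.
logits = [0.0] * 16
logits[6] = 5.0
logits[3] = 3.0

# Prefixes of the lists before and after this commit (ids below 16 only).
old_list = [1, 2, 6, 7, 8, 9, 10, 12, 14]
new_list = [1, 2, 7, 8, 9, 10, 14]

before = suppress(logits, old_list)
after = suppress(logits, new_list)

best_before = argmax(before)  # id 6 is masked, so the runner-up wins
best_after = argmax(after)    # id 6 can be generated again
print(best_before, best_after)
```

With id 6 in the list, greedy decoding can never pick the apostrophe, however high its score; once it is removed, the token competes normally.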
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large")

# Decode the two ids being removed to confirm they map to ' and -
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
**Print Output:**
```
'
-
```
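The same edit can be sketched programmatically on a config loaded as a dictionary. The `allow_tokens` helper is hypothetical, and the list below is only the excerpt visible in this commit; the real `suppress_tokens` list in `config.json` is longer.

```python
import json

def allow_tokens(config, token_ids):
    """Return a copy of `config` with `token_ids` dropped from `suppress_tokens`."""
    allowed = set(token_ids)
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config["suppress_tokens"] if t not in allowed
    ]
    return updated

# Excerpt of the list touched by this commit (assumption: truncated for brevity).
config = {"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}

patched = allow_tokens(config, [6, 12])
print(json.dumps(patched))
```

Returning a copy rather than mutating in place keeps the original config available for comparison, which makes it easy to confirm that exactly two entries were removed.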
config.json CHANGED (+0 -2)
@@ -46,12 +46,10 @@
   "suppress_tokens": [
     1,
     2,
-    6,
     7,
     8,
     9,
     10,
-    12,
     14,
     25,
     26,