JingweiZuo committed on
Commit
4493883
1 Parent(s): bb01ae9

Fix: encoder and decoder in tokenizer


Hi!

When evaluating the RWKV-v5-Eagle-7B-HF model, I ran into the error shown below. The main cause is the tokenizer. In the code, the encoder and decoder are reversed: `self.encoder` maps token IDs to token bytes and `self.decoder` maps token bytes to IDs, which is the opposite of what the names imply. This was discussed here: https://huggingface.co/RWKV/v5-Eagle-7B-HF/discussions/9#65d4dc35f9cbfa798c4be4b3

![Evaluation error on RWKV-v5-Eagle-7B-HF](https://cdn-uploads.huggingface.co/production/uploads/6460c3811db65f878513bcaf/NIQHGPQ7lmSRvkrY4SAax.png)

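This PR fixes the issue by swapping the two mappings so that `self.encoder` maps token bytes to IDs and `self.decoder` maps IDs to token bytes.
Thanks,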

Files changed (1)
  1. tokenization_rwkv_world.py +2 -2
tokenization_rwkv_world.py CHANGED
```diff
@@ -106,11 +106,11 @@ class RWKVWorldTokenizer(PreTrainedTokenizer):
             assert isinstance(x, bytes)
             assert len(x) == int(l[l.rindex(" ") :])
             sorted += [x]
-            self.encoder[idx] = x
+            self.encoder[x] = idx
 
         self.decoder = {}
         for k, v in self.encoder.items():
-            self.decoder[v] = int(k)
+            self.decoder[v] = k
 
         self.trie = TRIE()
         for t, i in self.decoder.items():
```
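For reference, here is a minimal standalone sketch of the corrected mapping direction, outside the tokenizer class. It follows the `encoder`/`decoder` naming convention used by, for example, `GPT2Tokenizer` in transformers: the encoder maps token to ID, and the decoder maps ID to token. `build_vocab` is a hypothetical helper. The vocab line format `<id> <Python str/bytes literal> <byte length>` is an assumption. The diff only shows the byte-length check, not how the ID and token are parsed, so `ast.literal_eval` stands in for whatever parsing the real file uses.

```python
import ast


def build_vocab(lines):
    """Hypothetical helper: build encoder (bytes -> id) and decoder (id -> bytes).

    Assumes each vocab line looks like `<id> <str/bytes literal> <byte length>`;
    only the byte-length check below is taken from the diff.
    """
    encoder = {}  # token bytes -> integer id (the fixed direction)
    for l in lines:
        idx = int(l[: l.index(" ")])  # assumed: the id is the first field
        # ast.literal_eval is used here as a safe literal parser (an assumption,
        # not necessarily what the real file does).
        x = ast.literal_eval(l[l.index(" ") : l.rindex(" ")].strip())
        x = x.encode("utf-8") if isinstance(x, str) else x
        assert isinstance(x, bytes)
        assert len(x) == int(l[l.rindex(" ") :])  # last field is the byte length
        encoder[x] = idx  # token -> id, as in the patched line

    # The decoder is the inverse map: id -> token bytes.
    decoder = {v: k for k, v in encoder.items()}
    return encoder, decoder


vocab_lines = [
    "97 'a' 1",
    "300 ' the' 4",
    "129 b'\\x80' 1",
]
encoder, decoder = build_vocab(vocab_lines)
print(encoder[b" the"])  # 300
print(decoder[129])      # b'\x80'
```

With the pre-fix direction (`encoder[idx] = x`), the encoder's keys are integer IDs, so a lookup by token such as `encoder[b" the"]` would raise `KeyError` instead of returning an ID.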