Qwen/Qwen2-7B · TypeError: 'NoneType' object cannot be interpreted as an integer

Jun 15

Hi Qwen2 team,

I am trying to run the Zephyr DPO recipe (https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta) to fine-tune this model, but I consistently run into this error. (SFT training works fine.) Does this model use a special checkpoint configuration that I need to set up? Any thoughts on the potential cause?

" [rank6]: TypeError: 'NoneType' object cannot be interpreted as an integer
[rank5]: Traceback (most recent call last):
[rank5]: File "/home/litan/alignment-handbook/scripts/run_dpo.py", line 261, in <module>
[rank5]: main()
[rank5]: File "/home/litan/alignment-handbook/scripts/run_dpo.py", line 214, in main
[rank5]: train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/transformers/trainer.py", line 1850, in train
[rank5]: return inner_training_loop(
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
[rank5]: for step, inputs in enumerate(epoch_iterator):
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/accelerate/data_loader.py", line 454, in __iter__
[rank5]: current_batch = next(dataloader_iter)
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
[rank5]: data = self._next_data()
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
[rank5]: data = self._dataset_fetcher.fetch(index) # may raise StopIteration
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
[rank5]: return self.collate_fn(data)
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/trl/trainer/utils.py", line 338, in __call__
[rank5]: to_pad = [torch.LongTensor(ex[k]) for ex in features]
[rank5]: File "/opt/conda/envs/handbook/lib/python3.10/site-packages/trl/trainer/utils.py", line 338, in <listcomp>
[rank5]: to_pad = [torch.LongTensor(ex[k]) for ex in features]
[rank5]: TypeError: 'NoneType' object cannot be interpreted as an integer
[2024-06-15 02:51:57,401] [INFO] [utils.py:802:see_memory_usage] After initializing ZeRO optimizer"

tanliboy

Jun 18

edited Jun 21

In case anyone runs into the same problem: I figured out that it is related to the inconsistency between bos_token_id and bos_token.
I worked around it by changing
"bos_token": null to "bos_token": "<|endoftext|>" in the tokenizer_config.json file.

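The same workaround can be applied with a small script instead of editing the file by hand. This is only a minimal sketch, assuming a local copy of the tokenizer files: the helper name `set_bos_token` is hypothetical (not part of transformers or trl), and it changes the file only when "bos_token" is missing or null.

```python
import json
from pathlib import Path


def set_bos_token(config_path, bos_token="<|endoftext|>"):
    """Set "bos_token" in a tokenizer_config.json if it is missing or null.

    Hypothetical helper for illustration only.
    Returns True if the file was modified, False if it was left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Leave the file alone if a bos_token is already configured.
    if config.get("bos_token") is not None:
        return False

    config["bos_token"] = bos_token
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

Usage would look like `set_bos_token("path/to/model/tokenizer_config.json")`, where the path is a placeholder for wherever the model files were downloaded.
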
jklj077

Qwen org Aug 14

Please also refer to this comment. After the related PR in trl, it is no longer necessary to change the config file.

jklj077 changed discussion status to closed Aug 14