Chat template
In the model card, you list the chat template as:
System: {System}
{Context}
User: {Question}
Assistant: {Response}
User: {Question}
Assistant:
However, in tokenizer_config.json it's:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Which one is correct? I assume the one in the model card...
The sample code from the README file seems to be aligned with the model card.
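For reference, this is roughly how I put the prompt together from the model-card format (my own helper, assuming "\n\n" between turns; this is not the README's code):

```python
# Rough sketch of assembling a prompt in the model-card format.
# The helper name and the "\n\n" separators are my own assumptions.
def format_prompt(system, context, turns):
    """turns: list of (role, text) pairs, where role is "User" or "Assistant"."""
    prompt = f"System: {system}\n{context}"
    for role, text in turns:
        prompt += f"\n\n{role}: {text}"
    return prompt + "\n\nAssistant:"

print(format_prompt(
    system="This is a chat between a user and an AI assistant.",
    context="(retrieved context goes here)",
    turns=[("User", "What does the document say about the deadline?")],
))
```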
Sorry about the confusion here. The chat template listed in our model card is correct. We simply reused the tokenizer_config.json from Llama3, which comes with their chat template. We have now updated tokenizer_config.json and removed that chat template.
I am using the eos_token ("<|end_of_text|>") as the pad_token.
@zihanliu
I fine-tuned the model with the chat template you suggest.
I did the same thing and assigned the eos token as the pad token, like this:
tokenizer.pad_token = tokenizer.eos_token
But the model only sometimes adds the eos token; for long answers it struggles to add it.
Is pad = EOS a bad approach?
Or should I add a dedicated pad token to the tokenizer and update the embeddings too?
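To make the two options concrete, this is what I mean (a rough sketch with transformers; the model name is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/Llama3-ChatQA-1.5-8B"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Option 1: reuse the eos token as padding (what I am doing now).
tokenizer.pad_token = tokenizer.eos_token

# Option 2: add a dedicated pad token and resize the embedding matrix.
# tokenizer.add_special_tokens({"pad_token": "<pad>"})
# model.resize_token_embeddings(len(tokenizer))
```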
Check this chat template.
"chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ bos_token + 'System: ' + message['content'] }}{% elif message['role'] == 'user' %}{{ '\n\nUser: ' + message['content'] + eos_token }}{% elif message['role'] == 'assistant' %}{{ '\n\nAssistant: ' + message['content'] + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '\n\nAssistant: ' }}{% endif %}",
Hi. I checked your chat template. The eos_token shouldn't be added after the User turn or the Assistant turn. During fine-tuning, we use the eos_token as the pad_token, and it is added at the end of the sequence to make sure each sequence in a batch has the same length.
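For example, a revision like this would match the model-card format, with the eos_token dropped from the User/Assistant turns (just a sketch, not an official template; padding with the eos_token is handled by the training code, not by the template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama3-ChatQA-1.5-8B")  # placeholder
# Same template as above, but without eos_token after the User/Assistant turns.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}{{ bos_token + 'System: ' + message['content'] }}"
    "{% elif message['role'] == 'user' %}{{ '\n\nUser: ' + message['content'] }}"
    "{% elif message['role'] == 'assistant' %}{{ '\n\nAssistant: ' + message['content'] }}"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}{{ '\n\nAssistant: ' }}{% endif %}"
)
```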
@zihanliu
So, in your view, we don't need to add the eos token and bos token? Will the long sequences in the data be padded?
Say I have a dataset like this:
["AI assistant"]
["AI friendly assistant BLA BLA"]
["ABC"]
Will the tokenizer pick the longest sequence and pad the remaining ones to match it?
I mean, how does the tokenizer decide to make all sequences in a batch the same length?
Last question: by doing this, I am afraid the model will get stuck and fail to add the eos token or end the conversation.
Sometimes the model doesn't know how to stop its answer.
Thank you.
Hi, let me try to answer your questions as follows:
- You need to set a maximum sequence length (4k/8k). The tokenizer will then pad all sequences to that maximum length. If a sample is longer than the maximum sequence length, it will be truncated.
- The model will be trained to generate the eos_token when the output is finished. However, for the padding tokens, we set the loss_mask to 0 to make sure the padding tokens are not trained on (see the sketch below).
- We do need the bos_token at the beginning of a sequence, which is the same as the Llama3 models.
Hope this helps :)
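As an illustration of point 2, here is a minimal sketch of masking the padding out of the loss, assuming a Hugging Face-style setup where label -100 is ignored by the cross-entropy loss. This is only a sketch, not our actual training code, and the model name is a placeholder:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama3-ChatQA-1.5-8B")  # placeholder
tokenizer.pad_token = tokenizer.eos_token        # eos doubles as the pad token

max_length = 4096                                # maximum sequence length (4k)
formatted_sample = "System: ...\n\nUser: hi\n\nAssistant: Hello! How can I help?"
text = formatted_sample + tokenizer.eos_token    # end with eos so the model learns when to stop

enc = tokenizer(
    text,
    max_length=max_length,
    padding="max_length",                        # pad everything to max_length
    truncation=True,                             # cut samples that are too long
    return_tensors="pt",
)

labels = enc["input_ids"].clone()
labels[enc["attention_mask"] == 0] = -100        # "loss_mask = 0" on the padding
enc["labels"] = labels
```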
@zihanliu
Point 2 is not clear to me.
How do I set the loss_mask to 0?
When I use vLLM, I get the warning: No chat template provided. Chat API will not work.
And the result is below:
Q: hi
R:
hi
<|im_end|>
<|im_start|>user
hi<|im_end|>
<|im_start|>assistant
<|im_end|>
<|im_start|>user
hi<|im_end|>
<|im_start|>assistant
<|im_end|>
<|im_start|>user
hi<|im_end|>
<|im_start|>assistant
<|im_end|>
<|im_start|>user
.......
Why?
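For reference, one way to query the model without a chat template is to build the prompt manually in the model-card format and use vLLM's completion-style API (only a sketch; the model name, context, and stop string are my assumptions):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Llama3-ChatQA-1.5-8B")  # placeholder
prompt = (
    "System: This is a chat between a user and an AI assistant.\n"
    "(context document goes here)\n\n"
    "User: hi\n\n"
    "Assistant:"
)
params = SamplingParams(temperature=0.0, max_tokens=256,
                        stop=["\n\nUser:"])  # stop before a new User turn
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```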
@zihanliu
Hi, I hope you are doing well. I trained the model; it now adds the eos token and does not get stuck in the response.
But the model returns very short answers, just 25 or 30 tokens,
even though my dataset consists of answers of 250 to 1000 tokens.
I increased the max new tokens to 1000, then 2000, in the generation config, but the model still produces only 25 or 30 tokens.
Can you explain why this happens?
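This is roughly what I mean by raising the new-token limit (a sketch of my generation call; the checkpoint path is a placeholder for my fine-tuned model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/my-finetuned-chatqa")  # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/my-finetuned-chatqa")

prompt = "System: This is a chat between a user and an AI assistant.\n\nUser: Summarize the document.\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2000)  # raised from 1000 to 2000
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                          skip_special_tokens=True)
print(answer)  # still only around 25-30 tokens
```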
Hi @Imran1,
ChatQA is trained to provide complete but concise responses to the question. How large is your dataset? If it is small, the model might still follow its original output format. Also, it depends on the question; some questions do not necessarily need a very long answer.