ChatML template does not work properly

#2
by WasamiKirua - opened

The model hallucinates a lot. I am using the GGUF model in LM Studio and selected the ChatML prompt template, as indicated in the model card.

@WasamiKirua I'm not sure about the quality of the quantized versions. I'd recommend loading the model weights in bfloat16.

Here's a Colab notebook with a chat interface that you can use to interact with the chat model.

https://huggingface.co/rasyosef/Mistral-NeMo-Minitron-8B-Chat/blob/main/Mistral_NeMo_Minitron_8B_chatbot.ipynb
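
For anyone who wants to try the bfloat16 suggestion outside the notebook, here is a minimal sketch with the transformers library. The model ID is taken from the notebook URL above; the helper names `load_bf16` and `chat`, the `device_map="auto"` setting (which needs the accelerate package), and the `max_new_tokens` default are my own assumptions and not necessarily what the notebook does.

```python
# Model ID inferred from the notebook URL in this thread.
MODEL_ID = "rasyosef/Mistral-NeMo-Minitron-8B-Chat"


def load_bf16(model_id=MODEL_ID):
    """Load the tokenizer and the unquantized weights in bfloat16 (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # unquantized bfloat16 weights
        device_map="auto",           # assumption: accelerate is installed
    )
    return tokenizer, model


def chat(tokenizer, model, user_message, max_new_tokens=256):
    """Send one user message through the tokenizer's own chat template (hypothetical helper)."""
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # end the prompt with the assistant header
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = inputs["input_ids"].shape[-1]
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Usage would be something like `tokenizer, model = load_bf16()` followed by `print(chat(tokenizer, model, "Hello"))`. An 8B model in bfloat16 needs roughly 16 GB of memory for the weights alone.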

You are using ChatML; use the phi-2 template for chatting instead. The author mentioned that in the original model's model card: https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base/discussions/5#66cbf507ed5c5babdef42cd1

If you use the wrong chat template with a GGUF, the model hallucinates.

Yeah, you have to use the chat template supported by the model's tokenizer. In this case, it's ChatML, the same template as in the model card.

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How to explain Internet for a medieval knight?<|im_end|>
<|im_start|>assistant
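
If your frontend needs the prompt as a raw string, the template above can be built by hand. This is a minimal sketch; `format_chatml` is a hypothetical helper name, and the exact whitespace (one newline after each `<|im_end|>` and after the assistant header) is my reading of the template shown above.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
])
print(prompt)
```

When the tokenizer is available, its built-in chat template is the safer choice, since it cannot drift from what the model was trained on.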

Also check which GGUF you are using: lower quants tend to hallucinate a lot. Also set the temperature, top-p, and top-k to suitable values.
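
To make those three knobs concrete, here is a rough pure-Python sketch of how temperature, top-k, and top-p typically reshape the next-token distribution. The function name `sampling_distribution` and the toy logits are made up for illustration; real runtimes may apply the filters in a different order, and none of the values here are recommended settings for this model.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values
print(sampling_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

Lower temperature and tighter top-k/top-p leave fewer unlikely tokens in play, which is why loose settings can make a weak quant ramble.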

Hmm, I have just tried the notebook provided, and this is the output. The input was just "Ciao", meaning "Hi". What I captured is not even all of it.

Screenshot_20240826_204209.png

The SFT dataset only contained English instructions, can you try English prompts? This looks like Italian.

With English text it is fine. Minitron is multilingual. I don't know what percentage of the data is Italian, but in my experience Mistral NeMo is able to generate Italian text pretty well. Perhaps, since you fine-tuned it with OpenHermes, it is a case of "catastrophic forgetting"? But I am pretty sure OpenHermes also contains Italian text. No idea, honestly. Anyway, did you use axolotl to fine-tune? If so, may I ask for the configuration YAML?

The chat model is still able to generate Italian text. Catastrophic forgetting mainly occurs when you use full parameter finetuning. Since I used LoRA, there's little chance of that happening.

Here's an example:

Instruction: "Quali sono le destinazioni turistiche più apprezzate in Italia?"
Response: "Le destinazioni turistiche più apprezzate in Italia dipendono da una varietà di fattori, tra cui la storia, la cultura, la natura e la gastronomia. Alcune delle destinazioni turistiche più popolari in Italia includono: Roma, con la sua ricca storia e monumenti antichi; Venezia, famosa per le sue gondole e canali; Firenze, con il suo patrimonio artistico rinascimentale; Milano, con la sua architettura moderna e la moda; Napoli, con la sua storia e la sua cucina; e la Costa Amalfitana, con le sue spettacolari viste panoramiche. Ci sono anche molte altre destinazioni turistiche popolari in Italia, come Pompei, la Sardegna, la Sicilia, il Piemonte, la Toscana e la Liguria. In ogni regione di Italia, ci sono molte destinazioni turistiche uniche e interessanti da scoprire."
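
On the LoRA point above: LoRA keeps the base weights frozen and only trains two small low-rank matrices per adapted layer, which is one plausible reason forgetting is less of a concern than with full finetuning. A back-of-the-envelope sketch follows; the layer size and rank are hypothetical numbers chosen for illustration, not the configuration used for this model.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the frozen dense weight matrix that the adapter sits beside."""
    return d_in * d_out


# Hypothetical sizes for illustration only.
d_in = d_out = 4096
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
fraction = lora / full
print(f"LoRA trains {lora:,} parameters next to {full:,} frozen ones ({fraction:.2%})")
```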

Same for me, sadly. Using imatrix quants at q6, the model always responds off the mark. It's being very dumb, especially in E/RP...

Hi @Varkoyote , can you try the same prompts with the unquantized bfloat16 version?

Here's a colab notebook with a chat interface.
https://huggingface.co/rasyosef/Mistral-NeMo-Minitron-8B-Chat/blob/main/Mistral_NeMo_Minitron_8B_chatbot.ipynb
