Ollama modelfile

#1 by pesonen - opened

The model seems to output [/INST] at the beginning of the response when the GGUF file is loaded into Ollama with a minimal Modelfile. The output also seems to be quite random occasionally. Would it be possible to get some pointers to instructions on how to create Ollama Modelfiles for these smaller models?


I am facing this same issue. Also, the prompt format would be useful to know.

I can't help with Ollama, and questions about the prompt format should probably go to the original model's repo. However, judging from the chat template, the prompt format should be Llama 2 (which also explains that [/INST]).
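For reference only (I have not tested this in Ollama myself), a minimal Modelfile using the Llama 2 format would look roughly like the sketch below. The FROM path is just a placeholder for a local GGUF file, and the stop values are assumptions based on the Llama 2 tags:

```
# Minimal sketch of a Llama 2 style Modelfile; the GGUF path is a placeholder.
FROM ./ahma-3b-instruct.Q4_K_M.gguf

# A Llama 2 turn looks like: [INST] <<SYS>> {system} <</SYS>> {prompt} [/INST] {response}
TEMPLATE """[INST] {{ if .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

# Assumed stop sequences so generation ends before the next turn marker
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```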

Thanks!

@pesonen , did you manage to find out what was wrong? I am facing the same issue with a valid Llama 2 template applied. I've made quite a few attempts with different variations of the formatting, and none of them have been successful. [INST] or [/INST] is almost always present in the output.

You should be able to use the model directly from this repo, and it should have the template correctly set based on the tokenizer.chat_template property:
https://huggingface.co/docs/hub/ollama

But it might still be wrong somehow, at least based on my quick testing:

(screenshot: image.png)

Here is the Ollama-based way of defining the template (created with o1-preview based on Ollama's documentation and our tokenizer chat template, so it might contain errors):
```
{{- $bos_token := "<s>" }}
{{- $eos_token := "</s>
" }}
<<SYS>>
{{- if .System }}
{{ .System }}
{{- else if and (gt (len .Messages) 0) (eq ((index .Messages 0).Role) "system") }}
{{ (index .Messages 0).Content }}
{{- else }}
Olet tekoälyavustaja. Vastaat aina mahdollisimman avuliaasti. Vastauksesi eivät saa sisältää mitään haitallista, epäeettistä, rasistista, seksististä, vaarallista tai laitonta sisältöä. Jos kysymyksessä ei ole mitään järkeä tai se ei ole asiasisällöltään johdonmukainen, selitä miksi sen sijaan, että vastaisit jotain väärin. Jos et tiedä vastausta kysymykseen, älä kerro väärää tietoa.
{{- end }}
<</SYS>>
{{- range $index, $message := .Messages }}
{{- if and (eq $index 0) (eq $message.Role "system") }}
{{- /* Skip the system message already processed */ }}
{{- else }}
{{- if eq $message.Role "user" }}
{{- if and (eq $index 1) (eq ((index .Messages 0).Role) "system") }}
{{- $content := printf "<<SYS>>\n%s\n<</SYS>>\n\n%s" ((index .Messages 0).Content) $message.Content }}
{{ printf "%s [INST] %s [/INST]" $bos_token $content }}
{{- else }}
{{ printf "%s [INST] %s [/INST]" $bos_token $message.Content }}
{{- end }}
{{- else if eq $message.Role "assistant" }}
{{ printf " %s%s" $message.Content $eos_token }}
{{- else }}
{{ error "Conversation roles must alternate between 'user' and 'assistant'." }}
{{- end }}
{{- end }}
{{- end }}
```
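
To actually try this template in Ollama, it would go inside a TEMPLATE block of a Modelfile, roughly like the sketch below; the FROM path is a placeholder and the stop values are assumptions based on the Llama 2 format:

```
# Hypothetical wrapper Modelfile; the FROM path is a placeholder for a local GGUF file.
FROM ./ahma-3b-instruct.Q4_K_M.gguf

TEMPLATE """
{{- /* paste the Go template from above here */ }}
"""

# Assumed stop sequences for the Llama 2 style format
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```

Then something like `ollama create ahma-test -f Modelfile` (the model name here is made up) should build a local model that uses it.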

I also tried running the model directly from the repo and thought that Ollama was missing the template for it, but as you said, the GGUF should already contain it from the tokenizer.

But thanks! I will check out your template in the evening 👍


With the new Ollama support for Hugging Face GGUF files, the [INST] tags have disappeared, but the answers are otherwise not great. The model (original or quantized) is not usable for us.

`ollama show ... --modelfile`:

TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
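
Worth noting: that auto-derived template is ChatML style (<|im_start|> / <|im_end|>) rather than the Llama 2 format discussed above, so the mediocre answers might partly be a template mismatch rather than the model itself, though I have not verified that.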

I might try to pay more attention to these Ollama inference issues later on, but for now my focus is on the Ahma-7B-Instruct finetunes. Let me know if someone finds a solution. Our original repo shows how inference works with transformers, and you can use tokenizer.apply_chat_template(messages, tokenize=False) to see how it formats the data for inference.
