Ollama modelfile

#1 by pesonen - opened

The model seems to output [/INST] at the beginning of the response when the GGUF file is loaded into Ollama with a minimal Modelfile. The output also seems to be quite random occasionally. Would it be possible to get some pointers to instructions on how to create Ollama Modelfiles for these smaller models?


I am facing this same issue. Also, the prompt format would be useful to know.

I can't help with Ollama, and questions about the prompt format should probably go to the original model's repo. However, judging from the chat template, the prompt format should be Llama 2 (which also explains that [/INST]).
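For reference only (I have not tested this in Ollama myself), a minimal Modelfile using the Llama 2 format would look roughly like the sketch below. The FROM path is just a placeholder for a local GGUF file, and the stop values are assumptions based on the Llama 2 tags:

```
# Minimal sketch of a Llama 2 style Modelfile; the GGUF path is a placeholder.
FROM ./ahma-3b-instruct.Q4_K_M.gguf

# A Llama 2 turn looks like: [INST] <<SYS>> {system} <</SYS>> {prompt} [/INST] {response}
TEMPLATE """[INST] {{ if .System }}<<SYS>>
{{ .System }}
<</SYS>>

{{ end }}{{ .Prompt }} [/INST] {{ .Response }}"""

# Assumed stop sequences so generation ends before the next turn marker
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```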

Thanks!

@pesonen , did you manage to find out what was wrong? I am facing the same issue with a valid Llama 2 template applied. I've made quite a few attempts with different variations of the formatting, and none of them have been successful. [INST] or [/INST] is almost always present in the output.

You should be able to use the model directly from this repo, and it should have the template correctly set based on the tokenizer.chat_template property:
https://huggingface.co/docs/hub/ollama

But it might still be wrong somehow, at least based on my quick testing:

(screenshot: image.png)

Here is the Ollama-based way of defining the template (created with o1-preview based on Ollama's documentation and our tokenizer chat template, so it might contain errors):
```
{{- $bos_token := "<s>" }}
{{- $eos_token := "</s>
" }}
<<SYS>>
{{- if .System }}
{{ .System }}
{{- else if and (gt (len .Messages) 0) (eq ((index .Messages 0).Role) "system") }}
{{ (index .Messages 0).Content }}
{{- else }}
Olet tekoälyavustaja. Vastaat aina mahdollisimman avuliaasti. Vastauksesi eivät saa sisältää mitään haitallista, epäeettistä, rasistista, seksististä, vaarallista tai laitonta sisältöä. Jos kysymyksessä ei ole mitään järkeä tai se ei ole asiasisällöltään johdonmukainen, selitä miksi sen sijaan, että vastaisit jotain väärin. Jos et tiedä vastausta kysymykseen, älä kerro väärää tietoa.
{{- end }}
<</SYS>>
{{- range $index, $message := .Messages }}
{{- if and (eq $index 0) (eq $message.Role "system") }}
{{- /* Skip the system message already processed */ }}
{{- else }}
{{- if eq $message.Role "user" }}
{{- if and (eq $index 1) (eq ((index .Messages 0).Role) "system") }}
{{- $content := printf "<<SYS>>\n%s\n<</SYS>>\n\n%s" ((index .Messages 0).Content) $message.Content }}
{{ printf "%s [INST] %s [/INST]" $bos_token $content }}
{{- else }}
{{ printf "%s [INST] %s [/INST]" $bos_token $message.Content }}
{{- end }}
{{- else if eq $message.Role "assistant" }}
{{ printf " %s%s" $message.Content $eos_token }}
{{- else }}
{{ error "Conversation roles must alternate between 'user' and 'assistant'." }}
{{- end }}
{{- end }}
{{- end }}
```
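
To actually try this template in Ollama, it would go inside a TEMPLATE block of a Modelfile, roughly like the sketch below; the FROM path is a placeholder and the stop values are assumptions based on the Llama 2 format:

```
# Hypothetical wrapper Modelfile; the FROM path is a placeholder for a local GGUF file.
FROM ./ahma-3b-instruct.Q4_K_M.gguf

TEMPLATE """
{{- /* paste the Go template from above here */ }}
"""

# Assumed stop sequences for the Llama 2 style format
PARAMETER stop "</s>"
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```

Then something like `ollama create ahma-test -f Modelfile` (the model name here is made up) should build a local model that uses it.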

I also tried running the model directly from the repo and thought that Ollama was missing the template for it, but as you said, the GGUF should already contain it from the tokenizer.

But thanks! I will check out your template in the evening 👍


With the new Ollama support for Hugging Face GGUF files, the [INST] tags have disappeared, but the answers are otherwise not great. The model (original or quantized) is not usable for us.

`ollama show ... --modelfile`:

TEMPLATE "{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
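
Worth noting: that auto-derived template is ChatML style (<|im_start|> / <|im_end|>) rather than the Llama 2 format discussed above, so the mediocre answers might partly be a template mismatch rather than the model itself, though I have not verified that.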

I might try to pay more attention to these Ollama inference issues later on, but for now my focus is on the Ahma-7B-Instruct finetunes. Let me know if someone finds a solution. Our original repo shows how inference works with transformers, and you can use tokenizer.apply_chat_template(messages, tokenize=False) to see how it formats the data for inference.
