How can I use this weight file with pure `transformers` code?
I manually fixed the issue so that it can be loaded as a Hugging Face transformers model
@csegalin
https://huggingface.co/Seungyoun/llava-llama-3-8b-hf
It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla
Hi
@Seungyoun
This model follows the format of the official llava-v1.5/v1.6 and is not in the LlavaForConditionalGeneration format.
We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.
Before that, please use the CLI or LMDeploy, as described at https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf#quickstart
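In case it helps, here is a minimal sketch of the LMDeploy route from that quickstart (the image path is a placeholder; treat this as an outline and defer to the quickstart page for the authoritative steps):

```python
# Minimal LMDeploy sketch, assuming `pip install lmdeploy` and the v1_1-hf repo id
# from the quickstart link above. "image.jpg" is a placeholder image path.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('xtuner/llava-llama-3-8b-v1_1-hf')
image = load_image('image.jpg')
response = pipe(('Describe this image', image))
print(response.text)
```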
Thank you for your prompt response. @LZHgrla
This model follows the format of the official llava-v1.5/v1.6 and is not in the LlavaForConditionalGeneration format.
We will provide a conversion script in about 1-2 days to convert this model to a LlavaForConditionalGeneration model.
I need to clarify something: in your message, you mention that this model follows the format of the official LLaVA-v1.5/v1.6 and is not directly in the LlavaForConditionalGeneration format. However, my understanding is that LLaVA-v1.5/v1.6 corresponds to what is known as LlavaNextForConditionalGeneration. Could you confirm if this is the case?
Good question! There are many different formats for llava models.
Here are two examples:
https://huggingface.co/liuhaotian/llava-v1.5-7b/tree/main is in the llava format
https://huggingface.co/llava-hf/llava-1.5-7b-hf/tree/main is in the hf format
This model is in the llava format, although it has an -hf suffix.
@LZHgrla
I am trying to manually fix the model weight index mapping to the proper structure and config.json.
Does your model also follow this added_tokens.json?
{
"<image>": 32000,
"<pad>": 32001
}
The original vocab size is 128256.
So, I think the correct token IDs should be
{
"<image>": 128257,
"<pad>": 128258
}
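If it helps, a quick sketch for checking where the added tokens actually land, assuming the base Llama-3 tokenizer is available; whatever IDs it prints are what the converted config.json should point to:

```python
# Sketch: check the IDs that <image> and <pad> receive when added on top of the
# Llama-3 tokenizer (assumes access to meta-llama/Meta-Llama-3-8B-Instruct).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(len(tok))  # original vocab size, 128256

tok.add_tokens(["<image>", "<pad>"], special_tokens=True)
print(tok.convert_tokens_to_ids("<image>"), tok.convert_tokens_to_ids("<pad>"))
# The model's embedding matrix must also be resized after adding tokens,
# e.g. model.resize_token_embeddings(len(tok)).
```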
Were you able to make it work?
Using the CLI version I get an issue with transformers 4.37 being required by llava, while all these new models need at least 4.39.
I manually fixed the issue so that it can be loaded as a Hugging Face transformers model
@csegalin https://huggingface.co/Seungyoun/llava-llama-3-8b-hf
It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla
I added an issue in your repo.
@LZHgrla any update on the conversion script?
@csegalin
We will release pure transformers and GGUF versions of the models, together with the corresponding conversion scripts, within a few days.
Before that, you are welcome to try our newly released llava-phi-3-mini model, which supports multiple formats, including the official llava format, the pure transformers format, and the GGUF format.
https://huggingface.co/xtuner/llava-phi-3-mini-hf
I manually fixed the issue so that it can be loaded as a Hugging Face transformers model
@csegalin https://huggingface.co/Seungyoun/llava-llama-3-8b-hf
It would be great if you could also provide the chat template you used to train the model. Thank you for your wonderful work @LZHgrla
Thanks! We used llama-3's chat template to train the llava-llama-3-8b models.
That is,
"chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
An image-text example is
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
<image>What is it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
AAAAAAAAA<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Do you like it?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
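For completeness, a small sketch of how that template renders via `apply_chat_template`, assuming a tokenizer that already carries the chat_template string above (the model id used here is the transformers-format repo linked later in this thread):

```python
# Sketch: render the llama-3 chat template above with transformers.
# The template itself appends the trailing assistant header, as in the example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xtuner/llava-llama-3-8b-v1_1-transformers")
messages = [
    {"role": "user", "content": "<image>What is it?"},
    {"role": "assistant", "content": "AAAAAAAAA"},
    {"role": "user", "content": "Do you like it?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)  # should reproduce the image-text example above
```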
@csegalin
@Seungyoun
Hi!
Here are the pure transformers and GGUF models!
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf
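For anyone landing here later, a rough sketch of loading the transformers-format release with LlavaForConditionalGeneration (the prompt string follows the llama-3 template discussed above, the image path is a placeholder, and the model card remains the authoritative reference):

```python
# Rough sketch: load the transformers-format release and run one image-text turn.
# "example.jpg" is a placeholder; see the model card for the recommended usage.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-llama-3-8b-v1_1-transformers"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\nWhat is it?<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n")
image = Image.open("example.jpg")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```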
Hey, thanks!
I tried it yesterday and, not sure why, performance is worse than the first version: a lot of repeated words, and it is less accurate even when using the same generation parameters.