Llama-3 chat vector

  • Update 0426: There is a small problem with the deployment of the model 'Llama-3-Seagull-Evo-8B', but we hope to have it back soon!
  • Update 0526: Check out our newest EMM model, Alpha-Ko-8B-Instruct.

This is a 'modelified' version of the chat vector from the paper Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages. So this is not a model; it's just a weight diff, packaged for my own ease of use (and yours too)!

What I understand here: the 'chat vector method' is a merging method that uses the weight differences among the base model, the continually pre-trained (usually language-transferred) model, and the chat model. So the recipe is

model(base) + weight_diff(continuous pretrained) + weight_diff(instruct), or

model(base) + weight_diff(continuous pretrained + fine-tuned) + weight_diff(instruct).
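As a rough illustration of the first recipe above, here is a minimal sketch of the arithmetic. It makes several assumptions: the "state dicts" are plain Python dicts of float lists standing in for real checkpoint tensors, and the helper names (weight_diff, add_diff), the parameter name, and the numbers are all made up for the example. It is not the code used to build this repo.

```python
# Toy sketch of chat vector arithmetic.
# Assumption: each "state dict" maps a parameter name to a flat list of floats;
# real checkpoints hold tensors, but the elementwise math is the same idea.

def weight_diff(tuned, base):
    """Return tuned - base, parameter by parameter (hypothetical helper)."""
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in base
    }

def add_diff(model, diff, scale=1.0):
    """Return model + scale * diff, parameter by parameter (hypothetical helper)."""
    return {
        name: [m + scale * d for m, d in zip(model[name], diff[name])]
        for name in model
    }

# Made-up weights for three checkpoints that share one architecture.
base = {"layer.weight": [1.0, 2.0, 3.0]}       # model(base)
instruct = {"layer.weight": [1.5, 2.0, 2.0]}   # chat / instruct model
cp = {"layer.weight": [1.0, 3.0, 3.5]}         # continuously pre-trained model

chat_vector = weight_diff(instruct, base)      # weight_diff(instruct)
cp_diff = weight_diff(cp, base)                # weight_diff(continuous pretrained)

# model(base) + weight_diff(continuous pretrained) + weight_diff(instruct)
merged = add_diff(add_diff(base, cp_diff), chat_vector)

# Same result, starting from the continuously pre-trained checkpoint directly.
merged_from_cp = add_diff(cp, chat_vector)
```

Since base + cp_diff is just the continuously pre-trained checkpoint, in practice it should be enough to add the stored diff to that checkpoint, which is why shipping only the diff is convenient. The scale argument is an extra knob in this sketch, not something the recipe above specifies.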

My initial purpose was to compare which ordering is better: llama3 → CP + chat vector → FT vs. llama3 → CP → FT + chat vector. Before doing that, it seems reasonable to compare the chat vector method with other merging methods in Mergekit.

| Model | Method | Kobest (f1) | Haerae (acc) |
|---|---|---|---|
| beomi/Llama-3-Open-Ko-8B-Instruct-preview | chat vector | 0.4368 | 0.439 |
| kuotient/Llama-3-Ko-8B-ties | Ties | 0.4821 | 0.5160 |
| kuotient/Llama-3-Ko-8B-dare-ties | Dare-ties | 0.4950 | 0.5399 |
| kuotient/Llama-3-Ko-8B-TA | Task Arithmetic (maybe...? not sure about this) | - | |
| WIP | Model stock (I haven't read this paper yet, but still) | - | |
| kuotient/Llama-3-Seagull-Evo-8B | Evolutionary Model Merging | 0.6139 | 0.5344 |
| --- | --- | --- | --- |
| meta-llama/Meta-Llama-3-8B | Base | - | - |
| meta-llama/Meta-Llama-3-8B-Instruct | - | 0.4239 | 0.4931 |
| beomi/Llama-3-Open-Ko-8B | Korean Base | 0.4374 | 0.3813 |

All that aside, I'd like to thank @beomi for creating such an awesome Korean-based model.

Safetensors · Model size: 8.03B params · Tensor type: BF16