This is amazing! Did you use my merge method?

#2
by rombodawg - opened

It's really amazing that you were able to top my model. I can see that you used mergekit to make this:

https://huggingface.co/fblgit/cybertron-v4-qw7B-MGS/blob/main/model.safetensors.index.json

Did you use my merge method (Continuous Finetuning) to create this after training your model?

I'd imagine your merge config would have looked something like this:

models:
  - model: Qwen_Qwen2.5-7B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: Qwen_Qwen2.5-7B-Magpie-Qwen2.5-Pro-1M-v0.1-Tuned
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen_Qwen2.5-7B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: bfloat16
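
(A config like this would typically be run with mergekit's CLI, e.g. "mergekit-yaml config.yml ./merged-model"; the exact invocation may vary with the mergekit version.)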

nah brother.. if i were to use that.. u would see it in the README.. in the citations part.
Unfortunately, I haven't been able to prove your continuous theory.. just like your results.. the result of doing that is a turd.

@fblgit I'm not sure what you mean by "just like your results.. the result of doing that is a turd", as my results have only improved the models' performance.

It's clear from your "model.safetensors.index.json" that you used mergekit to create this model. Can you at least share what you did with mergekit after tuning?

mm.. fair question. and since u were open about it.. i'll do the same:
https://arxiv.org/pdf/2410.21228

U can see the SFT vs LoRA differences there,
and the forgetting impact across the training corpora.
Think about how u can tackle that with mergekit using ur own SFT LoRAs.
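
As a rough illustration of that direction (not what was actually done for cybertron-v4), one way to fold your own SFT output back over the shared base with mergekit could look like the config below. The checkpoint path is a placeholder for your own SFT (or LoRA-merged) result, and dare_ties is just one reasonable choice for pruning conflicting deltas:

models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 0.5
      density: 0.5
  - model: ./your-sft-checkpoint   # placeholder: your own SFT (or LoRA-merged) output
    parameters:
      weight: 0.5
      density: 0.5
merge_method: dare_ties
base_model: Qwen/Qwen2.5-7B
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16

The idea is that the instruct model re-contributes the general-purpose behavior a narrow SFT corpus tends to overwrite, while density pruning keeps only the strongest task deltas from each side.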

fblgit changed discussion status to closed
