Llama-3-8b-ita-slerp

This is a merge of pre-trained language models created using mergekit.

I tried merging two of the best Italian LLMs with mergekit. The results are acceptable, but the merged model does not outperform the best existing model.

Evaluation

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

Here's a breakdown of the performance metrics:

Benchmark	Metric	Score
hellaswag_it	acc_norm	0.6879
arc_it	acc_norm	0.5714
m_mmlu_it (5-shot)	acc	0.5732
Average		0.6109
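As a quick sanity check, the Average can be recomputed from the per-task scores. This sketch assumes that Average is the unweighted mean of the three tasks. From the four-decimal values shown it gives 0.6108. The last-digit difference from the reported 0.6109 is presumably due to rounding of the per-task scores before display.

```python
# Per-task scores copied from the table above.
scores = {
    "hellaswag_it (acc_norm)": 0.6879,
    "arc_it (acc_norm)": 0.5714,
    "m_mmlu_it 5-shot (acc)": 0.5732,
}

# Assumption: "Average" is the unweighted arithmetic mean of the three tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")  # prints 0.6108
```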

Merge Details

Merge Method

This model was merged using the SLERP (spherical linear interpolation) merge method.
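In a SLERP merge, each pair of corresponding weight tensors is interpolated along the arc between them rather than along a straight line. The interpolation factor t sets the mix: t = 0 returns the first tensor and t = 1 the second. The sketch below uses the common formulation on flat Python lists. It is not mergekit's code. The near-parallel threshold of 0.9995, below which it falls back to plain linear interpolation, is an assumption chosen for numerical stability.

```python
import math


def slerp(v0, v1, t, dot_threshold=0.9995, eps=1e-12):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns v0, t = 1 returns v1. If the vectors are (nearly)
    parallel or anti-parallel, sin(theta) is close to zero and the
    formula is unstable, so fall back to linear interpolation.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two directions.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    if abs(dot) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    # Weights for each endpoint along the arc.
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

For inputs of similar norm, this keeps the result's norm closer to the inputs' norm than a plain average does. That is the usual motivation for SLERP over linear merging.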

Models Merged

The following models were included in the merge:
- swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
- DeepMount00/Llama-3-8b-Ita

Configuration

The following YAML configuration was used to produce this model:


slices:
- sources:
  - model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
    layer_range:
    - 0
    - 32
  - model: DeepMount00/Llama-3-8b-Ita
    layer_range:
    - 0
    - 32
merge_method: slerp
base_model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
parameters:
  t:
  - filter: self_attn
    value:
    - 0
    - 0.5
    - 0.3
    - 0.7
    - 1
  - filter: mlp
    value:
    - 1
    - 0.5
    - 0.7
    - 0.3
    - 0
  - value: 0.5
dtype: bfloat16
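In mergekit's SLERP, t = 0 returns the base model (here LLaMAntino-3-ANITA) and t = 1 returns the other model (Llama-3-8b-Ita). A list of t values acts as a gradient across the layer stack. The sketch below shows how the configuration above might resolve to a per-layer t. It assumes the anchor values are spaced evenly over the 32 layers and linearly interpolated, and that filters match by substring on tensor names. Both assumptions follow mergekit's documented behavior only loosely; the exact index mapping may differ. The helpers `gradient_value` and `t_for_tensor` are hypothetical.

```python
import math

# Anchor values copied from the YAML configuration above.
SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]
DEFAULT_T = 0.5   # "- value: 0.5" applies to all other tensors
NUM_LAYERS = 32   # layer_range [0, 32]


def gradient_value(anchors, layer, num_layers=NUM_LAYERS):
    """Hypothetical helper: linearly interpolate anchors across layers (assumed scheme)."""
    if len(anchors) == 1 or num_layers == 1:
        return float(anchors[0])
    # Map the layer index onto the anchor index space [0, len(anchors) - 1].
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(math.floor(pos)), len(anchors) - 2)
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[lo + 1] * frac


def t_for_tensor(name, layer):
    """Hypothetical helper: pick t by filter, assuming substring matching on tensor names."""
    if "self_attn" in name:
        return gradient_value(SELF_ATTN_T, layer)
    if "mlp" in name:
        return gradient_value(MLP_T, layer)
    return DEFAULT_T


for layer in (0, 16, 31):
    attn = t_for_tensor(f"model.layers.{layer}.self_attn.q_proj.weight", layer)
    mlp = t_for_tensor(f"model.layers.{layer}.mlp.up_proj.weight", layer)
    print(layer, round(attn, 3), round(mlp, 3))
```

Because the MLP anchors mirror the attention anchors (each pair sums to 1), the interpolated schedules also sum to 1 at every layer. Under these assumptions, attention weights come mostly from ANITA in early layers and from Llama-3-8b-Ita in late layers, while MLP weights follow the opposite pattern.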

Repository: anakin87/Llama-3-8b-ita-slerp