---
license: apache-2.0
tags:
- UNA
- simple-math
- juanako
datasets:
- fblgit/simple-math
- jondurbin/bagel-v0.3
base_model: abacusai/Smaug-34B-v0.1
model-index:
- name: UNA-SimpleSmaug-34b-v1beta
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 74.57
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.74
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 76.68
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 70.17
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.82
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 72.48
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
      name: Open LLM Leaderboard
---
# UNA-SimpleSmaug-34b-v1beta
Scoring #1 among 34B models as of 04-February-2024, outperforming its original base model Smaug-34B-v0.1 with an average score of 77.41 😎
Oh, btw... this one went through SFT, so the abacus inside Smaug is back to normal, and you can further train/DPO him... RESET!
UNA was applied only to the attention layers, not to the MLPs.
- Based on Smaug
- Uses the SimpleMath dataset
- Trained with Axolotl
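UNA itself is not described in this card, so the sketch below does not implement it. It only illustrates, under stated assumptions, how a modification can be scoped to the attention blocks while leaving the MLPs untouched: it partitions parameter names using the Hugging Face Llama-style naming convention (`self_attn` / `mlp`), which is assumed here for Smaug-34B. The helper `split_attention_mlp` is hypothetical.

```python
# Hypothetical helper, not part of UNA or any library.
# Assumes Llama-style parameter names such as
# "model.layers.0.self_attn.q_proj.weight" and "model.layers.0.mlp.up_proj.weight".
from typing import Iterable, List, Tuple

ATTENTION_KEY = "self_attn"  # assumed attention-module name
MLP_KEY = "mlp"              # assumed MLP-module name


def split_attention_mlp(param_names: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Partition parameter names into (attention, mlp, other) lists."""
    attention, mlp, other = [], [], []
    for name in param_names:
        if f".{ATTENTION_KEY}." in name:
            attention.append(name)
        elif f".{MLP_KEY}." in name:
            mlp.append(name)
        else:
            # embeddings, layer norms, lm_head, ...
            other.append(name)
    return attention, mlp, other


if __name__ == "__main__":
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.self_attn.v_proj.weight",
        "model.layers.0.self_attn.o_proj.weight",
        "model.layers.0.mlp.gate_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
        "model.layers.0.input_layernorm.weight",
        "lm_head.weight",
    ]
    attn, mlp, other = split_attention_mlp(names)
    print(len(attn), len(mlp), len(other))  # 4 3 3
```

With PyTorch, the names would come from `[n for n, _ in model.named_parameters()]`; only the attention group would then be passed to whatever attention-only step is being applied.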
## Experiment
The goal here is to understand the impact of applying SimpleMath at the attention layers during an SFT session, and how it affects the neural network overall.
Results: improved mathematical and reasoning capabilities, without degradation and while preserving what was learned in previous training sessions.
## Evals
Full evals are pending, but so far this model scores:
| Task          | Version | Metric   | Value           |
|---------------|--------:|----------|----------------:|
| arc_challenge | HF      | acc_norm | 0.7457337883959 |
| gsm8k         | HF      | acc      | 0.7247915087187 |
| mmlu          | HF      | acc      | 0.7649553475572 |
| mmlu          | HF      | acc_norm | 0.7681713551647 |
| hellaswag     | HF      | acc_norm | 0.8673571001792 |
| truthfulqa    | HF      | mc2      | 0.7016557407771 |
| winogrande    | HF      | acc      | 0.8382004735595 |
Increased GSM8k, MMLU, ARC, and Winogrande scores.
## Citations
To abacusai for making Smaug-34B, to the Bagel, and to all the magic behind the base model.
If you use the model, please provide a citation, even for merges or any other derivative. And enjoy our ModelSimilarities detector tool: https://github.com/fblgit/model-similarity
## Open LLM Leaderboard Evaluation Results
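Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta).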
| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 77.41 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot)               | 86.74 |
| MMLU (5-Shot)                     | 76.68 |
| TruthfulQA (0-shot)               | 70.17 |
| Winogrande (5-shot)               | 83.82 |
| GSM8k (5-shot)                    | 72.48 |
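The `Avg.` row can be checked against the six benchmark rows. A quick sketch, assuming the leaderboard average is a plain unweighted mean of the six scores:

```python
# Recompute the leaderboard "Avg." from the six benchmark scores in the table.
# Assumption: the average is an unweighted arithmetic mean.
from statistics import mean

scores = {
    "AI2 Reasoning Challenge (25-Shot)": 74.57,
    "HellaSwag (10-Shot)": 86.74,
    "MMLU (5-Shot)": 76.68,
    "TruthfulQA (0-shot)": 70.17,
    "Winogrande (5-shot)": 83.82,
    "GSM8k (5-shot)": 72.48,
}

average = round(mean(scores.values()), 2)
print(f"Avg. = {average}")  # Avg. = 77.41
```

The six scores sum to 464.46, and 464.46 / 6 = 77.41, matching the reported average.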