alnrg2arg/test3_sft_16bit_dpo2

This model comes from blockchainlabs test 2.4: alnrg2arg/blockchainlabs_7B_merged_test2_4.

The project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.
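The pipeline does not say which pruning method is used in step 2. To illustrate what 50% sparsity means, here is a minimal sketch that assumes unstructured magnitude pruning (an assumption, not necessarily the method used in this project). `magnitude_prune` is a hypothetical helper that works on a plain list of weights.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical toy weights, for illustration only.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
pruned = magnitude_prune(weights)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned, sparsity)
```

Half of the entries end up as zeros, which is what "50% sparsity" refers to; the surviving weights are unchanged, so a recovery phase is typically needed to regain quality.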

This model is not pruned; it is intended as a baseline for comparison with the pruned model.

This is the code and the parameters I chose for this model (DPO).

import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

# `model`, `tokenizer`, and `dataset` are assumed to be loaded earlier.
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,  # no separate reference model is passed
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,  # 8 x 8 = 64 examples per optimizer step per device
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        # Use bf16 when the GPU supports it, otherwise fall back to fp16.
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
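
For reference, `beta = 0.1` scales the log-ratio margin in the DPO objective. The following is a minimal sketch of the per-pair loss, assuming the standard DPO formulation. The function name and the log-probability values are hypothetical, and this is not TRL's implementation.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy prefers the chosen answer
# more strongly than the reference does, so the loss is below log(2).
loss = dpo_pair_loss(-12.0, -20.0, -14.0, -18.0, beta=0.1)
print(loss)
```

A smaller `beta` lets the policy drift further from the reference before the loss saturates; a larger `beta` keeps it closer.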

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
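
The batch, epoch, and warmup settings in the training arguments determine how many optimizer steps are run and how many of them are warmup. The sketch below approximates that arithmetic. The dataset size of 6400 is hypothetical (the actual size is not stated here), `training_schedule` is an illustrative helper, and the exact rounding inside the Hugging Face Trainer may differ between versions.

```python
import math


def training_schedule(num_examples, per_device_batch=8, grad_accum=8,
                      epochs=3, warmup_ratio=0.1, num_devices=1):
    """Approximate step counts implied by the training arguments."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = max(1, num_examples // effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


# Hypothetical dataset size, for illustration only.
effective_batch, total_steps, warmup_steps = training_schedule(num_examples=6400)
print(effective_batch, total_steps, warmup_steps)
```

With a linear scheduler, the learning rate ramps up to 5e-6 over the warmup steps and then decays linearly to zero over the remaining steps.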

Benchmark Scores

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
arc_challenge	1	none	0	acc	0.6894	±	0.0135
		none	0	acc_norm	0.6860	±	0.0136

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
hellaswag	1	none	0	acc	0.7092	±	0.0045
		none	0	acc_norm	0.8736	±	0.0033

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
truthfulqa_mc2	2	none	0	acc	0.7126	±	0.015

Groups	Version	Filter	Metric	Value		Stderr
mmlu	N/A	none	acc	0.6225	±	0.1292
- humanities	N/A	none	acc	0.5745	±	0.1286
- other	N/A	none	acc	0.6952	±	0.1095
- social_sciences	N/A	none	acc	0.7280	±	0.0735
- stem	N/A	none	acc	0.5195	±	0.1313

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
winogrande	1	none	0	acc	0.824	±	0.0107

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
gsm8k	2	get-answer	5	exact_match	0.7263	±	0.0123

Average = 74.08
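
The reported average is consistent with the unweighted mean of six scores from the tables above (acc_norm for arc_challenge and hellaswag; acc for truthfulqa_mc2, mmlu, and winogrande; exact_match for gsm8k), expressed as a percentage. This choice of metrics is inferred from the arithmetic and is not stated explicitly in the card.

```python
# Scores copied from the benchmark tables above.
scores = {
    "arc_challenge (acc_norm)": 0.6860,
    "hellaswag (acc_norm)": 0.8736,
    "truthfulqa_mc2 (acc)": 0.7126,
    "mmlu (acc)": 0.6225,
    "winogrande (acc)": 0.824,
    "gsm8k (exact_match)": 0.7263,
}

average = 100 * sum(scores.values()) / len(scores)
print(f"Average = {average:.2f}")
```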
