Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending.
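For readers unfamiliar with the objective, here is a minimal sketch of a per-example cDPO loss, assuming "cDPO" refers to conservative DPO, i.e. the DPO loss with label smoothing over the preference label. The function names and the `beta` and `label_smoothing` defaults are illustrative assumptions; this card does not state the hyperparameters or the training code actually used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not taken from this card
    label_smoothing: float = 0.1,  # assumed value, not taken from this card
) -> float:
    """Conservative DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    completions under the policy and the frozen reference model.
    """
    # Implicit reward margin between chosen and rejected completions.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    logits = beta * margin
    # With probability `label_smoothing` the preference label is treated
    # as flipped; label_smoothing = 0 recovers the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(logits)
        - label_smoothing * log_sigmoid(-logits)
    )
```

Under this formulation the smoothing term penalizes very large margins, so the policy is not pushed to be arbitrarily confident on preference pairs that may be mislabeled.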

Some initial benchmark results:

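| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± 0.0047 |
| | | acc_norm | 0.8525 | ± 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± 0.0141 |
| | | acc_norm | 0.6698 | ± 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± 0.0136 |
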

Model size: 7.24B params (Safetensors). Tensor type: BF16.