Text Generation
Transformers
PyTorch
TensorBoard
English
olmo
conversational
Inference Endpoints
Edit model card

OLMo-1B-0724 Instruct

This is a version of OLMo-1B-0724-hf that has undergone SFT and DPO training. See the SFT model card for details on SFT training.

This model is initialised from OLMo-1B-0724-SFT-hf, and then DPO trained on a cleaned ultrafeedback dataset for 3 epochs with a batch size of 32, beta of 0.1, linear warmup for 10% of training, and then linear cooldown.

Evals are as follows:

Metric OLMo-1B-0724-hf OLMo-1B-0724-SFT-hf OLMo-1B-0724-Instruct-hf (this model!)
MMLU 0-shot 25.0 36.0 36.7
GSM8k CoT 8-shot 7.0 12.5 12.5
BBH CoT 3-shot 22.5 27.2 30.6
HumanEval P@10 16.0 21.2 22.0
AlpacaEval 1 - 41.5 50.9
AlpacaEval 2 LC - 2.7 2.5
Toxigen % Toxic 80.3 59.7 14.1
TruthfulQA %Info+True 23.0 40.9 42.2
IFEval Loose Acc 20.5 26.1 24.2
XSTest F1 67.6 81.9 79.8
Average of above metrics 25.2 33.0 38.7

Model training and evaluation was performed using Open-instruct, so check that out for more details on evaluation.

Downloads last month
20
Inference Examples
Inference API (serverless) is not available, repository is disabled.

Datasets used to train hamishivi/OLMo-1B-0724-Instruct-hf