
Model Card for rungao2001/vicuna-7b-v1.5-dpo-mix-7k-full

Model Details

Model Description

This model was fine-tuned with DPO (Direct Preference Optimization) on the argilla/dpo-mix-7k dataset, starting from rungao2001/vicuna-7b-v1.5_deita10k_sft_full.

  • Model type: Llama2 Decoder-Only
  • Language(s) (NLP): English
  • License: llama2
  • Finetuned from model: rungao2001/vicuna-7b-v1.5_deita10k_sft_full

Training Details

Training Data

argilla/dpo-mix-7k

Training Procedure

DPO

Notice: The chat_template was modified because the original Vicuna 1.1 format cannot be used with trl.DPOTrainer. The check that raises "Conversation roles must alternate user/assistant/user/assistant/..." was removed, and the system message is now emitted only when loop.index0 == 0 and the role is 'user'.
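A minimal sketch of what such a modified template could look like, rendered here with jinja2 directly. This is a hypothetical reconstruction illustrating the two changes described above (no role-alternation check, system message emitted only before the first user turn), not the exact template shipped with the model:

```python
from jinja2 import Template

# Hypothetical modified Vicuna 1.1 template: the strict user/assistant
# alternation check is dropped, and the system message is prepended only
# when loop.index0 == 0 and the role is 'user'.
VICUNA_DPO_TEMPLATE = (
    "{% for message in messages %}"
    "{% if loop.index0 == 0 and message['role'] == 'user' %}"
    "{{ system_message + ' ' }}"
    "{% endif %}"
    "{% if message['role'] == 'user' %}"
    "{{ 'USER: ' + message['content'] + ' ' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ 'ASSISTANT: ' + message['content'] + '</s>' }}"
    "{% endif %}"
    "{% endfor %}"
)

def render(messages, system_message):
    return Template(VICUNA_DPO_TEMPLATE).render(
        messages=messages, system_message=system_message
    )

text = render(
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello!"}],
    system_message="A chat between a user and an assistant.",
)
# text == "A chat between a user and an assistant. USER: Hi ASSISTANT: Hello!</s>"
```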

Training Hyperparameters

  • Precision: BFloat16
  • Chat Template: Modified Vicuna 1.1
  • Global Batch Size: 128
  • Learning Rate: 1.0e-6
  • Num Epochs: 3
  • Max Prompt Length: 1800
  • Max Length: 2048
  • Training Steps: 156
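The hyperparameters above map onto a trl training setup roughly as follows. This is a hedged configuration sketch, not the exact training script; argument names follow recent trl releases and may differ in older versions, and the per-device batch / gradient-accumulation split is an assumption:

```python
from trl import DPOConfig, DPOTrainer  # assumes a recent trl release

# Configuration implied by the listed hyperparameters. The global batch
# size of 128 could be reached e.g. as 4 per device x 8 GPUs x 4 accumulation
# steps (hypothetical split).
args = DPOConfig(
    output_dir="vicuna-7b-v1.5-dpo-mix-7k-full",
    learning_rate=1.0e-6,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
    max_prompt_length=1800,
    max_length=2048,
)

# trainer = DPOTrainer(model=model, ref_model=ref_model, args=args,
#                      train_dataset=dataset["train"], ...)
# trainer.train()
```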

Evaluation

On the eval split of argilla/dpo-mix-7k, training finally reached loss = 0.5006 and rewards/accuracies = 78.72%.
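For context on these two numbers: the DPO loss is the negative log-sigmoid of the implicit-reward margin between the chosen and rejected responses, and rewards/accuracies is the fraction of pairs where the chosen response's implicit reward exceeds the rejected one's. A self-contained sketch of the per-pair computation (beta = 0.1 is trl's default; the log-probability values below are made up for illustration):

```python
import math

def dpo_pair(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss and accuracy from summed sequence log-probs."""
    # Implicit rewards are beta-scaled log-prob ratios vs. the frozen reference.
    r_chosen = beta * (pi_chosen - ref_chosen)
    r_rejected = beta * (pi_rejected - ref_rejected)
    margin = r_chosen - r_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    accurate = r_chosen > r_rejected                   # counts toward rewards/accuracies
    return loss, accurate

# With zero margin (e.g. at initialization, policy == reference) the loss is
# log(2) ≈ 0.693, so the reported eval loss of 0.5006 reflects a learned
# preference margin.
loss, acc = dpo_pair(-10.0, -14.0, -11.0, -12.0)  # illustrative log-probs
```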


Model size: 6.74B params (BF16, Safetensors)
