ORPO-Tuned Llama-3.2-1B-Instruct

NB: Done purely as a fine-tuning exercise. Not intended for any practical use.

This model is a fine-tuned version of Meta's Llama-3.2-1B-Instruct using ORPO (Odds Ratio Preference Optimization). It was trained to better align with human preferences using examples from the curated preference dataset mlabonne/orpo-dpo-mix-40k.
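
To make the training objective concrete, here is a minimal sketch of the per-example ORPO loss as it is usually described: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio term that favors the chosen response over the rejected one. The function names, the toy log-probabilities, and the weight `lam=0.1` are illustrative assumptions; this card does not state the weight that was actually used.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the length-normalized (per-token average)
    log-probability of a response, so it must be negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (prompt, chosen, rejected) triple.

    lam is a hypothetical weight on the odds-ratio term, not a value from this card.
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_avg_logp
    # Odds-ratio term: -log(sigmoid(z)) written as log(1 + exp(-z)).
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * or_term

# Toy values: the loss is lower when the chosen response is the more likely one.
print(orpo_loss(-0.5, -2.0))
print(orpo_loss(-2.0, -0.5))
```

Because the preference signal is folded into the supervised loss, this formulation needs no separate reference model or reward model.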

Model Details

  • Base Model: meta-llama/Llama-3.2-1B-Instruct
  • Training Method: ORPO (Odds Ratio Preference Optimization) with LoRA
  • Training Dataset: mlabonne/orpo-dpo-mix-40k (subset of 100 examples)
  • Framework: Hugging Face Transformers, TRL, PEFT
  • Training Date: November 2024
  • License: Same as base model (Llama 3.2 Community License)

Training Process

The model was fine-tuned using LoRA (Low-Rank Adaptation) with the following configuration:

LoRA Parameters

  • r=16 (rank)
  • lora_alpha=32
  • lora_dropout=0.05
  • bias="none"
  • task_type="CAUSAL_LM"
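
As a rough illustration of what these settings imply, the sketch below collects them in a plain dictionary and derives two quantities: the LoRA scaling factor `lora_alpha / r`, and the number of trainable parameters a rank-`r` adapter adds to one weight matrix. The dictionary keys follow the keyword names of `peft.LoraConfig`. The target modules are not listed on this card, so the 2048 x 2048 matrix shape is a purely illustrative assumption, and the helper functions are hypothetical.

```python
# Values copied from the LoRA parameter list above.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
}

def lora_scaling(r: int, lora_alpha: int) -> float:
    """Factor applied to the low-rank update in standard LoRA: lora_alpha / r."""
    return lora_alpha / r

def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

scaling = lora_scaling(lora_kwargs["r"], lora_kwargs["lora_alpha"])

# Assumption: a square 2048 x 2048 projection, chosen only for illustration.
added = lora_params_per_matrix(2048, 2048, lora_kwargs["r"])
full = 2048 * 2048

print(f"scaling = {scaling}")
print(f"adapter params = {added} ({added / full:.2%} of the full matrix)")
```

With PEFT installed, the same dictionary could be unpacked as `LoraConfig(**lora_kwargs)`.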

Training Parameters

  • Learning rate: 1e-5
  • Batch size: 4
  • Gradient accumulation steps: 4
  • Maximum steps: 100
  • Warmup steps: 10
  • Gradient checkpointing: Enabled
  • FP16 training: Enabled
  • Maximum sequence length: 512
  • Maximum prompt length: 512
  • Optimizer: AdamW
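
The batch settings above interact with the dataset size, so a short calculation helps show how much data the run actually sees. This sketch assumes a single training device (the card does not say how many were used) and takes the 100-example subset from Model Details. The variable names are illustrative.

```python
# Values taken from this card.
per_device_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100
subset_size = 100  # examples in the training subset

# Assumption: one device; the card does not state the hardware setup.
num_devices = 1

# Examples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Total examples processed over the whole run, and passes over the subset.
examples_seen = effective_batch_size * max_steps
passes_over_subset = examples_seen / subset_size

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen}")
print(f"passes over the subset: {passes_over_subset:.0f}")
```

Under these assumptions the run makes roughly 16 passes over the same 100 examples, which is consistent with the note that this is a fine-tuning exercise rather than a production training run.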

Evaluation Results

The model was evaluated on the HellaSwag benchmark with the following configuration:

  • Batch size: 64 (auto-detected)
  • Full evaluation set
  • Zero-shot setting
  • FP16 precision

Results:

Metric               Value    Standard Error
Accuracy             45.20%   ±0.50%
Normalized Accuracy  60.78%   ±0.49%
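
The reported standard errors can be sanity-checked with the usual binomial formula sqrt(p(1 - p) / n). The sketch below assumes that "full evaluation set" means the HellaSwag validation split of 10,042 examples; the evaluation harness is not named on this card, and it may compute the standard error slightly differently (for example with n - 1 in the denominator).

```python
import math

# Assumption: the HellaSwag validation split, which has 10,042 examples.
HELLASWAG_VAL_SIZE = 10_042

def binomial_se(accuracy: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

# Accuracies as reported in the results table.
for name, acc in [("Accuracy", 0.4520), ("Normalized Accuracy", 0.6078)]:
    se = binomial_se(acc, HELLASWAG_VAL_SIZE)
    print(f"{name}: {acc:.2%} ±{se:.2%}")
```

Under that assumption the formula gives ±0.50% and ±0.49%, matching the table.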

Model size: 1.24B params (Safetensors, tensor type F32)