Week2LLMFineTune - ORPO-Trained GPT-2
This model is a fine-tuned version of openai-community/gpt2 using ORPO (Odds Ratio Preference Optimization) training on the ORPO-DPO-Mix-40k dataset.
Model Details
- Base Model: GPT-2
- Training Method: ORPO (Odds Ratio Preference Optimization)
- Dataset Size: 40k examples
- Context Length: 512 tokens
- Training Hardware:
  - 2x NVIDIA RTX 3090 GPUs
  - 128 GB RAM
  - AMD Ryzen 9 5900X 12-core CPU
Training Parameters
Training Arguments (see the configuration sketch after these lists):
- Learning Rate: 2e-5
- Batch Size: 4
- Epochs: 1
- Block Size: 512
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Accumulation: 4
- Mixed Precision: bf16
LoRA Configuration:
- R: 16
- Alpha: 32
- Dropout: 0.05
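For reference, the parameters above could be wired together with TRL's ORPOTrainer and a PEFT LoRA adapter roughly as sketched below. This is a minimal sketch, not the exact training script: the dataset ID (mlabonne/orpo-dpo-mix-40k), the column preprocessing, and some argument names (which vary between TRL versions) are assumptions.

```python
# Minimal ORPO + LoRA training sketch, assuming recent TRL/PEFT versions.
# Dataset ID, column handling, and some argument names are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# The card only names "ORPO-DPO-Mix-40k"; this Hub ID is assumed. The trainer
# expects prompt/chosen/rejected fields, so some preprocessing may be needed.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = ORPOConfig(
    output_dir="Week2LLMFineTune",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    max_length=512,            # block size / context length
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # called `tokenizer` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```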
Intended Use
This model is designed for the following tasks (see the usage sketch after this list):
- General text generation tasks
- Conversational AI applications
- Text completion with preference alignment
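A minimal generation sketch with the transformers pipeline is shown below. The prompt and sampling settings are illustrative only, and loading the checkpoint this way assumes the LoRA adapter was merged into the model or that peft is installed so the adapter can be resolved.

```python
# Illustrative text-generation example; prompt and sampling values are not
# recommendations from the card.
from transformers import pipeline

generator = pipeline("text-generation", model="Decepticore/Week2LLMFineTune")
output = generator(
    "The quickest way to learn a new programming language is",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
)
print(output[0]["generated_text"])
```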
Training Approach
The model was trained using ORPO (sketched below), which combines:
- Supervised Fine-Tuning (SFT)
- Preference Optimization
- Efficient LoRA adaptation
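As an illustration of the idea (not the trainer's internal implementation), ORPO adds a log-odds-ratio penalty on top of the standard SFT loss for the chosen response, pushing the model to prefer chosen over rejected completions:

```python
# Sketch of the ORPO objective: L = L_SFT + beta * L_OR, where
# L_OR = -log sigmoid(log odds(chosen) - log odds(rejected)).
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, sft_loss, beta=0.1):
    """logp_* are average per-token log-probabilities of each completion."""
    # odds(y|x) = p / (1 - p), computed in log space for stability
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    log_odds_ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return sft_loss - beta * log_odds_ratio.mean()
```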
Limitations
- Limited by base model architecture (GPT-2)
- Relatively small preference dataset (40k examples)
- Context length limited to 512 tokens
- Inherits base model biases
Evaluation Results
The model has been evaluated on multiple benchmarks with the following results:
HellaSwag

| Metric   | Value  | Stderr  |
|----------|--------|---------|
| acc      | 0.2906 | ±0.0045 |
| acc_norm | 0.3126 | ±0.0046 |

TinyMMLU

| Metric   | Value  | Stderr |
|----------|--------|--------|
| acc_norm | 0.3152 | N/A    |

ARC Easy

| Metric   | Value  | Stderr  |
|----------|--------|---------|
| acc      | 0.4116 | ±0.0101 |
| acc_norm | 0.3910 | ±0.0100 |
All evaluations were performed with the following settings (a reproduction sketch follows below):
- Number of few-shot examples: 0 (zero-shot)
- Device: CUDA
- Batch size: 1
- Model type: GPT-2 with ORPO fine-tuning
These results demonstrate the model's capabilities across different tasks:
- Common sense reasoning (HellaSwag)
- Multi-task knowledge (TinyMMLU)
- Grade-school level reasoning (ARC Easy)
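As a rough reproduction sketch (not the exact evaluation command used), a zero-shot run matching the settings above could look like the following with lm-evaluation-harness; the task names, especially the tinyBenchmarks MMLU task, are assumptions and may differ between harness versions.

```python
# Hedged lm-evaluation-harness sketch for the zero-shot settings above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Decepticore/Week2LLMFineTune",
    tasks=["hellaswag", "arc_easy", "tinyMMLU"],  # task names assumed
    num_fewshot=0,   # zero-shot
    batch_size=1,
    device="cuda",
)
print(results["results"])
```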