Week2LLMFineTune - ORPO-Trained GPT-2
This model is a fine-tuned version of openai-community/gpt2 using ORPO (Odds Ratio Preference Optimization) training on the ORPO-DPO-Mix-40k dataset.
Model Details
- Base Model: GPT-2
- Training Method: ORPO (Odds Ratio Preference Optimization)
- Dataset Size: 40k examples
- Context Length: 512 tokens
- Training Hardware:
  - 2x NVIDIA RTX 3090 GPUs
  - 128 GB RAM
  - AMD Ryzen 9 5900X 12-core CPU
Training Parameters
Training Arguments (see the configuration sketch after these lists):
- Learning Rate: 2e-5
- Batch Size: 4
- Epochs: 1
- Block Size: 512
- Warmup Ratio: 0.1
- Weight Decay: 0.01
- Gradient Accumulation: 4
- Mixed Precision: bf16
LoRA Configuration:
- R: 16
- Alpha: 32
- Dropout: 0.05
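For reference, the parameters above could be wired together with TRL's ORPOTrainer and a PEFT LoRA adapter roughly as sketched below. This is a minimal sketch, not the exact training script: the dataset ID (mlabonne/orpo-dpo-mix-40k), the column preprocessing, and some argument names (which vary between TRL versions) are assumptions.

```python
# Minimal ORPO + LoRA training sketch, assuming recent TRL/PEFT versions.
# Dataset ID, column handling, and some argument names are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# The card only names "ORPO-DPO-Mix-40k"; this Hub ID is assumed. The trainer
# expects prompt/chosen/rejected fields, so some preprocessing may be needed.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = ORPOConfig(
    output_dir="Week2LLMFineTune",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    max_length=512,            # block size / context length
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # called `tokenizer` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```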
Intended Use
This model is designed for the following tasks (see the usage sketch after this list):
- General text generation tasks
- Conversational AI applications
- Text completion with preference alignment
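A minimal generation sketch with the transformers pipeline is shown below. The prompt and sampling settings are illustrative only, and loading the checkpoint this way assumes the LoRA adapter was merged into the model or that peft is installed so the adapter can be resolved.

```python
# Illustrative text-generation example; prompt and sampling values are not
# recommendations from the card.
from transformers import pipeline

generator = pipeline("text-generation", model="Decepticore/Week2LLMFineTune")
output = generator(
    "The quickest way to learn a new programming language is",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
)
print(output[0]["generated_text"])
```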
Training Approach
The model was trained using ORPO (sketched below), which combines:
- Supervised Fine-Tuning (SFT)
- Preference Optimization
- Efficient LoRA adaptation
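As an illustration of the idea (not the trainer's internal implementation), ORPO adds a log-odds-ratio penalty on top of the standard SFT loss for the chosen response, pushing the model to prefer chosen over rejected completions:

```python
# Sketch of the ORPO objective: L = L_SFT + beta * L_OR, where
# L_OR = -log sigmoid(log odds(chosen) - log odds(rejected)).
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, sft_loss, beta=0.1):
    """logp_* are average per-token log-probabilities of each completion."""
    # odds(y|x) = p / (1 - p), computed in log space for stability
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    log_odds_ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return sft_loss - beta * log_odds_ratio.mean()
```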
Limitations
- Limited by base model architecture (GPT-2)
- Relatively small preference dataset (40k examples)
- Context length limited to 512 tokens
- Inherits base model biases
Evaluation Results
The model has been evaluated on multiple benchmarks with the following results:
HellaSwag

| Metric   | Value  | Stderr  |
|----------|--------|---------|
| acc      | 0.2906 | ±0.0045 |
| acc_norm | 0.3126 | ±0.0046 |

TinyMMLU

| Metric   | Value  | Stderr |
|----------|--------|--------|
| acc_norm | 0.3152 | N/A    |

ARC Easy

| Metric   | Value  | Stderr  |
|----------|--------|---------|
| acc      | 0.4116 | ±0.0101 |
| acc_norm | 0.3910 | ±0.0100 |
All evaluations were performed with the following settings (a reproduction sketch follows below):
- Number of few-shot examples: 0 (zero-shot)
- Device: CUDA
- Batch size: 1
- Model type: GPT-2 with ORPO fine-tuning
These results demonstrate the model's capabilities across different tasks:
- Common sense reasoning (HellaSwag)
- Multi-task knowledge (TinyMMLU)
- Grade-school level reasoning (ARC Easy)
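As a rough reproduction sketch (not the exact evaluation command used), a zero-shot run matching the settings above could look like the following with lm-evaluation-harness; the task names, especially the tinyBenchmarks MMLU task, are assumptions and may differ between harness versions.

```python
# Hedged lm-evaluation-harness sketch for the zero-shot settings above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Decepticore/Week2LLMFineTune",
    tasks=["hellaswag", "arc_easy", "tinyMMLU"],  # task names assumed
    num_fewshot=0,   # zero-shot
    batch_size=1,
    device="cuda",
)
print(results["results"])
```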