Phi3-mini-128k-it ORPO model
Phi-3-mini-128k-instruct fine-tuned for the Text-to-SQL downstream task using Odds Ratio Preference Optimization (ORPO).
Details
A 4-bit quantized version of the Phi-3-mini-128k-instruct model was fine-tuned on zerolink/zsql-sqlite-dpo. The trained adapters were then merged back into the base model using PEFT.
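As a rough sketch of that setup (assuming the transformers + peft + bitsandbytes stack; the adapter path and quantization settings below are placeholders, not the exact training configuration):

```python
# Sketch only: 4-bit (NF4) base model for QLoRA-style training, then adapter merging with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_4bit = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# After training: reload the base in higher precision, attach the trained LoRA
# adapters, and fold them into the base weights.
base_bf16 = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base_bf16, "path/to/orpo-adapters").merge_and_unload()  # placeholder adapter path
```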
Odds Ratio Preference Optimization (ORPO)
The goal of ORPO is to penalize the "rejected" samples and increase the likelihood of the "chosen" samples. This builds upon DPO but incorporates a ranking of preferences: the model not only learns which outputs are preferred but also their relative ranking.
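For reference, the objective from the ORPO paper (Hong et al., 2024) combines a standard SFT loss with an odds-ratio term over each chosen/rejected pair, roughly:

$$
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)
$$

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\, \mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR} \,\big]
$$

where $y_w$ is the chosen completion, $y_l$ the rejected one, and $\lambda$ weights the odds-ratio penalty.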
Dataset
The model was fine-tuned on the zerolink/zsql-sqlite-dpo dataset, which contains 250,000 entries in total.
The dataset needs to be in the following format, with at least these columns:
- Schema
- Question
- Rejected
- Chosen
- Weight
For example:
- Schema: "CREATE TABLE table_name_56 (location TEXT, year INTEGER)"
- Question: "What location is previous to 1994?"
- Rejected: "SELECT location FROM table_name_56 WHERE year < 1994"
- Chosen: "SELECT "location" FROM "table_name_56" WHERE "year" < 1994"
- Weight: 0.056641
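Below is a minimal sketch of turning these columns into the (prompt, chosen, rejected) triples that TRL's ORPOTrainer expects. The lowercase column keys and the prompt template are assumptions here, not the exact preprocessing used for this model.

```python
# Sketch only: load the dataset and map it to prompt/chosen/rejected triples.
# Column keys and the prompt template below are assumptions.
from datasets import load_dataset

dataset = load_dataset("zerolink/zsql-sqlite-dpo", split="train")

def to_orpo_format(row):
    # Combine schema and question into a single prompt; keep the preference pair as-is.
    prompt = (
        "Given the following SQLite schema, answer the question with a SQL query.\n"
        f"Schema: {row['schema']}\n"
        f"Question: {row['question']}\n"
        "SQL:"
    )
    return {"prompt": prompt, "chosen": row["chosen"], "rejected": row["rejected"]}

dataset = dataset.map(to_orpo_format)
```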
Training Parameters
QLoRA Parameters
- r = 16
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- lora_alpha = 16
- lora_dropout = 0
- bias = "none"
- random_state = 3407
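A minimal sketch of wiring these values into a PEFT LoRA config (the original run may have used a wrapper library where random_state is a seed argument; here it is mapped to a plain seed call):

```python
# Sketch only: LoRA/QLoRA adapter setup with the parameters listed above.
from transformers import set_seed
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

set_seed(3407)  # random_state above

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = prepare_model_for_kbit_training(base_4bit)  # 4-bit base from the Details section
model = get_peft_model(model, lora_config)
```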
ORPO Trainer Config
- num_epochs = 1
- max_steps = 30
- per_device_train_batch_size = 2
- gradient_accumulation_steps = 4
- optim = "adamw_8bit"
- lr_scheduler_type = "linear"
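These map onto TRL's ORPOConfig/ORPOTrainer roughly as follows; values not listed above (learning rate, output directory) are placeholders, and argument names can vary slightly between TRL versions:

```python
# Sketch only: ORPO training loop with the hyperparameters listed above.
from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    num_train_epochs=1,
    max_steps=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    learning_rate=8e-6,    # placeholder, not stated above
    output_dir="outputs",  # placeholder
)

trainer = ORPOTrainer(
    model=model,              # LoRA-wrapped model from the QLoRA setup
    args=orpo_args,
    train_dataset=dataset,    # prompt/chosen/rejected dataset from above
    tokenizer=tokenizer,
)
trainer.train()
```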