Phi3-mini-128k-it ORPO model
Phi-3-mini-128k-instruct fine-tuned for the Text-to-SQL downstream task using Odds Ratio Preference Optimization (ORPO).
Details
A 4-bit quantized version of the Phi-3-mini-128k-instruct model was fine-tuned on zerolink/zsql-sqlite-dpo. The trained adapters were then merged back into the base model using PEFT.
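As a rough sketch of that setup (assuming the transformers + peft + bitsandbytes stack; the adapter path and quantization settings below are placeholders, not the exact training configuration):

```python
# Sketch only: 4-bit (NF4) base model for QLoRA-style training, then adapter merging with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_4bit = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# After training: reload the base in higher precision, attach the trained LoRA
# adapters, and fold them into the base weights.
base_bf16 = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base_bf16, "path/to/orpo-adapters").merge_and_unload()  # placeholder adapter path
```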
Odds Ratio Preference Optimization (ORPO)
The goal of ORPO is to penalize the "rejected" samples and increase the likelihood of the "chosen" samples. This builds upon DPO but incorporates a ranking of preferences: the model not only learns which outputs are preferred but also their relative ranking.
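For reference, the objective from the ORPO paper (Hong et al., 2024) combines a standard SFT loss with an odds-ratio term over each chosen/rejected pair, roughly:

$$
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)
$$

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\, \mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR} \,\big]
$$

where $y_w$ is the chosen completion, $y_l$ the rejected one, and $\lambda$ weights the odds-ratio penalty.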
Dataset
The model was fine-tuned on the zerolink/zsql-sqlite-dpo dataset, which contains 250,000 entries in total.
The dataset needs to be in the following format, with at least these columns:
- Schema
- Question
- Rejected
- Chosen
- Weight
For example:
- Schema: "CREATE TABLE table_name_56 (location TEXT, year INTEGER)"
- Question: "What location is previous to 1994?"
- Rejected: "SELECT location FROM table_name_56 WHERE year < 1994"
- Chosen: "SELECT "location" FROM "table_name_56" WHERE "year" < 1994"
- Weight: 0.056641
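Below is a minimal sketch of turning these columns into the (prompt, chosen, rejected) triples that TRL's ORPOTrainer expects. The lowercase column keys and the prompt template are assumptions here, not the exact preprocessing used for this model.

```python
# Sketch only: load the dataset and map it to prompt/chosen/rejected triples.
# Column keys and the prompt template below are assumptions.
from datasets import load_dataset

dataset = load_dataset("zerolink/zsql-sqlite-dpo", split="train")

def to_orpo_format(row):
    # Combine schema and question into a single prompt; keep the preference pair as-is.
    prompt = (
        "Given the following SQLite schema, answer the question with a SQL query.\n"
        f"Schema: {row['schema']}\n"
        f"Question: {row['question']}\n"
        "SQL:"
    )
    return {"prompt": prompt, "chosen": row["chosen"], "rejected": row["rejected"]}

dataset = dataset.map(to_orpo_format)
```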
Training Parameters
QLoRA Parameters
- r = 16
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- lora_alpha = 16
- lora_dropout = 0
- bias = "none"
- random_state = 3407
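A minimal sketch of wiring these values into a PEFT LoRA config (the original run may have used a wrapper library where random_state is a seed argument; here it is mapped to a plain seed call):

```python
# Sketch only: LoRA/QLoRA adapter setup with the parameters listed above.
from transformers import set_seed
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

set_seed(3407)  # random_state above

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = prepare_model_for_kbit_training(base_4bit)  # 4-bit base from the Details section
model = get_peft_model(model, lora_config)
```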
ORPO Trainer Config
- num_epochs = 1
- max_steps = 30
- per_device_train_batch_size = 2
- gradient_accumulation_steps = 4
- optim = "adamw_8bit"
- lr_scheduler_type = "linear"
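These map onto TRL's ORPOConfig/ORPOTrainer roughly as follows; values not listed above (learning rate, output directory) are placeholders, and argument names can vary slightly between TRL versions:

```python
# Sketch only: ORPO training loop with the hyperparameters listed above.
from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    num_train_epochs=1,
    max_steps=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    learning_rate=8e-6,    # placeholder, not stated above
    output_dir="outputs",  # placeholder
)

trainer = ORPOTrainer(
    model=model,              # LoRA-wrapped model from the QLoRA setup
    args=orpo_args,
    train_dataset=dataset,    # prompt/chosen/rejected dataset from above
    tokenizer=tokenizer,
)
trainer.train()
```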