tincans-ai/gazelle-v0.2-dpo

Gazelle v0.2 is the mid-March release from Tincans of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model - audio in, text out.

The datasets used were snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with max_lr=3e-4, batch size 32, 10 warmup steps, cosine decay.

We can see some tell-tale signs of preference modeling at play, particularly longer replies, which don't exist in the base instruction-tuned model. Overall, we view the quality as being mixed and welcome experimentation but do not suggest production use.

Please see this notebook for an inference example.

tincans-ai
/

gazelle-v0.2-dpo

Datasets used to train tincans-ai/gazelle-v0.2-dpo

Collection including tincans-ai/gazelle-v0.2-dpo

gazelle v0.2