daisd-ai/anydef-v2-linear-W4A16

Introduction

This model is a quantized version of a linear merge of mistralai/Mistral-7B-v0.1 and daisd-ai/anydef-orpo-v2.

Merging

The models were merged to improve the quality of the final model (idea) and to prevent large losses during quantization. Merging was done using mergekit with the following spec:

models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.3
  - model: daisd-ai/anydef-orpo-v2
    parameters:
      weight: 0.7
merge_method: linear
dtype: bfloat16
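
To make the spec above concrete, here is a minimal sketch of what a linear merge computes: a per-parameter weighted average of the two models. It is an illustration only, not mergekit's implementation. The function name `linear_merge`, the flat-list tensors, and the toy values are all assumptions made for the example.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of same-named parameters.

    Hypothetical helper for illustration: tensors are flat lists of floats,
    and every state dict is assumed to contain the same parameter names.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    # With normalize=True the weights are rescaled to sum to 1.
    total = sum(weights) if normalize else 1.0
    merged = {}
    for name in state_dicts[0]:
        # Walk the parameter element by element across all models.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) / total
            for values in columns
        ]
    return merged


# Toy stand-ins for the two checkpoints (made-up numbers).
base = {"layer.weight": [1.0, 2.0, -1.0]}
tuned = {"layer.weight": [2.0, 0.0, 1.0]}

# Same weights as in the spec: 0.3 for the base model, 0.7 for the tuned one.
merged = linear_merge([base, tuned], [0.3, 0.7])
```

Because the two weights already sum to 1.0, normalization does not change the result here.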

Quantization

Quantization was applied using LLM Compressor, calibrated on 512 random examples from the anydef-kilt-tasks-v2 dataset. We tested other numbers of examples, but did not see a noticeable improvement when using more examples during quantization.

The recipe for quantization:

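from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# SmoothQuant runs first, then GPTQ quantizes all Linear layers except lm_head to 4-bit weights
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]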
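
As a rough intuition for the two recipe steps, the sketch below shows (1) the per-channel SmoothQuant scale with smoothing strength 0.8 and (2) symmetric 4-bit quantization of a weight group, which is the "W4" part of W4A16 (activations stay in 16-bit). This is not LLM Compressor code. The function names and toy values are assumptions, and the second function uses plain round-to-nearest, whereas GPTQ additionally adjusts the remaining weights to compensate for the rounding error.

```python
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.8):
    """Per-input-channel scale s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    Activations are divided by s_j and the matching weight rows are
    multiplied by s_j, so the layer output is unchanged.
    """
    return [
        (a ** alpha) / (w ** (1.0 - alpha))
        for a, w in zip(act_absmax, weight_absmax)
    ]


def quantize_int4_symmetric(weights):
    """Round-to-nearest quantization of one weight group to the signed 4-bit range [-8, 7]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    dequantized = [v * scale for v in q]
    return q, scale, dequantized


# Toy per-channel statistics: channel 0 has an activation outlier.
scales = smoothquant_scales(act_absmax=[16.0, 1.0], weight_absmax=[1.0, 1.0])

# Toy weight group (made-up values).
q, scale, dequantized = quantize_int4_symmetric([3.5, -1.5, 0.6, 0.0])
```

In the toy group, 0.6 is stored as the integer 1 and reads back as 0.5. That rounding error is the kind of loss the calibration-based GPTQ step tries to reduce.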

Inference

For inference code, see our GitHub.

Benchmark results

Precision (%):

Dataset	anydef-v2	anydef-v2-quant (this)
RSS-500	66.89	64.90
ISTEX-1000	85.82	84.33
Reuters-128	64.88	68.28
TweekiGold	75.93	75.93

Retrieval rate (%):

Dataset	anydef-v2	anydef-v2-quant (this)
RSS-500	84.11	83.44
ISTEX-1000	97.76	97.31
Reuters-128	83.33	83.87
TweekiGold	91.67	91.44
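
To read the two tables at a glance, the following sketch computes the per-dataset change (quantized minus original, in percentage points) and the unweighted mean change. The numbers are copied from the tables above. The helper names `deltas` and `mean_delta` are assumptions introduced for this example.

```python
# (anydef-v2, anydef-v2-quant) pairs copied from the tables above.
precision = {
    "RSS-500": (66.89, 64.90),
    "ISTEX-1000": (85.82, 84.33),
    "Reuters-128": (64.88, 68.28),
    "TweekiGold": (75.93, 75.93),
}
retrieval_rate = {
    "RSS-500": (84.11, 83.44),
    "ISTEX-1000": (97.76, 97.31),
    "Reuters-128": (83.33, 83.87),
    "TweekiGold": (91.67, 91.44),
}


def deltas(table):
    """Quantized minus original score per dataset, in percentage points."""
    return {name: round(quant - base, 2) for name, (base, quant) in table.items()}


def mean_delta(table):
    """Unweighted mean of the per-dataset differences."""
    diffs = [quant - base for base, quant in table.values()]
    return round(sum(diffs) / len(diffs), 2)


for label, table in (("precision", precision), ("retrieval rate", retrieval_rate)):
    print(label, deltas(table), "mean:", mean_delta(table))
```

The mean is unweighted across datasets, so it ignores the different dataset sizes.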
