Safetensors
mistral
entity linking
compressed-tensors

Introduction

This model, daisd-ai/anydef-v2-linear-W4A16, is a quantized version of a linear merge of mistralai/Mistral-7B-v0.1 and daisd-ai/anydef-orpo-v2.

Merging

The models were merged with the intent of improving the quality of the final model and limiting accuracy loss during quantization. Merging was done using mergekit with the following configuration:

models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.3
  - model: daisd-ai/anydef-orpo-v2
    parameters:
      weight: 0.7
merge_method: linear
dtype: bfloat16
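A linear merge computes a per-parameter weighted average of the source checkpoints. The minimal sketch below assumes toy "state dicts" (plain dicts mapping names to float lists) instead of real bfloat16 tensors. The helper `linear_merge` is hypothetical and is not mergekit's API. Its `normalize` flag mirrors the idea of rescaling the weights to sum to 1, which changes nothing for 0.3 + 0.7.

```python
# Minimal sketch of a linear (weighted-average) merge of checkpoints.
# Assumption: each state dict maps parameter names to flat lists of floats;
# real checkpoints hold bfloat16 tensors instead. linear_merge is hypothetical.


def linear_merge(state_dicts, weights, normalize=True):
    """Return sum_i w_i * theta_i for every parameter of the first model."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    if normalize:
        # Rescale so the weights sum to 1 (a no-op for 0.3 + 0.7).
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        merged[name] = [
            sum(w * t[k] for w, t in zip(weights, tensors))
            for k in range(len(tensors[0]))
        ]
    return merged


# Toy parameters standing in for Mistral-7B-v0.1 and anydef-orpo-v2.
base = {"layer.weight": [1.0, 2.0, 3.0]}
finetuned = {"layer.weight": [2.0, 2.0, 0.0]}

merged = linear_merge([base, finetuned], [0.3, 0.7])
print(merged["layer.weight"])
```

Each merged value lies between the two source values and sits closer to the fine-tuned model, since it carries 70% of the weight.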

Quantization

Quantization was applied with LLM Compressor, using 512 randomly selected examples from the anydef-kilt-tasks-v2 dataset as calibration data. We also tried other numbers of calibration examples but saw no noticeable improvement from using more.
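Drawing the calibration set amounts to sampling examples without replacement. Here is a minimal sketch, assuming the dataset is already loaded as a Python list. The function `pick_calibration_examples`, the placeholder data, and the fixed seed are illustrative assumptions and do not come from the original pipeline.

```python
import random


def pick_calibration_examples(dataset, num_samples=512, seed=42):
    """Return num_samples examples chosen uniformly at random, without replacement."""
    if num_samples > len(dataset):
        raise ValueError("dataset is smaller than the requested sample count")
    rng = random.Random(seed)  # fixed seed makes the calibration set reproducible
    return rng.sample(dataset, num_samples)


# Hypothetical placeholder for the anydef-kilt-tasks-v2 training split.
dataset = [f"example-{i}" for i in range(10_000)]
calibration_set = pick_calibration_examples(dataset)
```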

The recipe for quantization:

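from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # shift activation outliers into the weights
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),  # 4-bit weights, 16-bit activations
]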
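Both recipe steps can be illustrated on toy numbers. SmoothQuant rescales each input channel j by s_j = max|X_j|^α / max|W_j|^(1−α). It divides the activations by s_j and multiplies the matching weight row by s_j, so the layer output is unchanged. Here α stands in for `smoothing_strength=0.8`. W4A16 then stores weights in 4 bits while activations stay in 16-bit precision. This sketch replaces GPTQ with plain round-to-nearest quantization, using one scale per group and a toy group size of 4. It therefore shows the storage format, not GPTQ's error-compensating algorithm. All helper names are hypothetical.

```python
# Toy illustration of the two recipe steps on plain Python lists.
# Assumptions: X holds activations (rows = tokens, columns = input channels)
# and W is stored as W[input_channel][output_channel].
# Quantization here is round-to-nearest, NOT GPTQ's error-compensating solver.

ALPHA = 0.8  # stands in for SmoothQuantModifier(smoothing_strength=0.8)


def smoothing_scales(X, W, alpha=ALPHA):
    """Per-input-channel scales s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    scales = []
    for j in range(len(W)):
        act_max = max(max(abs(row[j]) for row in X), 1e-5)  # clamp keeps s_j > 0
        w_max = max(max(abs(w) for w in W[j]), 1e-5)        # clamp avoids dividing by 0
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales


def apply_smoothing(X, W, s):
    """X' = X diag(s)^-1 and W' = diag(s) W, so X' @ W' equals X @ W."""
    X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
    W_s = [[w * s[j] for w in W[j]] for j in range(len(W))]
    return X_s, W_s


def quantize_int4(weights, group_size=4):
    """Symmetric 4-bit round-to-nearest with one scale per group of weights."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude to 7; fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize_int4(q, scales, group_size=4):
    """Rebuild approximate weights; activations are never quantized (A16)."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def matmul(X, W):
    """Plain X @ W for lists of lists."""
    return [[sum(x * W[j][k] for j, x in enumerate(row)) for k in range(len(W[0]))]
            for row in X]


X = [[1.0, 20.0], [0.5, -15.0]]  # input channel 1 carries an activation outlier
W = [[0.2, -0.1], [0.05, 0.03]]

s = smoothing_scales(X, W)
X_s, W_s = apply_smoothing(X, W, s)
print(matmul(X, W), matmul(X_s, W_s))  # equal up to floating-point error

flat = [w for row in W_s for w in row]
q, group_scales = quantize_int4(flat)
W_hat = dequantize_int4(q, group_scales)
```

After smoothing, the outlier channel of X is scaled down to roughly the range of the other channel, and the weights absorb that scale. With round-to-nearest, each dequantized weight differs from the original by at most half its group's scale.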

Inference

For inference code, see our GitHub repository.

Benchmark results

Precision (%):

Dataset        anydef-v2   anydef-v2-quant (this)
RSS-500        66.89       64.90
ISTEX-1000     85.82       84.33
Reuters-128    64.88       68.28
TweekiGold     75.93       75.93

Retrieval rate (%):

Dataset        anydef-v2   anydef-v2-quant (this)
RSS-500        84.11       83.44
ISTEX-1000     97.76       97.31
Reuters-128    83.33       83.87
TweekiGold     91.67       91.44
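The tables can be compared directly by computing per-dataset differences. The small sketch below copies the values from the two tables. The unweighted mean across datasets is only a rough summary because it ignores differences in dataset size. The helper names are illustrative.

```python
# Per-dataset change from anydef-v2 to the W4A16 model, using the values
# reported in the precision and retrieval-rate tables (all in %).
precision = {
    "RSS-500": (66.89, 64.90),
    "ISTEX-1000": (85.82, 84.33),
    "Reuters-128": (64.88, 68.28),
    "TweekiGold": (75.93, 75.93),
}
retrieval_rate = {
    "RSS-500": (84.11, 83.44),
    "ISTEX-1000": (97.76, 97.31),
    "Reuters-128": (83.33, 83.87),
    "TweekiGold": (91.67, 91.44),
}


def deltas(table):
    """Percentage-point difference (quantized minus full model) per dataset."""
    return {name: round(quant - full, 2) for name, (full, quant) in table.items()}


def mean_delta(table):
    """Unweighted mean of the per-dataset differences."""
    d = deltas(table)
    return round(sum(d.values()) / len(d), 2)


for metric, table in [("precision", precision), ("retrieval rate", retrieval_rate)]:
    print(metric, deltas(table), "mean:", mean_delta(table))
```

For example, quantization raises precision on Reuters-128 by 3.40 points and lowers it on RSS-500 by 1.99 points.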