---
license: apache-2.0
tags:
- vidore
- reranker
- qwen2_vl
---
# MonoQwen2-VL-2B-LoRA-Reranker

## Model Overview
**MonoQwen2-VL-2B-LoRA-Reranker** is a LoRA fine-tuned version of the Qwen2-VL-2B model. It scores how relevant a document image is to a text query, so that retrieved images can be reranked.

It was trained on the [ColPali train set](https://huggingface.co/datasets/vidore/colpali_train_set).

## How to Use the Model
The example below scores the relevance of a single image to a user query with this model:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-2B-LoRA-Reranker",
    device_map="auto",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)

# Define query and load image
query = "Is this your query about a document ?"
image_path = "your/path/to/image.png"
image = Image.open(image_path)

# Construct the prompt and prepare input
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    "answer True or False. The query is: {query}"
).format(query=query)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }
]

# Apply chat template and tokenize
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to the model's device (device_map="auto" decides where that is)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

# Run inference to obtain logits
with torch.no_grad():
    outputs = model(**inputs)
    logits_for_last_token = outputs.logits[:, -1, :]

# Look up the "True"/"False" token ids and apply a softmax over those two logits only
true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

# Extract and display probabilities
true_prob = relevance_score[0, 0].item()
false_prob = relevance_score[0, 1].item()

print(f"True probability: {true_prob:.4f}, False probability: {false_prob:.4f}")
```

This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").
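
To rerank several candidate images rather than score one, the "True" probability can serve as the sort key. The following is a minimal sketch under that assumption, not the authors' pipeline: `true_probability` and `rerank` are hypothetical helper names, and the logit values in the usage lines are made up for illustration. It also shows that the two-way softmax over the "True" and "False" logits is equivalent to a sigmoid of their difference.

```python
import math


def true_probability(true_logit: float, false_logit: float) -> float:
    """Two-way softmax over the True/False logits, written as a sigmoid."""
    return 1.0 / (1.0 + math.exp(false_logit - true_logit))


def rerank(candidates: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort candidate ids by descending "True" probability.

    `candidates` maps a document id to its (true_logit, false_logit) pair.
    """
    scored = [
        (doc_id, true_probability(true_logit, false_logit))
        for doc_id, (true_logit, false_logit) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative logits only (assumption); real values would come from
# `logits_for_last_token` at the "True" and "False" token ids.
example = {
    "page_1.png": (2.0, 0.5),
    "page_2.png": (-1.0, 1.0),
    "page_3.png": (3.5, -0.5),
}
ranking = rerank(example)
for doc_id, prob in ranking:
    print(f"{doc_id}: {prob:.4f}")
```

In practice each candidate would be passed through the model as in the example above, and only the two logits per candidate need to be kept.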

## Performance Metrics

The model was evaluated on the [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard) by retrieving 10 candidates with [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and reranking them. The table below reports NDCG@5 before and after reranking:

| Dataset                                           | NDCG@5 Before Reranking  | NDCG@5 After Reranking |
|---------------------------------------------------|--------------------------|------------------------|
| **Mean**                                          | 85.8                     | **90.5**               |
| vidore/arxivqa_test_subsampled                    | 85.6                     | 89.01                  |
| vidore/docvqa_test_subsampled                     | 57.1                     | 59.71                  |
| vidore/infovqa_test_subsampled                    | 88.1                     | 93.49                  |
| vidore/tabfquad_test_subsampled                   | 93.1                     | 95.96                  |
| vidore/shiftproject_test                          | 82.0                     | 92.98                  |
| vidore/syntheticDocQA_artificial_intelligence_test| 97.5                     | 100.00                 |
| vidore/syntheticDocQA_energy_test                 | 92.9                     | 97.65                  |
| vidore/syntheticDocQA_government_reports_test     | 96.0                     | 98.04                  |
| vidore/syntheticDocQA_healthcare_industry_test    | 96.4                     | 99.27                  |
| vidore/tatdqa_test                                | 69.4                     | 78.98                  |

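For readers unfamiliar with the metric, here is a minimal sketch of NDCG@5 using the textbook definition with linear gains. It is not the benchmark's evaluation code; `dcg_at_k` and `ndcg_at_k` are hypothetical helper names, the relevance lists are invented, and the ideal ranking is computed from the candidate list alone (an assumption that all relevant documents were retrieved).

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (rank 1 has discount 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: list[float], k: int = 5) -> float:
    """DCG@k divided by the DCG@k of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Invented example: one relevant page among 10 retrieved candidates.
before = ndcg_at_k([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # relevant page at rank 3
after = ndcg_at_k([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])   # reranker moved it to rank 1
print(f"NDCG@5 before: {before:.2f}, after: {after:.2f}")
```

Moving the single relevant page from rank 3 to rank 1 raises the score for that query; the table averages such per-query scores within each dataset.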

## License

This LoRA model is licensed under the Apache 2.0 license.