---
license: apache-2.0
---
# MonoQwen2-VL-2B-LoRA-Reranker

## Model Overview
**MonoQwen2-VL-2B-LoRA-Reranker** is a LoRA fine-tune of the Qwen2-VL-2B model that scores how relevant an image document is to a query, so that retrieved candidates can be reranked.

## How to Use the Model
The example below scores a single image against a user query:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-2B-LoRA-Reranker",
    device_map="auto",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)

# Define query and load image
query = "Is this your query about a document?"
image_path = "your/path/to/image.png"
image = Image.open(image_path)

# Construct the prompt and prepare input
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    "answer True or False. The query is: {query}"
).format(query=query)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }
]

# Apply chat template and tokenize
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to wherever device_map="auto" placed the model instead of hard-coding "cuda"
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

# Run inference to obtain logits
with torch.no_grad():
    outputs = model(**inputs)
    logits_for_last_token = outputs.logits[:, -1, :]

# Convert tokens and calculate relevance score
true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

# Extract and display probabilities
true_prob = relevance_score[0, 0].item()
false_prob = relevance_score[0, 1].item()

print(f"True probability: {true_prob:.4f}, False probability: {false_prob:.4f}")
```

The example scores the relevance of one image to one query. The softmax is taken over only the `True` and `False` token logits at the last position, so the two printed probabilities sum to 1; the `True` probability serves as the relevance score.
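
To rerank several candidate pages for one query, the `True` probability can be computed per page and used as a sort key. The following is a minimal sketch, not part of the model's own code: it assumes the `True` and `False` logits have already been extracted for each page as in the example above, and the `true_probability` and `rerank` helpers, the page names, and the logit values are all hypothetical.

```python
import math


def true_probability(true_logit: float, false_logit: float) -> float:
    """Two-way softmax over the True/False logits, returning P(True)."""
    # softmax([t, f])[0] equals sigmoid(t - f); branch on the sign of the
    # difference so that math.exp never receives a large positive argument.
    diff = true_logit - false_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def rerank(candidates):
    """Sort (doc_id, true_logit, false_logit) tuples by descending P(True)."""
    scored = [(doc_id, true_probability(t, f)) for doc_id, t, f in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical logits for three candidate pages (illustrative values only)
candidates = [
    ("page_a", 1.2, 0.3),
    ("page_b", -0.5, 2.0),
    ("page_c", 3.1, -1.0),
]

ranking = rerank(candidates)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

In practice each candidate would go through one forward pass of the model to obtain its pair of logits; batching those passes is left out of this sketch.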

## Performance Metrics

The model was evaluated on the [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard) by retrieving the top 10 candidates for each query with [MrLight_dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and then reranking them. The table below reports `ndcg@5` before and after reranking:

| Dataset                                           | NDCG@5 Before Reranking  | NDCG@5 After Reranking |
|---------------------------------------------------|--------------------------|------------------------|
| **Mean**                                          | 85.8                     | **90.5**               |
| vidore/arxivqa_test_subsampled                    | 85.6                     | 89.01                  |
| vidore/docvqa_test_subsampled                     | 57.1                     | 59.71                  |
| vidore/infovqa_test_subsampled                    | 88.1                     | 93.49                  |
| vidore/tabfquad_test_subsampled                   | 93.1                     | 95.96                  |
| vidore/shiftproject_test                          | 82.0                     | 92.98                  |
| vidore/syntheticDocQA_artificial_intelligence_test| 97.5                     | 100.00                 |
| vidore/syntheticDocQA_energy_test                 | 92.9                     | 97.65                  |
| vidore/syntheticDocQA_government_reports_test     | 96.0                     | 98.04                  |
| vidore/syntheticDocQA_healthcare_industry_test    | 96.4                     | 99.27                  |
| vidore/tatdqa_test                                | 69.4                     | 78.98                  |

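For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@5 for a single query from a ranked list of relevance labels. It is an illustration only and is not the benchmark's evaluation code: the helper names and labels are hypothetical, it assumes linear gain with a log2 discount, and it computes the ideal ordering from the retrieved candidates alone, which may differ from how the benchmark normalizes.

```python
import math


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k ranked relevance labels."""
    # The item at 0-based position i is discounted by log2(i + 2),
    # so the top-ranked item has a discount of log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels for 10 retrieved pages: 1 = relevant, 0 = not relevant.
before = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # relevant page retrieved at rank 3
after = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # reranker moves it to rank 1

ndcg_before = ndcg_at_k(before)
ndcg_after = ndcg_at_k(after)
print(f"NDCG@5 before: {ndcg_before:.2f}, after: {ndcg_after:.2f}")
```

Moving the single relevant page from rank 3 to rank 1 is the kind of change that raises the per-query score; the table's figures are averages of such scores, expressed as percentages.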

## License

This LoRA model is licensed under the Apache 2.0 license.