---
license: apache-2.0
tags:
- vidore
- reranker
- qwen2_vl
---
# MonoQwen2-VL-2B-LoRA-Reranker
## Model Overview
The **MonoQwen2-VL-2B-LoRA-Reranker** is a LoRA fine-tune of the Qwen2-VL-2B model, optimized for reranking: given a text query and an image (typically a document page), it predicts whether the image is relevant to the query.
It was trained on the [ColPali train set](https://huggingface.co/datasets/vidore/colpali_train_set).
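If you prefer to attach the adapter to the base model explicitly rather than relying on transformers' automatic PEFT integration (used in the quickstart below), here is a minimal sketch with `peft`. It assumes the adapter was trained on the Qwen2-VL-2B-Instruct checkpoint, the same base the quickstart's processor points to:
```python
import torch
from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration

# Load the frozen base model, then apply the LoRA adapter on top of it
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lightonai/MonoQwen2-VL-2B-LoRA-Reranker")
model = model.merge_and_unload()  # optionally merge the LoRA weights for faster inference
```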
## How to Use the Model
Below is a quick example to rerank a single image against a user query using this model:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
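# Note: loading the LoRA repository in one call below relies on transformers'
# PEFT integration; make sure `peft` is installed so the Qwen2-VL-2B base
# weights are fetched and the adapter is applied automatically.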
model = Qwen2VLForConditionalGeneration.from_pretrained(
"lightonai/MonoQwen2-VL-2B-LoRA-Reranker",
device_map="auto",
# attn_implementation="flash_attention_2",
# torch_dtype=torch.bfloat16,
)
# Define query and load image
query = "Is this your query about a document ?"
image_path = "your/path/to/image.png"
image = Image.open(image_path)
# Construct the prompt and prepare input
prompt = (
"Assert the relevance of the previous image document to the following query, "
"answer True or False. The query is: {query}"
).format(query=query)
messages = [
{
"role": "user",
"content": [
{"type": "image", "image": image},
{"type": "text", "text": prompt},
],
}
]
# Apply chat template and tokenize
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
# Run inference to obtain logits
with torch.no_grad():
outputs = model(**inputs)
logits_for_last_token = outputs.logits[:, -1, :]
# Convert tokens and calculate relevance score
true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)
# Extract and display probabilities
true_prob = relevance_score[0, 0].item()
false_prob = relevance_score[0, 1].item()
print(f"True probability: {true_prob:.4f}, False probability: {false_prob:.4f}")
```
This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").
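In practice you will usually score several candidate images for one query and sort them by the "True" probability. The helper below is a minimal sketch of that loop (the `rerank` name and the one-image-per-forward-pass batching are our illustration, not an API shipped with this repository); it assumes `model` and `processor` are loaded as above:
```python
import torch

def rerank(query, images, model, processor):
    """Return candidate indices sorted by relevance, plus the raw scores."""
    prompt = (
        "Assert the relevance of the previous image document to the following query, "
        "answer True or False. The query is: {query}"
    ).format(query=query)
    true_id = processor.tokenizer.convert_tokens_to_ids("True")
    false_id = processor.tokenizer.convert_tokens_to_ids("False")
    scores = []
    for image in images:  # one forward pass per candidate keeps memory modest
        messages = [{
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }]
        text = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
        with torch.no_grad():
            logits = model(**inputs).logits[:, -1, :]
        probs = torch.softmax(logits[:, [true_id, false_id]], dim=-1)
        scores.append(probs[0, 0].item())  # probability of the "True" token
    order = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return order, scores
```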
## Performance Metrics
The model has been evaluated on the [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard): for each query, the top 10 candidates were retrieved with [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and then reranked with this model. The table below summarizes the `ndcg@5` scores:
| Dataset | NDCG@5 Before Reranking | NDCG@5 After Reranking |
|---------------------------------------------------|--------------------------|------------------------|
| **Mean** | 85.8 | **90.5** |
| vidore/arxivqa_test_subsampled | 85.6 | 89.01 |
| vidore/docvqa_test_subsampled | 57.1 | 59.71 |
| vidore/infovqa_test_subsampled | 88.1 | 93.49 |
| vidore/tabfquad_test_subsampled | 93.1 | 95.96 |
| vidore/shiftproject_test | 82.0 | 92.98 |
| vidore/syntheticDocQA_artificial_intelligence_test| 97.5 | 100.00 |
| vidore/syntheticDocQA_energy_test | 92.9 | 97.65 |
| vidore/syntheticDocQA_government_reports_test | 96.0 | 98.04 |
| vidore/syntheticDocQA_healthcare_industry_test | 96.4 | 99.27 |
| vidore/tatdqa_test | 69.4 | 78.98 |
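As a rough sketch of the two-stage setup behind these numbers, reusing the `rerank` helper sketched above. The first-stage retrieval is not reproduced here: the query and file names are illustrative, and in the actual evaluation the top 10 pages come from [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1):
```python
from PIL import Image

# Illustrative query and candidate pages; in the evaluation these are the
# top-10 pages returned by the DSE retriever for each benchmark query.
query = "What share of renewables is projected for 2030?"
top10_paths = ["pages/report_031.png", "pages/report_007.png", "pages/report_112.png"]
candidates = [Image.open(p) for p in top10_paths]

# Rerank and keep the 5 best pages, matching the NDCG@5 metric above.
order, scores = rerank(query, candidates, model, processor)
top5 = [top10_paths[i] for i in order[:5]]
print(top5)
```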
## License
This LoRA model is licensed under the Apache 2.0 license.