---
license: apache-2.0
---
# MonoQwen2-VL-2B-LoRA-Reranker
## Model Overview
The **MonoQwen2-VL-2B-LoRA-Reranker** is a LoRA fine-tuned version of the Qwen2-VL-2B model, optimized for reranking image-query relevance.
## How to Use the Model
Below is a quick example to rerank a single image against a user query using this model:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-2B-LoRA-Reranker",
    device_map="auto",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
# Define query and load image
query = "Is this your query about a document?"
image_path = "your/path/to/image.png"
image = Image.open(image_path)
# Construct the prompt and prepare input
prompt = (
    "Assert the relevance of the previous image document to the following query, "
    "answer True or False. The query is: {query}"
).format(query=query)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }
]
# Apply chat template and tokenize
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
# Run inference to obtain logits
with torch.no_grad():
    outputs = model(**inputs)
logits_for_last_token = outputs.logits[:, -1, :]
# Convert tokens and calculate relevance score
true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)
# Extract and display probabilities
true_prob = relevance_score[0, 0].item()
false_prob = relevance_score[0, 1].item()
print(f"True probability: {true_prob:.4f}, False probability: {false_prob:.4f}")
```
This example demonstrates how to assess the relevance of an image to a query: the model outputs the probability that the image is relevant ("True") or not relevant ("False").
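In a retrieval pipeline, these per-pair scores are then used to reorder the candidate set returned by a first-stage retriever. A minimal sketch of that final step, assuming you have already computed a "True" probability for each candidate as shown above (the `rerank` helper and the file names are illustrative, not part of this repository):

```python
def rerank(candidates, scores):
    """Sort candidate identifiers by descending relevance score."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]

# Example: three retrieved pages with "True" probabilities from the snippet above
candidates = ["page_3.png", "page_7.png", "page_1.png"]
scores = [0.12, 0.94, 0.55]
print(rerank(candidates, scores))  # → ['page_7.png', 'page_1.png', 'page_3.png']
```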
## Performance Metrics
The model has been evaluated on the [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard) by retrieving the top 10 candidates with [MrLight_dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and reranking them with this model. The table below summarizes its `ndcg@5` scores:
| Dataset | NDCG@5 Before Reranking | NDCG@5 After Reranking |
|---------------------------------------------------|--------------------------|------------------------|
| **Mean** | 85.8 | **90.5** |
| vidore/arxivqa_test_subsampled | 85.6 | 89.01 |
| vidore/docvqa_test_subsampled | 57.1 | 59.71 |
| vidore/infovqa_test_subsampled | 88.1 | 93.49 |
| vidore/tabfquad_test_subsampled | 93.1 | 95.96 |
| vidore/shiftproject_test | 82.0 | 92.98 |
| vidore/syntheticDocQA_artificial_intelligence_test| 97.5 | 100.00 |
| vidore/syntheticDocQA_energy_test | 92.9 | 97.65 |
| vidore/syntheticDocQA_government_reports_test | 96.0 | 98.04 |
| vidore/syntheticDocQA_healthcare_industry_test | 96.4 | 99.27 |
| vidore/tatdqa_test | 69.4 | 78.98 |
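NDCG@5 rewards rankings that place relevant pages near the top, discounting each result's gain logarithmically by its rank. A minimal sketch of the metric for a single query, using the standard formula (this is illustrative, not the benchmark's exact evaluation code):

```python
import math

def ndcg_at_k(relevances, k=5):
    """NDCG@k for one query; `relevances` lists the graded relevance
    of each retrieved result in ranked order."""
    def dcg(rels):
        # Gain of the result at rank i is discounted by log2(i + 2)
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking with the single relevant page on top scores 1.0;
# dropping it to rank 2 lowers the score.
print(ndcg_at_k([1, 0, 0, 0, 0]))           # → 1.0
print(round(ndcg_at_k([0, 1, 0, 0, 0]), 3)) # → 0.631
```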
## License
This LoRA model is licensed under the Apache 2.0 license.