Safetensors
qwen2_vl
vidore
reranker
MonoQwen2-VL-v0.1 / README.md
uminaty's picture
Update README.md
151aef1 verified
|
raw
history blame
3.47 kB
metadata
license: apache-2.0

MonoQwen2-VL-2B-LoRA-Reranker

Model Overview

The MonoQwen2-VL-2B-LoRA-Reranker is a LoRA fine-tuned version of the Qwen2-VL-2B model, optimized for reranking image-query relevance.

How to Use the Model

Below is a quick example to rerank a single image against a user query using this model:

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained("lightonai/MonoQwen2-VL-2B-LoRA-Reranker")

# Define the query and the image
query = "What is the value of the thing in the document"
image = Image.open("path_to_image.jpg")

# Prepare the inputs
prompt = f"Assert the relevance of the previous image document to the following query, answer True or False. The query is: {query}"
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Run the model and obtain results
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    logits_for_last_token = logits[:, -1, :]
    true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
    false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
    relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

# Print the True/False probabilities
true_prob = relevance_score[:, 0].item()
false_prob = relevance_score[:, 1].item()

print(f"True probability: {true_prob}, False probability: {false_prob}")

This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").

Performance Metrics

The model has been evaluated on ViDoRe Benchmark, by retrieving 10 elements with MrLight_dse-qwen2-2b-mrl-v1 and reranking them. The table below summarizes its ndcg@5 scores:

Dataset NDCG@5 Before Reranking NDCG@5 After Reranking
Mean 87.6 91.8
vidore/arxivqa_test_subsampled 85.6 89.01
vidore/docvqa_test_subsampled 57.1 59.71
vidore/infovqa_test_subsampled 88.1 93.49
vidore/tabfquad_test_subsampled 93.1 95.96
vidore/shiftproject_test 82.0 92.98
vidore/syntheticDocQA_artificial_intelligence_test 97.5 100.00
vidore/syntheticDocQA_energy_test 92.9 97.65
vidore/syntheticDocQA_government_reports_test 96.0 98.04
vidore/syntheticDocQA_healthcare_industry_test 96.4 99.27

License

This LoRA model is licensed under the Apache 2.0 license.