lightonai
/

MonoQwen2-VL-v0.1

Model card Files Files and versions Community

MonoQwen2-VL-v0.1 / README.md

uminaty's picture

Update README.md

151aef1 verified 24 days ago

|

3.47 kB

	---
	license: apache-2.0
	---
	# MonoQwen2-VL-2B-LoRA-Reranker

	## Model Overview
	The MonoQwen2-VL-2B-LoRA-Reranker is a LoRA fine-tuned version of the Qwen2-VL-2B model, optimized for reranking image-query relevance.

	## How to Use the Model
	Below is a quick example to rerank a single image against a user query using this model:

	```python
	import torch
	from PIL import Image
	from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

	# Load processor and model
	processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
	model = Qwen2VLForConditionalGeneration.from_pretrained("lightonai/MonoQwen2-VL-2B-LoRA-Reranker")

	# Define the query and the image
	query = "What is the value of the thing in the document"
	image = Image.open("path_to_image.jpg")

	# Prepare the inputs
	prompt = f"Assert the relevance of the previous image document to the following query, answer True or False. The query is: {query}"
	inputs = processor(text=prompt, images=image, return_tensors="pt")

	# Run the model and obtain results
	with torch.no_grad():
	outputs = model(**inputs)
	logits = outputs.logits
	logits_for_last_token = logits[:, -1, :]
	true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
	false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
	relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

	# Print the True/False probabilities
	true_prob = relevance_score[:, 0].item()
	false_prob = relevance_score[:, 1].item()

	print(f"True probability: {true_prob}, False probability: {false_prob}")
	```

	This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").

	## Performance Metrics

	The model has been evaluated on [ViDoRe Benchmark](https://huggingface.co/spaces/vidore/vidore-leaderboard), by retrieving 10 elements with [MrLight_dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and reranking them. The table below summarizes its `ndcg@5` scores:

	\| Dataset \| NDCG@5 Before Reranking \| NDCG@5 After Reranking \|
	\|---------------------------------------------------\|--------------------------\|------------------------\|
	\| Mean \| 87.6 \| 91.8 \|
	\| vidore/arxivqa_test_subsampled \| 85.6 \| 89.01 \|
	\| vidore/docvqa_test_subsampled \| 57.1 \| 59.71 \|
	\| vidore/infovqa_test_subsampled \| 88.1 \| 93.49 \|
	\| vidore/tabfquad_test_subsampled \| 93.1 \| 95.96 \|
	\| vidore/shiftproject_test \| 82.0 \| 92.98 \|
	\| vidore/syntheticDocQA_artificial_intelligence_test\| 97.5 \| 100.00 \|
	\| vidore/syntheticDocQA_energy_test \| 92.9 \| 97.65 \|
	\| vidore/syntheticDocQA_government_reports_test \| 96.0 \| 98.04 \|
	\| vidore/syntheticDocQA_healthcare_industry_test \| 96.4 \| 99.27 \|




	## License

	This LoRA model is licensed under the Apache 2.0 license.