---
library_name: transformers
tags:
- llama-factory
- yi-vl
- llava
license: other
language:
- zh
- en
pipeline_tag: visual-question-answering
---
This is the Hugging Face `transformers` version of the [Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B) model.
You may use this model for fine-tuning on downstream tasks; we recommend our efficient fine-tuning toolkit, [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) (see the sketch after the list below).
- **Developed by:** 01-AI.
- **Language(s) (NLP):** Chinese/English
- **License:** [Yi Series Model License](https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE)
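For example, a minimal LoRA fine-tuning run with LLaMA-Factory might look like the sketch below. The `mllm_demo` dataset and the output directory are placeholders; substitute your own data and paths, and check the LLaMA-Factory docs for the flags supported by your version:
```bash
llamafactory-cli train \
    --stage sft \
    --do_train \
    --model_name_or_path BUAADreamer/Yi-VL-6B-hf \
    --template yivl \
    --visual_inputs \
    --dataset mllm_demo \
    --finetuning_type lora \
    --output_dir saves/yi-vl-6b-lora
```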
# Usage
```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "BUAADreamer/Yi-VL-6B-hf"

# Load the model in half precision and move it to GPU 0
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)

# Render the chat messages into a prompt string with the model's chat template
messages = [
    {"role": "user", "content": "What's in the picture?"}
]
text = [processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)]

# Fetch an example image (replace with your own)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
images = [Image.open(requests.get(image_file, stream=True).raw)]

inputs = processor(text=text, images=images, return_tensors="pt").to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200)
output = processor.batch_decode(output, skip_special_tokens=True)[0]
print(output.split("Assistant:")[-1].strip())
```
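If GPU memory is tight, the same checkpoint can also be loaded in 4-bit; a minimal sketch, assuming the `bitsandbytes` package is installed:
```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# 4-bit quantization with fp16 compute (assumes bitsandbytes is installed)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForVision2Seq.from_pretrained(
    "BUAADreamer/Yi-VL-6B-hf",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)
```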
Alternatively, you can launch a web demo with the CLI command from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory):
```bash
llamafactory-cli webchat \
--model_name_or_path BUAADreamer/Yi-VL-6B-hf \
--template yivl \
--visual_inputs
```
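LLaMA-Factory also ships an OpenAI-style API server; a sketch reusing the same flags (verify the subcommand options against the LLaMA-Factory docs for your version):
```bash
llamafactory-cli api \
    --model_name_or_path BUAADreamer/Yi-VL-6B-hf \
    --template yivl \
    --visual_inputs
```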
# [lmms-eval Evaluation Results](https://github.com/EvolvingLMMs-Lab/lmms-eval)
| Metric    | Value |
|-----------|------:|
| MMMU_val  |  36.8 |
| CMMMU_val |  32.2 |
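Scores of this kind are typically obtained with a command along the following lines; the `llava_hf` model adapter and the exact task identifiers are assumptions here, so consult the lmms-eval documentation before running:
```bash
# Sketch of an lmms-eval run; adapter and task names are assumptions
accelerate launch -m lmms_eval \
    --model llava_hf \
    --model_args pretrained=BUAADreamer/Yi-VL-6B-hf \
    --tasks mmmu_val,cmmmu_val \
    --batch_size 1
```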