---
library_name: transformers
tags:
- llama-factory
- yi-vl
- llava
license: other
language:
- zh
- en
pipeline_tag: visual-question-answering
---

This is the Hugging Face version of the [Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B) model.

You may use this model for fine-tuning on downstream tasks; we recommend our efficient fine-tuning toolkit, [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). A sketch of a fine-tuning command is given below, after the web demo example.

- **Developed by:** 01-AI.
- **Language(s) (NLP):** Chinese/English
- **License:** [Yi Series Model License](https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE)

Usage:

```python
import requests
from PIL import Image

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "BUAADreamer/Yi-VL-6B-hf"

# Replace with the URL (or local path) of the image you want to ask about.
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

messages = [
    {"role": "user", "content": "What's in the picture?"}
]

# Load the model in half precision and move it to GPU 0.
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)

# Build the prompt from the chat template and preprocess the image.
text = [processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)]
images = [Image.open(requests.get(image_file, stream=True).raw)]
inputs = processor(text=text, images=images, return_tensors="pt").to(0, torch.float16)

# Generate, decode the first (and only) sequence, and keep the assistant's reply.
output = model.generate(**inputs, max_new_tokens=200)
output = processor.batch_decode(output, skip_special_tokens=True)[0]
print(output.split("Assistant:")[-1].strip())
```

Alternatively, you can launch a web demo via the CLI command from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory):

```bash
llamafactory-cli webchat \
    --model_name_or_path BUAADreamer/Yi-VL-6B-hf \
    --template yivl \
    --visual_inputs
```
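
To fine-tune the model as recommended above, a LoRA SFT run can be launched from the same CLI. The following is a minimal sketch, not a tested recipe: `mllm_demo` is the toy multimodal dataset bundled with LLaMA-Factory, and the exact flags and hyperparameters may vary across LLaMA-Factory versions.

```bash
llamafactory-cli train \
    --stage sft \
    --do_train \
    --model_name_or_path BUAADreamer/Yi-VL-6B-hf \
    --dataset mllm_demo \
    --template yivl \
    --visual_inputs \
    --finetuning_type lora \
    --output_dir saves/yi-vl-6b-lora \
    --per_device_train_batch_size 1 \
    --learning_rate 1e-4 \
    --num_train_epochs 3.0
```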

# [lmms-eval Evaluation Results](https://github.com/EvolvingLMMs-Lab/lmms-eval)

| Metric    | Value |
|-----------|------:|
| MMMU_val  |  36.8 |
| CMMMU_val |  32.2 |
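
To reproduce these numbers with lmms-eval, an invocation along the following lines should work. Treat it as a sketch: the `llava_hf` model class and the `mmmu_val`/`cmmmu_val` task names are assumptions that may depend on your lmms-eval version.

```bash
accelerate launch -m lmms_eval \
    --model llava_hf \
    --model_args pretrained=BUAADreamer/Yi-VL-6B-hf \
    --tasks mmmu_val,cmmmu_val \
    --batch_size 1
```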