Model
llava-qwen1.5-4b-chat is a lightweight multimodal model based on the LLaVA architecture.
- Language Model: Qwen/Qwen1.5-4B-Chat
- Vision Encoder: google/siglip-so400m-patch14-384
- Total Parameters: 4,388,102,720
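As a rough guide to what the parameter count means in practice, the sketch below estimates the memory needed to hold the weights alone. The dtypes and their byte sizes are assumptions chosen for illustration; this card does not state which precision the checkpoint is stored in, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Total parameter count as reported in this model card.
TOTAL_PARAMETERS = 4_388_102_720

# Bytes per parameter for common dtypes.
# These are assumed storage formats for illustration, not stated in the card.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to store the weights alone in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


# Print a weights-only estimate for each assumed dtype.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {to_gib(weight_bytes(TOTAL_PARAMETERS, dtype)):.2f} GiB")
```

Actual memory use during inference will be higher than these figures, since image features and generated tokens add to the footprint.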
Evaluation
MMBench
Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev |
---|---|---|---|---|---|
LLaVA-v1.5-7B | 67.7 | 69.2 | 61.0 | 59.7 | 28.4 |
LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3 |
LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5 |
Bunny-3B | 69.2 | 68.6 | - | - | - |
MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4 |
llava-qwen1.5-4b-chat | 69.6 | 69.2 | 68.6 | 68.3 | 41.0 |
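For a quick side-by-side reading of the table, the sketch below ranks the models by the unweighted mean of the four MMBench columns. The mean is an illustrative summary only, not an official MMBench metric; the CCBench column is left out, and Bunny-3B is skipped because its Chinese results are not reported. The function name `mmbench_mean` is hypothetical.

```python
from statistics import mean
from typing import Optional

# Scores copied from the table above, in column order:
# MMBench Test (EN), Dev (EN), Test (CN), Dev (CN). None marks a missing result.
SCORES: dict[str, list[Optional[float]]] = {
    "LLaVA-v1.5-7B": [67.7, 69.2, 61.0, 59.7],
    "LLaVA-InternLM-7B": [69.0, 68.5, 66.7, 63.8],
    "LLaVA-InternLM2-7B": [73.3, 74.6, 71.7, 72.0],
    "Bunny-3B": [69.2, 68.6, None, None],
    "MiniCPM-V": [64.1, 67.9, 62.6, 65.3],
    "llava-qwen1.5-4b-chat": [69.6, 69.2, 68.6, 68.3],
}


def mmbench_mean(values: list[Optional[float]]) -> Optional[float]:
    """Return the unweighted mean, or None if any column is missing."""
    if any(v is None for v in values):
        return None
    return mean(values)


# Keep only models with all four results, then sort from highest to lowest mean.
ranking = sorted(
    ((name, avg) for name, vals in SCORES.items()
     if (avg := mmbench_mean(vals)) is not None),
    key=lambda item: item[1],
    reverse=True,
)

for name, avg in ranking:
    print(f"{name}: {avg:.2f}")
```

Under this summary, llava-qwen1.5-4b-chat places second among the models with complete results, behind LLaVA-InternLM2-7B.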
Uses
TBD
Training Details
TBD