jzsues/llava-qwen1.5-4b-chat

Model

llava-qwen1.5-4b-chat is a lightweight multimodal model based on the LLaVA architecture.

Language Model: Qwen/Qwen1.5-4B-Chat
Vision Encoder: google/siglip-so400m-patch14-384
Total Parameters: 4,388,102,720
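In a LLaVA-style model, the vision encoder turns an image into a grid of patch embeddings, and a small projector maps them into the language model's embedding space so they can be fed to the LLM alongside text tokens. The back-of-the-envelope sketch below illustrates that wiring for the two components listed above. It rests on assumptions this card does not state: the SigLIP-so400m hidden width (1152) and the Qwen1.5-4B hidden width (2560) are taken from those models' public configs, and the projector is assumed to be a LLaVA-1.5-style two-layer MLP. The function names are hypothetical.

```python
# Hypothetical sketch of LLaVA-style sizing; widths and projector type are
# assumptions, not details confirmed by this model card.
VISION_IMAGE_SIZE = 384   # SigLIP input resolution (from "patch14-384" in the name)
VISION_PATCH_SIZE = 14    # SigLIP patch size (from "patch14-384" in the name)
VISION_HIDDEN = 1152      # assumed SigLIP-so400m hidden width
LLM_HIDDEN = 2560         # assumed Qwen1.5-4B hidden width


def num_visual_tokens(image_size: int, patch_size: int) -> int:
    """Patch embeddings per image for a ViT with non-overlapping patches and no class token."""
    per_side = image_size // patch_size
    return per_side * per_side


def mlp_projector_params(d_in: int, d_out: int) -> int:
    """Weights + biases of an assumed Linear(d_in, d_out) -> GELU -> Linear(d_out, d_out) projector."""
    first = d_in * d_out + d_out
    second = d_out * d_out + d_out
    return first + second


if __name__ == "__main__":
    tokens = num_visual_tokens(VISION_IMAGE_SIZE, VISION_PATCH_SIZE)
    print(f"visual tokens per image: {tokens}")  # 27 * 27 = 729
    params = mlp_projector_params(VISION_HIDDEN, LLM_HIDDEN)
    print(f"projector params: {params:,}")       # 9,507,840
```

Under these assumptions, each image occupies 729 positions in the LLM's context, and the projector is tiny compared with the 4,388,102,720 total parameters, which come almost entirely from the language model and the vision encoder.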

Evaluation

MMBench

Model	MMBench Test (EN)	MMBench Dev (EN)	MMBench Test (CN)	MMBench Dev (CN)	CCBench Dev
LLaVA-v1.5-7B	67.7	69.2	61.0	59.7	28.4
LLaVA-InternLM-7B	69.0	68.5	66.7	63.8	37.3
LLaVA-InternLM2-7B	73.3	74.6	71.7	72.0	42.5
Bunny-3B	69.2	68.6	-	-	-
MiniCPM-V	64.1	67.9	62.6	65.3	41.4
llava-qwen1.5-4b-chat	69.6	69.2	68.6	68.3	41.0

Uses

TBD

Training Details

TBD

Model size: 4.39B params (Safetensors, tensor type BF16)