
Is this a newer/better model than OneVision?

#1
by ehayes-haiper - opened
LMMs-Lab org

Yes, in terms of video. It is a video-specific model.

Thanks! Is inference the same as for LLaVA-OneVision, i.e. the same tokens, dimensions, etc.?

LMMs-Lab org

Almost the same.
