OpenGVLab/InternViT-6B-448px-V1-5 for Zero-Shot Image Classification
#4 opened by iavinas
Hi,
Thanks for sharing the model.
I am trying to use a vision foundation model for a zero-shot classification problem.
This is possible with OpenGVLab/InternVL-14B-224px, but I am not able to do it with OpenGVLab/InternViT-6B-448px-V1-5.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5', torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5', use_fast=False, add_eos_token=True, trust_remote_code=True)
Is there any way to get the tokenizer for OpenGVLab/InternViT-6B-448px-V1-5?
+1
Hi, the difficulty you're experiencing arises from the fact that OpenGVLab/InternViT-6B-448px-V1-5 is designed primarily as a vision encoder for building multimodal large language models, not for zero-shot image classification. It is a pure vision transformer, so it ships without a tokenizer or text encoder, and there is no text tower to embed class prompts against. OpenGVLab/InternVL-14B-224px, by contrast, is a CLIP-like model with both image and text towers, which is what makes it suitable for zero-shot image classification tasks.
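For reference, here is a minimal sketch of CLIP-style zero-shot classification with OpenGVLab/InternVL-14B-224px, adapted from the usage example on its model card. Details such as the 'summarize:' prompt prefix, the pad_token_id setting, and the mode='InternVL-C' forward argument are assumptions drawn from that card and should be checked against the current README:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

# InternVL-14B-224px has both an image and a text tower, so image-text
# similarity (and hence zero-shot classification) is possible.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-224px')
tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    use_fast=False, add_eos_token=True, trust_remote_code=True)
tokenizer.pad_token_id = 0  # assumption: padding id used on the model card

# Class names phrased as prompts; the 'summarize:' prefix follows the model card.
classes = ['red panda', 'giant panda', 'snow leopard']
texts = ['summarize:a photo of a ' + c for c in classes]

image = Image.open('example.jpg').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# Contrastive (InternVL-C) mode returns image-text logits, like CLIP.
with torch.no_grad():
    logits_per_image, logits_per_text = model(
        image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)  # probabilities over the class prompts
print(probs)

Because InternViT-6B-448px-V1-5 only provides the image side of this pipeline, the analogous tokenizer/text steps have nothing to load, which is why AutoTokenizer fails for that repo.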
czczup changed discussion status to closed