Collection: CLIP-ViT-B/32 on the eight image classification tasks
If you find these models helpful, please consider citing [our paper](https://arxiv.org/abs/2406.03280).
Training: Adam optimizer with a constant learning rate of 1e-5 for 4000 steps (batch size 32). Only the vision encoder is fine-tuned.
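The training setup above can be sketched as follows. This is an illustration under stated assumptions, not the authors' training script. `freeze_all_but_vision_encoder` is a hypothetical helper. "Vision encoder" is read as `clip_model.vision_model`, because that is the only part the loading code below copies, so `visual_projection`, the text encoder and `logit_scale` stay frozen. A tiny randomly initialized config (also hypothetical) replaces the real checkpoint so the sketch runs offline. The loss and data pipeline are not described on the card and are left out.

```python
import torch
from transformers import CLIPConfig, CLIPModel


def freeze_all_but_vision_encoder(clip_model: CLIPModel) -> list:
    """Freeze the whole CLIP model, then re-enable gradients for the vision encoder only."""
    for param in clip_model.parameters():
        param.requires_grad = False
    for param in clip_model.vision_model.parameters():
        param.requires_grad = True
    return [p for p in clip_model.parameters() if p.requires_grad]


# Hypothetical tiny, randomly initialized config so the sketch runs without downloads;
# in practice this would be CLIPModel.from_pretrained("openai/clip-vit-base-patch32").
tiny_config = CLIPConfig(
    text_config={"hidden_size": 32, "intermediate_size": 64,
                 "num_hidden_layers": 1, "num_attention_heads": 2},
    vision_config={"hidden_size": 32, "intermediate_size": 64,
                   "num_hidden_layers": 1, "num_attention_heads": 2,
                   "image_size": 32, "patch_size": 16},
    projection_dim=16,
)
model = CLIPModel(tiny_config)

trainable_params = freeze_all_but_vision_encoder(model)
# Constant learning rate of 1e-5: no LR scheduler is attached to the optimizer.
optimizer = torch.optim.Adam(trainable_params, lr=1e-5)

BATCH_SIZE = 32   # from the card
NUM_STEPS = 4000  # from the card
# The loss function and data loop are not specified on the card and are omitted.
```

With these settings the encoder sees 32 × 4000 = 128,000 training examples in total. Attaching no scheduler is simply the most direct way to express "constant learning rate" in PyTorch.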
Load the fine-tuned vision encoder:
```python
from transformers import CLIPVisionModel

# Vision encoder fine-tuned on Stanford Cars
vision_model = CLIPVisionModel.from_pretrained('tanganke/clip-vit-base-patch32_stanford-cars')
```
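Substitute it for the vision encoder of the base CLIP model:
```python
from transformers import CLIPModel

# Start from the pre-trained CLIP model (text encoder, projections, logit scale)
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# Overwrite only the vision transformer weights with the fine-tuned ones
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
```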
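To confirm the substitution took effect, you can compare the two vision-encoder state dicts tensor by tensor. The sketch below is an illustration only and is not part of the original card. `vision_encoders_match` is a hypothetical helper. Tiny randomly initialized models stand in for the two downloaded checkpoints so it runs offline. Only `clip_model.vision_model` is overwritten; the text encoder, `visual_projection` and `logit_scale` keep their base-model weights.

```python
import torch
from transformers import CLIPConfig, CLIPModel, CLIPVisionModel


def vision_encoders_match(clip_model: CLIPModel, vision_model: CLIPVisionModel) -> bool:
    """Return True if both vision-encoder state dicts have the same keys and identical tensors."""
    clip_state = clip_model.vision_model.state_dict()
    ft_state = vision_model.vision_model.state_dict()
    return clip_state.keys() == ft_state.keys() and all(
        torch.equal(clip_state[k], ft_state[k]) for k in clip_state
    )


# Hypothetical tiny, randomly initialized stand-ins for the two downloaded checkpoints.
tiny_config = CLIPConfig(
    text_config={"hidden_size": 32, "intermediate_size": 64,
                 "num_hidden_layers": 1, "num_attention_heads": 2},
    vision_config={"hidden_size": 32, "intermediate_size": 64,
                   "num_hidden_layers": 1, "num_attention_heads": 2,
                   "image_size": 32, "patch_size": 16},
    projection_dim=16,
)
clip_model = CLIPModel(tiny_config)
vision_model = CLIPVisionModel(tiny_config.vision_config)

# Independently initialized, so the weights differ before the swap.
before = vision_encoders_match(clip_model, vision_model)
# Same call as on the card: copy the vision transformer weights into CLIP.
clip_model.vision_model.load_state_dict(vision_model.vision_model.state_dict())
after = vision_encoders_match(clip_model, vision_model)
```

The check works because `CLIPModel.vision_model` and `CLIPVisionModel.vision_model` share the same module layout. Their state-dict keys therefore line up one to one, which is also why the card's `load_state_dict` call succeeds with the default strict matching.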
Base model: openai/clip-vit-base-patch32