Checkpoints just with ViT-g Dimension (1408) for the Q-former (cross-att)?
#3, opened by Daromog
All of these models use the ViT-g dimension in the Q-Former cross-attention. Is there somewhere to get checkpoints with the ViT-L dimension?
Hi,
Have the authors released checkpoints with a ViT-L vision backbone?
I found it here:
https://github.com/salesforce/LAVIS/pull/169/commits/25f86f65895f4142c18970ae11a57ae4dda2c7e2
OK, so you can probably use this conversion script to convert them to the HF format: https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py
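Before converting, it may help to confirm which vision width a given checkpoint expects. Here is a minimal sketch of that check. It assumes the Q-Former cross-attention key projections are stored under names containing `crossattention` and ending in `key.weight`, with the usual linear-layer weight shape `(out_features, in_features)`. The parameter name in the example is an illustration, not taken from a real checkpoint, and the helper names are hypothetical. 1408 is the ViT-g width from the title; 1024 is the standard ViT-L width.

```python
# Map vision-encoder output widths to backbone names.
# 1408 (ViT-g) is from this thread; 1024 is the standard ViT-L width.
VIT_WIDTHS = {1408: "ViT-g", 1024: "ViT-L"}


def cross_attention_widths(shapes):
    """Collect the input widths of the Q-Former cross-attention key projections.

    `shapes` maps parameter names to shape tuples. The naming pattern matched
    here is an assumption about how the checkpoint stores these layers.
    """
    widths = set()
    for name, shape in shapes.items():
        if "crossattention" in name and name.endswith("key.weight"):
            # Linear weights are (out_features, in_features); the input side
            # is the width of the image features the Q-Former attends to.
            widths.add(shape[1])
    return widths


def describe_backbone(shapes):
    """Return "ViT-g", "ViT-L", or "unknown" for a name -> shape mapping."""
    widths = cross_attention_widths(shapes)
    if len(widths) != 1:
        return "unknown"
    return VIT_WIDTHS.get(next(iter(widths)), "unknown")


# Hypothetical example entries, only to show the expected input format.
example = {
    "Qformer.bert.encoder.layer.0.crossattention.self.key.weight": (768, 1408),
    "Qformer.bert.encoder.layer.0.attention.self.key.weight": (768, 768),
}
print(describe_backbone(example))
```

With a checkpoint loaded through PyTorch, the mapping could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, assuming `state_dict` holds the model weights.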