transformers-like-implementation
#1
by Leyo · opened
Use the SigLIP implementation from the HF `SiglipModel`, add Flash Attention 2 support, and take the `model.safetensors` weights from `google/siglip-so400m-patch14-384`.
Leyo changed pull request status to open
Not that it matters, but is there a reason to use `nn.init.normal_` instead of `nn.init.xavier_uniform_`?
thank you for this, looks good!
Because with `nn.init.xavier_uniform_` I would get `ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")`. I think this is due to DeepSpeed ZeRO-3, but since I was not getting it for `nn.init.normal_`, and we load a pretrained checkpoint anyway, I thought it was simpler to just switch to the normal init.
Alternatively, I can wrap the init in a context manager or get rid of it altogether.
ok!
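For context on the error discussed above: a minimal pure-Python sketch of why `xavier_uniform_` fails on low-dimensional parameters. The helper below mirrors the shape logic of PyTorch's internal fan computation (the function name and shapes here are illustrative, not taken from this PR); Xavier init needs at least two dimensions to define a fan-in and fan-out, so 1-D parameters (biases, LayerNorm weights) trigger exactly the ValueError quoted in the thread.

```python
def fan_in_and_fan_out(shape):
    """Sketch of the fan computation behind torch.nn.init.xavier_uniform_."""
    if len(shape) < 2:
        # This is the error reported in the discussion above.
        raise ValueError(
            "Fan in and fan out can not be computed for tensor "
            "with fewer than 2 dimensions"
        )
    num_input_fmaps = shape[1]   # input features / channels
    num_output_fmaps = shape[0]  # output features / channels
    receptive_field_size = 1
    for s in shape[2:]:          # spatial dims for conv kernels
        receptive_field_size *= s
    fan_in = num_input_fmaps * receptive_field_size
    fan_out = num_output_fmaps * receptive_field_size
    return fan_in, fan_out

# A 2-D Linear-style weight works fine:
print(fan_in_and_fan_out((3072, 768)))  # (768, 3072)

# ...but a 1-D bias-style parameter raises the ValueError:
try:
    fan_in_and_fan_out((768,))
except ValueError as e:
    print(e)
```

`nn.init.normal_` sidesteps this entirely because it draws each element i.i.d. and never inspects the tensor's shape, which is why switching inits (rather than special-casing 1-D parameters) was the simpler fix here.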
Leyo changed pull request status to merged