nice job!
Thank you for posting that! I'd say it's definitely more coherent when you look at the signs, especially in the bottom right of the original one (weird stack of triple mini-signs, haha).
And there's a subtle gain in detail / sharpness, too. Glad you're enjoying the model - and it's nice to have an "issue" that isn't actually an issue / work for once, too!
PS: I always appreciate feedback - if, in the future, you find something that my CLIP is really BAD at, I'd love to hear about that, too! Finding a flaw in a model, if it's consistent (across many random seeds), is valuable feedback -> I can try to improve the model based on it. Thanks! =)
We actually discussed this, and in a follow-up also discussed LoRA-style (PEFT) fine-tuning for CLIP-G - but I have serious doubts about it. It works when you're trying to make CLIP-G a "narrow AI" for a specific subject; but a generalizer that will be better than the original? I say: ALL WEIGHTS require gradient.
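Rough sketch of what I mean, in case it helps - a minimal illustration assuming PyTorch + the open_clip library (the LoRA side is only described in comments, not actually wired up; check open_clip.list_pretrained() for the checkpoint tag you actually want):

```python
import open_clip

# CLIP ViT-bigG-14 ("CLIP-G"); model/pretrained tags as listed in open_clip.
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)

# LoRA-style PEFT: freeze the whole base model, then train only small
# low-rank adapters (e.g. attached via the `peft` library). Fine for
# making the model a "narrow AI" on one subject - but the frozen base
# weights cap how far *general* behavior can move.
for p in model.parameters():
    p.requires_grad = False

# Full fine-tune: ALL weights get gradients. This is what I'd argue a
# general improvement over the original requires - and why VRAM matters.
for p in model.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable params: {trainable / 1e9:.2f}B")  # roughly 2.5B for bigG
```

And that's the catch: with ~2.5B trainable params in fp32, you're looking at ~10GB of weights + ~10GB of gradients + ~20GB of AdamW optimizer states before a single activation - which is why this needs big-VRAM cards.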
So, got some H100s around (or even a stack of A100s - VRAM is what matters)? I'll happily do it.
I'd also do it if I were rich (just pay for the compute like I don't care) - but open source & open weights means I'm kinda on the wrong path to ever hope for that, haha.
So - I'm hoping some students will maybe make it their project, leveraging their university's GPU cluster for a CLIP BIG-G GmP.
yup, fair - and about as expected :)