How does it work?

#1
by radames - opened

Very cool integration! Just to confirm I understood: you get all masks with SAM (i.e., with no prompt), then run MetaCLIP on each masked area and rank them against the given prompt?

Very cool integration! Just to confirm if I understood, you get all masks with SAM, i.e no prompt based

Correct!

then you run MetaCLIP on each masked area and rank with given prompt

For each masked area, I compare two text labels: [f"a picture of {prompt}", "a picture of background"]. If the prompt label scores higher than the background label, I count that mask as a match for the searched prompt. This is probably not optimal :) I'm open to suggestions :)
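
To make the rule above concrete, here is a rough sketch of the prompt-vs-background check. It is not the actual code of this Space: `score_fn` is a hypothetical stand-in for whatever returns MetaCLIP image-text similarity logits for one masked crop, and `build_labels`, `keep_mask`, and `filter_masks` are names made up for illustration. The softmax is only there for readability; with two labels, comparing raw logits gives the same decision.

```python
import math
from typing import Any, Callable, List, Sequence


def build_labels(prompt: str) -> List[str]:
    # The two text labels described in the reply above.
    return [f"a picture of {prompt}", "a picture of background"]


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a short list of logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_mask(logits: Sequence[float]) -> bool:
    # Keep the mask only if the prompt label beats the background label.
    # A tie counts as background (strict comparison).
    prompt_prob, background_prob = softmax(logits)
    return prompt_prob > background_prob


def filter_masks(
    crops: Sequence[Any],
    prompt: str,
    score_fn: Callable[[Any, List[str]], Sequence[float]],
) -> List[int]:
    # score_fn(crop, labels) is assumed (hypothetically) to return one
    # similarity logit per label, in the same order as `labels`.
    labels = build_labels(prompt)
    return [i for i, crop in enumerate(crops) if keep_mask(score_fn(crop, labels))]


if __name__ == "__main__":
    # Toy scorer with made-up logits, standing in for a real model call.
    fake_logits = {"crop_a": [2.0, 1.0], "crop_b": [0.5, 3.0]}
    kept = filter_masks(["crop_a", "crop_b"], "cat", lambda c, _labels: fake_logits[c])
    print(kept)
```

Since the decision is a hard comparison against a single background label, one simple thing to try would be requiring the prompt probability to exceed a margin (say 0.6 instead of 0.5), though that is only a guess at what might help.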
