Speed comparison between implementations with FlashAttention and xFormers

#3
by le723z - opened

Hi,

Since the model originally supports FlashAttention, I was wondering how the encoding speed varies between the two acceleration strategies (FlashAttention vs. xFormers)?
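A minimal timing harness could help make the comparison concrete. The sketch below is generic: `encode_flash` and `encode_xformers` are hypothetical stand-ins for the two encoder paths (they are not part of any real API here), and for GPU kernels you would additionally call `torch.cuda.synchronize()` before reading each timer so asynchronous kernel launches are fully accounted for.

```python
import time

def benchmark(fn, *args, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn(*args).

    Runs a few warmup calls first so one-time costs (kernel
    compilation, cache population) don't skew the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Usage with placeholder workloads standing in for the two
# attention backends (hypothetical; replace with real encode calls):
encode_flash = lambda: sum(range(10_000))
encode_xformers = lambda: sum(range(10_000))

t_flash = benchmark(encode_flash)
t_xformers = benchmark(encode_xformers)
print(f"flash: {t_flash:.6f}s/iter, xformers: {t_xformers:.6f}s/iter")
```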

Best
