Hi,
Since the model originally supports flash-attention, I was wondering how the encoding speed varies between the two acceleration strategies.
Best