Flash Attention 2 requirement for AIDC-AI/Ovis1.6-Llama3.2-3B model, please help

#2
by pawanc

The AIDC-AI/Ovis1.6-Llama3.2-3B model requires Flash Attention 2. Please help.

AIDC-AI org

Could you describe the specific issue? Are you asking about running inference without flash attention?

I tried using it without flash attention because I currently have CUDA 12.3, and when I downgrade to 11.8 for flash attention my GPU stops working for some reason. I believe we have to edit config.json to set "llm_attn_implementation" to "eager", and also disable flash attention 2 by setting its value to false in the Ovis class in modeling_ovis.py.

If there is a simpler way, please do share.
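
For reference, here is a minimal sketch of what this could look like without editing the repository files, passing the override at load time instead (this assumes the Ovis remote code accepts llm_attn_implementation as a from_pretrained keyword and forwards it into the config; I have not verified that):

```python
import torch
from transformers import AutoModelForCausalLM

# Load Ovis with the eager attention path instead of flash_attention_2.
# Assumption: llm_attn_implementation is picked up as a config override here;
# if it is not, editing config.json as described above is the fallback.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis1.6-Llama3.2-3B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    llm_attn_implementation="eager",
).cuda()
```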
