compatible with Llama

#29
by cArlIcon - opened
No description provided.
richardllin changed pull request status to open
richardllin changed pull request status to merged

Yi-34B's generation became 10x slower on 4xA10 GPUs after replacing YiForCausalLM with LlamaForCausalLM.
Any idea why?

Hi @rodrigo-nogueira, we're not sure what the root cause is, but would you like to give Flash Attention a try by loading the model with use_flash_attention_2=True?

More context can be found here:
https://huggingface.co/docs/transformers/v4.35.2/en/perf_infer_gpu_one#Flash-Attention-2
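
For anyone finding this thread later, here is a minimal sketch of what the suggestion above looks like in code, assuming transformers v4.35.x with flash-attn installed; the model id, dtype, and prompt are illustrative and not taken from this thread:

```python
# Minimal sketch: load the model with Flash Attention 2 enabled
# (transformers v4.35.x; later versions prefer attn_implementation="flash_attention_2").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-34B"  # illustrative model path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # FA2 requires fp16/bf16 weights
    device_map="auto",            # shard across the available GPUs
    use_flash_attention_2=True,   # enable Flash Attention 2
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```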

Thank you very much; it is much faster now.
