Does Mistral support the accelerate library?
#65 opened by Sp1der
When I try to use Mistral with the following parameters:

```
--fsdp 'full_shard auto_wrap'
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'
```

it reports this error:

```
Exception: Could not find the transformer layer class to wrap in the model.
```

How can I use Mistral with the accelerate library?
Facing same issue
Try 'MistralDecoderLayer' instead.
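For reference, a launch command with the corrected flag might look like this (the script name, launcher, and other arguments are placeholders, not from the original post):

```shell
# Sketch only: train.py and the process count are placeholders.
torchrun --nproc_per_node 8 train.py \
  --fsdp 'full_shard auto_wrap' \
  --fsdp_transformer_layer_cls_to_wrap 'MistralDecoderLayer'
```

The class name must match the decoder layer class of the model architecture you are actually loading.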
Looked into the Transformers repo and found where each decoder layer class is defined:
For Llama (`LlamaDecoderLayer`): https://github.com/huggingface/transformers/blob/cc3e4781854a52cf090ffde28d884a527dab6708/src/transformers/models/llama/modeling_llama.py#L625
For Mistral (`MistralDecoderLayer`): https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L580C2-L580C2
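The reason the exact name matters can be sketched as follows. This is an illustrative approximation, not the actual Hugging Face implementation: FSDP's auto-wrap resolves the string you pass by comparing it against the class names of the model's submodules, so a Mistral model has no submodule named `LlamaDecoderLayer` and the lookup fails.

```python
# Stand-in classes for the real transformer layer types.
class LlamaDecoderLayer:
    """Placeholder for the Llama decoder layer class."""

class MistralDecoderLayer:
    """Placeholder for the Mistral decoder layer class."""

def find_layer_class(modules, cls_name):
    """Return the first submodule class whose name matches cls_name.

    Sketch of the name-based lookup; raising here corresponds to the
    failure path behind the reported exception.
    """
    for module in modules:
        if type(module).__name__ == cls_name:
            return type(module)
    raise Exception(
        "Could not find the transformer layer class to wrap in the model."
    )

# A Mistral model contains MistralDecoderLayer instances, so only that
# exact name resolves; 'LlamaDecoderLayer' raises the exception above.
mistral_modules = [MistralDecoderLayer(), MistralDecoderLayer()]
print(find_layer_class(mistral_modules, "MistralDecoderLayer").__name__)
```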