This repository houses a fork of `modeling_flash_llama.py` from togethercomputer/LLaMA-2-7B-32K, with a fix for padding of attention weights merged into it.
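
For context on the general class of bug, the sketch below is a minimal, hypothetical illustration of how padding positions can be kept out of attention weights: padded key positions are masked before the softmax so they receive zero weight. This is not the actual patch in this fork; the function name `masked_attn_weights` and the tensor shapes are assumptions for the example.

```python
# Hypothetical sketch, not the fix in this fork: mask padded key positions
# before softmax so padding cannot leak into the attention weights.
import torch

def masked_attn_weights(scores: torch.Tensor, key_padding_mask: torch.Tensor) -> torch.Tensor:
    """scores: (batch, heads, q_len, k_len).
    key_padding_mask: (batch, k_len), True for real tokens, False for padding."""
    # Broadcast the mask over the head and query dimensions.
    mask = key_padding_mask[:, None, None, :]
    # Padded positions get the most negative representable score,
    # which underflows to exactly zero weight after softmax.
    scores = scores.masked_fill(~mask, torch.finfo(scores.dtype).min)
    return torch.softmax(scores, dim=-1)

# Example: 1 sequence, 1 head, 2 queries, 4 keys with the last key padded.
scores = torch.randn(1, 1, 2, 4)
mask = torch.tensor([[True, True, True, False]])
weights = masked_attn_weights(scores, mask)
assert torch.all(weights[..., -1] == 0)  # no weight lands on the padded key
```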