llama2 weights used?
#1 by KnutJaegersberg - opened
Wondering if you used the Llama 2 weights or only the Llama model architecture.
Does it have grouped-query attention? It's a huge deal, as it saves a ton of context-related memory.
We only used the Llama architecture and the Mistral weights. For more details, please check out the paper at https://huggingface.co/papers/2312.15166. 😊
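For background on the grouped-query attention (GQA) mentioned above: a small number of key/value heads is shared across groups of query heads, so the KV cache shrinks by a factor of `n_q / n_kv`. Here is a minimal NumPy sketch with made-up dimensions; it is only an illustration of the mechanism, not this model's actual implementation.

```python
import numpy as np

# Illustrative GQA: n_q query heads share n_kv key/value heads (n_kv < n_q).
# Only the n_kv K/V heads need to be cached, cutting KV memory by n_q / n_kv.
n_q, n_kv, d, T = 8, 2, 16, 4            # query heads, kv heads, head dim, seq len

q = np.random.randn(n_q, T, d)
k = np.random.randn(n_kv, T, d)           # only n_kv key heads are stored
v = np.random.randn(n_kv, T, d)           # only n_kv value heads are stored

group = n_q // n_kv                       # query heads per shared kv head
k_rep = np.repeat(k, group, axis=0)       # expand kv heads to match query heads
v_rep = np.repeat(v, group, axis=0)

scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)   # (n_q, T, T)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)            # softmax over keys
out = weights @ v_rep                                # (n_q, T, d)
print(out.shape)                                     # (8, 4, 16)
```

With `n_kv == n_q` this reduces to standard multi-head attention, and with `n_kv == 1` it becomes multi-query attention; GQA sits in between.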
hunkim changed discussion status to closed