llama2 weights used?

#1
by KnutJaegersberg - opened

Wondering if you used llama2 weights or only the llama model architecture.

Does it have grouped-query attention? It's a huge deal, as it saves a ton of context-related (KV-cache) memory.
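To put a rough number on that memory saving: with grouped-query attention, keys and values are cached per KV head rather than per query head. A minimal back-of-envelope sketch, using an illustrative Llama-2-70B-like shape (80 layers, 64 query heads, 8 KV heads, head dim 128 — these numbers are assumptions for illustration, not taken from this model's card):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Size of the KV cache in bytes.

    The leading 2x accounts for storing both keys and values;
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Without GQA (multi-head attention): KV heads == query heads == 64
mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=4096, batch=1)

# With GQA: only 8 KV heads shared across the 64 query heads
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=4096, batch=1)

print(f"MHA KV cache: {mha / 2**30:.1f} GiB")
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB ({mha // gqa}x smaller)")
```

At a 4096-token context this works out to 10 GiB for the full multi-head cache versus 1.25 GiB with 8 KV heads — the cache shrinks exactly by the query-head/KV-head ratio.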

upstage org

We only used the Llama architecture and Mistral weights. For more details, please check out the paper at https://huggingface.co/papers/2312.15166. 😊

hunkim changed discussion status to closed
