Potential of the 405B model
#27 · opened by nskumar
Does it really support the 128k context length?
I tried running it with the full context length, but it doesn't accept 128k; it only handles about 10,500 tokens. Am I missing anything?
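For what it's worth, the checkpoint's config does advertise 128k, so a cap like that is more likely coming from the serving setup than from the model itself. A quick way to check (assuming you have gated access to the repo):

```python
# Check the context window the checkpoint itself is configured for.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")
print(cfg.max_position_embeddings)  # 131072 (128k) for Llama 3.1
```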
It takes a ton of memory to run 128k context with a 405B model. It's possible, but it would require a lot of GPUs, and it would be slow.
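To put rough numbers on that, here's a back-of-the-envelope estimate of just the KV cache at 128k, using the published Llama 3.1 405B config values (126 layers, 8 KV heads via GQA, head dim 128):

```python
# Rough KV-cache size for Llama 3.1 405B at 128k context.
num_layers = 126
num_kv_heads = 8       # GQA key/value heads
head_dim = 128
bytes_per_value = 2    # fp16/bf16
context_len = 131_072  # 128k tokens

# K and V each store num_layers * num_kv_heads * head_dim values per token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
total_gib = kv_bytes_per_token * context_len / 1024**3
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token, "
      f"~{total_gib:.0f} GiB KV cache per 128k-token sequence")
# -> ~504 KiB/token and ~63 GiB per sequence, on top of the
#    ~810 GB of weights in bf16 (~405 GB in FP8).
```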
You could try the FP8 version through NIM on DGX Cloud: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8?dgx_inference=true
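If you go that route, NIM exposes an OpenAI-compatible API, so a minimal sketch would look something like this (the endpoint URL, API key, and model id below are placeholders; check your deployment for the actual values):

```python
# Minimal sketch of calling a NIM deployment through its
# OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-nim-endpoint/v1",  # placeholder deployment URL
    api_key="YOUR_API_KEY",                   # placeholder credential
)

response = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",  # model id as served by your NIM
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```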