Potential of the 405B model
#27 · opened by nskumar
Does it really support the 128k context length?
I tried running it with the full context length, but it doesn't accept 128k; it only handles about 10,500 tokens. Am I missing anything?
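For what it's worth, the checkpoint's config does advertise 128k, so a cap like that is more likely coming from the serving setup than from the model itself. A quick way to check (assuming you have gated access to the repo):

```python
# Check the context window the checkpoint itself is configured for.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")
print(cfg.max_position_embeddings)  # 131072 (128k) for Llama 3.1
```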
It takes a ton of memory to run 128k context with a 405B model. It's possible, but it would require a lot of GPUs, and it would be slow.
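To put rough numbers on that, here's a back-of-the-envelope estimate of just the KV cache at 128k, using the published Llama 3.1 405B config values (126 layers, 8 KV heads via GQA, head dim 128):

```python
# Rough KV-cache size for Llama 3.1 405B at 128k context.
num_layers = 126
num_kv_heads = 8       # GQA key/value heads
head_dim = 128
bytes_per_value = 2    # fp16/bf16
context_len = 131_072  # 128k tokens

# K and V each store num_layers * num_kv_heads * head_dim values per token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
total_gib = kv_bytes_per_token * context_len / 1024**3
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token, "
      f"~{total_gib:.0f} GiB KV cache per 128k-token sequence")
# -> ~504 KiB/token and ~63 GiB per sequence, on top of the
#    ~810 GB of weights in bf16 (~405 GB in FP8).
```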
You could try the FP8 version through NIM on DGX Cloud: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8?dgx_inference=true
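If you go that route, NIM exposes an OpenAI-compatible API, so a minimal sketch would look something like this (the endpoint URL, API key, and model id below are placeholders; check your deployment for the actual values):

```python
# Minimal sketch of calling a NIM deployment through its
# OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-nim-endpoint/v1",  # placeholder deployment URL
    api_key="YOUR_API_KEY",                   # placeholder credential
)

response = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",  # model id as served by your NIM
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```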