fit in 24gb?
#1 by apol - opened
Any plans to release a 2b or 1b version that is functional and can fit into a 4090?
Congrats on the release and bravo!
A 4090 has 24GB of VRAM. Assuming you want to offload the whole model to the GPU, you can fit an IQ2_XS GGUF or a 2.24bpw EXL2 quant of this model into your 4090 with the 8K native context of Llama 3.
If you only want to run full precision models, you can easily fit a Llama 3 8B model into your VRAM. No need to go all the way down to 2B.
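For a rough sanity check on those numbers, here is a back-of-envelope VRAM estimate. It is not from the original reply: it assumes a 70B-parameter Llama-3-style model (80 layers, 8 KV heads, head dim 128), an 8B model with 32 layers, ~2.31 bits per weight for IQ2_XS, and an fp16 KV cache at 8K context.

```python
# Back-of-envelope VRAM estimate: quantized weights + KV cache.
# All model configs below are assumptions for illustration, not confirmed specs.

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1024**3

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1024**3

ctx = 8192  # Llama 3 native context

# Assumed 70B config: 80 layers, 8 KV heads, head dim 128
for label, bpw in [("70B IQ2_XS (~2.31 bpw)", 2.31), ("70B EXL2 2.24 bpw", 2.24)]:
    total = weights_gib(70e9, bpw) + kv_cache_gib(80, 8, 128, ctx)
    print(f"{label}: ~{total:.1f} GiB")  # roughly 21 GiB, under 24GB

# Assumed 8B config: 32 layers, 8 KV heads, head dim 128, fp16 weights
total_8b = weights_gib(8e9, 16) + kv_cache_gib(32, 8, 128, ctx)
print(f"8B fp16: ~{total_8b:.1f} GiB")  # roughly 16 GiB, also under 24GB
```

Under those assumptions the 2-bit quants of the large model land around 21 GiB and an fp16 8B around 16 GiB, both leaving some headroom in 24GB for activations and inference overhead.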
thanks!
apol changed discussion status to closed