How is it used?
When I try to enter the test, an error appears.
@Codigoabierto It isn't a test for inference.
Will they put one up so we can test how it works?
The model has 314B parameters. I don't think so yet, as this is still a raw model. It needs to be quantised, since it's around 228 GB.
If someone creates a 4-bit quantized version of the 314B model, it would be around ~60 GB, which means it would still need at least that much VRAM, right?
Considering that you need about 30-40 GB of VRAM to run inference with a 70B model.
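A quick sanity check on these numbers: a common back-of-envelope estimate for quantized weight size is parameters × bits ÷ 8. This is a rough sketch only (the function name is made up for illustration); it counts weights alone and ignores the KV cache, activations, and runtime overhead, which all add to real VRAM usage, and it treats the model as dense.

```python
def quantized_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Naive weights-only size estimate: params * bits / 8 bits-per-byte, in GB.

    Ignores KV cache, activations, and framework overhead, so actual
    VRAM needed at inference time will be higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

# 314B parameters at 4 bits per weight (weights alone)
print(quantized_size_gb(314e9, 4))  # 157.0 GB

# 70B at 4 bits per weight, for comparison
print(quantized_size_gb(70e9, 4))   # 35.0 GB
```

By this naive dense estimate the 4-bit weights of a 314B model come out well above 60 GB, which fits the reply below that even more VRAM would likely be needed.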
Yes, probably, and likely even more VRAM than that.