How should the calling code be written?
#1 by sunjunlishi - opened
I tried the code from the source repo, and also the basic multimodal calling approach; neither worked.
ValueError: Calling `cuda()` is not supported for `4-bit` or `8-bit` quantized models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
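For reference, a minimal sketch of how a 4-bit quantized model is typically loaded with transformers + bitsandbytes so that this error does not occur. This assumes those libraries are installed; `load_4bit_model` and the model id argument are illustrative, not the actual code from this repo:

```python
# Sketch: load a 4-bit quantized model without calling .cuda() on it.
# Assumes transformers, accelerate, and bitsandbytes are installed.
def load_4bit_model(model_id: str):
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    # device_map="auto" already places the quantized weights on the GPU.
    # Do NOT call model.cuda() or model.to("cuda") afterwards -- that is
    # exactly what raises the ValueError quoted above.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    return model
```

The key point is to let `from_pretrained` handle device placement via `device_map` and to use the returned model as-is, instead of moving it manually.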
Marking this thread. Also, how much GPU VRAM is needed to run this 4-bit model?
24 GB of VRAM. Following the original version's code, the model now loads and runs inference, but the inference output is empty.
If the inference output is empty, could there be a problem with the quantization?