fp16 or bf16 version?

#6
by xiangli - opened

Hi, is there a float16 or bfloat16 version? The fp32 model takes too much memory, and the code is customized specifically for fp32, so it is not easy to run inference in fp16 or bf16.

We have adjusted the code to work with bfloat16, although note I have seen this change the model's output slightly.
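For anyone wanting to try this themselves, a minimal sketch of the cast in PyTorch (using a small `nn.Linear` as a hypothetical stand-in for the model; the actual repo code may handle dtypes differently):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in module; real model weights load in fp32 by default.
model = nn.Linear(16, 16)

# Cast parameters and buffers to bfloat16.
model = model.to(dtype=torch.bfloat16)

# Inputs must match the model dtype.
x = torch.randn(1, 16, dtype=torch.bfloat16)
y = model(x)
print(y.dtype)  # torch.bfloat16
```

Because bf16 has fewer mantissa bits than fp32, small numerical differences in the output are expected, which matches the behavior noted above.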

What are the VRAM requirements for this model in fp32 and in bf16? I'm already blown away by the 7B but curious to interact with the 72B.
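A rough back-of-envelope for the weights alone (this counts only parameters at 4 bytes for fp32 and 2 bytes for bf16; activations, KV cache, and framework overhead add more on top):

```python
# Estimate weight memory in GiB for a model of a given parameter count.
def weight_gib(n_params_billion: float, bytes_per_param: int) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# 72B parameters: fp32 vs bf16 (weights only).
print(round(weight_gib(72, 4)))  # 268 (fp32)
print(round(weight_gib(72, 2)))  # 134 (bf16)
```

So bf16 halves the weight footprint, but a 72B model still needs multiple GPUs or offloading either way.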
