How does this version's 28 GB GPU memory consumption compare with the 14B?

by william0014 - opened Apr 6

Discussion

william0014

Apr 6

This MoE version uses as much GPU memory as the 14B model, yet it only performs at the level of the 7B. What, then, is the point of MoE?

shing3232

Apr 6

Because the int4 MoE only takes 4 GB.

bzheng

Qwen org Apr 7

Inference is 1.74 times as fast as the 7B.

william0014

Apr 7

> Inference is 1.74 times as fast as the 7B.

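But its quality is only at the 7B level, a noticeable step below the 14B, while its GPU memory usage is about the same as the 14B's. So I'm rather puzzled: what is the main use case for this version?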

shing3232

Apr 9

After quantization, the memory usage should feel more like that of a 7B model.
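A rough way to see why the memory footprint tracks the 14B rather than the 7B: in an MoE model every expert's weights have to stay resident in GPU memory even though only a few experts run per token, so weight memory scales with the total parameter count, and quantization shrinks it proportionally to the bits per parameter. The sketch below assumes about 14e9 total parameters (which is what 28 GB at 16 bits per parameter implies) and counts weights only, ignoring the KV cache, activations, and quantization scale/zero-point overhead; `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough weight-only memory in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations and per-group quantization metadata,
    so real usage will be somewhat higher.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~14e9 total parameters for the MoE model (28 GB / 2 bytes per
# bf16 parameter); all experts must be loaded, so this is the number that counts.
TOTAL_PARAMS_MOE = 14e9
DENSE_7B_PARAMS = 7e9

for label, n in [("MoE (all experts)", TOTAL_PARAMS_MOE), ("dense 7B", DENSE_7B_PARAMS)]:
    for bits in (16, 8, 4):
        print(f"{label:18s} {bits:2d}-bit: {weight_memory_gb(n, bits):5.1f} GB")
```

Under these assumptions the MoE weights come to roughly 28.0, 14.0 and 7.0 GB at 16, 8 and 4 bits, versus 14.0, 7.0 and 3.5 GB for a dense 7B, so a 4-bit MoE lands in the same range as a 16-bit or 8-bit 7B.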

JustinLin610

Qwen org Apr 24

This is about why we need MoE models. The model is large, but each forward pass activates only 2.7B parameters. This means that in deployment you can increase throughput and benefit from faster inference.
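To illustrate the point about activated parameters, here is a minimal top-k routing sketch: a router scores every expert, only the `top_k` highest-scoring experts run for a given token, and their outputs are mixed with softmax gate weights renormalized over the selected experts. `ToyMoELayer`, its single-matrix "experts", and the sizes used are hypothetical and are not Qwen's actual architecture; the parameter count covers expert weights only and ignores the router.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class ToyMoELayer:
    """Hypothetical toy MoE layer: all experts are stored, only top_k run per token."""

    def __init__(self, num_experts, top_k, dim, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score vector per expert; score = dot(router[e], x).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each "expert" is a single dim x dim matrix standing in for an FFN.
        self.experts = [
            [[rng.gauss(0, 1) / dim for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        self.calls = [0] * num_experts  # how often each expert actually ran

    def forward(self, x):
        scores = [sum(w * xi for w, xi in zip(r, x)) for r in self.router]
        top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[: self.top_k]
        gates = softmax([scores[e] for e in top])  # weights over the selected experts only
        out = [0.0] * len(x)
        for g, e in zip(gates, top):
            self.calls[e] += 1
            y = [sum(row[j] * x[j] for j in range(len(x))) for row in self.experts[e]]
            out = [o + g * yi for o, yi in zip(out, y)]
        return out

    def param_counts(self):
        """Return (stored, activated-per-token) expert weight counts."""
        per_expert = len(self.experts[0]) * len(self.experts[0][0])
        return len(self.experts) * per_expert, self.top_k * per_expert


layer = ToyMoELayer(num_experts=8, top_k=2, dim=4)
layer.forward([1.0, 0.5, -0.3, 2.0])
print(layer.param_counts(), layer.calls)
```

With 8 experts and `top_k=2`, all 128 expert weights are stored (and must sit in memory) but only 32 take part in computing each token. That is the same gap between total and activated parameters described above, at toy scale.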

jony4

May 13

The official blog post explains this: https://qwenlm.github.io/zh/blog/qwen-moe/
