Configuration: 4 A100 80G GPUs, using int8 with bitsandbytes.
My inference time ranges from a second to 100 seconds. Does this make sense to you?
It is about 160 seconds for 500 tokens.
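To compare numbers like these across setups, it can help to convert them to a per-token rate. Below is a rough sketch that only does the arithmetic on the figures quoted above (160 seconds, 500 tokens); the `throughput` helper is a made-up name for illustration, not part of any library, and it assumes the time is spent roughly evenly across generated tokens.

```python
def throughput(total_seconds: float, num_tokens: int) -> tuple[float, float]:
    """Return (seconds per token, tokens per second) for one generation run."""
    if total_seconds <= 0 or num_tokens <= 0:
        raise ValueError("total_seconds and num_tokens must be positive")
    return total_seconds / num_tokens, num_tokens / total_seconds


# Figures taken from the comment above: about 160 seconds for 500 tokens.
sec_per_token, tokens_per_sec = throughput(160.0, 500)
print(f"{sec_per_token:.2f} s/token, {tokens_per_sec:.3f} tokens/s")
# prints: 0.32 s/token, 3.125 tokens/s
```

If generation time scales roughly with output length, a wide spread in total time would mostly reflect how many tokens each request produces, so the per-token rate is probably the more useful number to compare.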