Speed

#1
by ramzeez88 - opened

I am wondering what kind of speed I can expect from this model if I split the layers between CPU and GPU. I have a 4-core/8-thread CPU, 16 GB of RAM, and an NVIDIA 1070 Ti with 8 GB of VRAM.

It would be amazing if 16 GB of RAM were enough for this model. Even if your PC doesn't hang, the output speed will still be very slow because of swap file usage.

Yeah, 16 GB is going to be tight, but it should be possible: offload 8 GB of layers to the GPU, and then the smaller quants, e.g. Q3_K_M, will use around 10 GB of RAM, so it should just fit.

Speed is not going to be great on account of your CPU and GPU both being weak and old (the 1070 Ti is very old now) - expect it to be very slow. But it hopefully won't swap.

Maybe 1 token a second, or 2 tokens a second? Something like that.
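
If it helps, here is a minimal sketch of splitting layers between CPU and GPU with the llama-cpp-python bindings. The GGUF filename, layer count, and thread count below are placeholders, not values taken from this repo - tune n_gpu_layers until VRAM is nearly full without overflowing.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path and n_gpu_layers value are assumptions; adjust them
# so the offloaded layers just fit in the 1070 Ti's 8 GB of VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q3_K_M.gguf",  # hypothetical path to the Q3_K_M quant
    n_gpu_layers=30,                 # rough guess; lower it if you run out of VRAM
    n_ctx=2048,                      # smaller context keeps RAM/VRAM usage down
    n_threads=4,                     # physical cores usually work best; try 4 or 8
)

out = llm("Hello, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```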

What is really crazy is that I tested various 34B models with 20 GB of RAM and an RTX 2060 Super, with various layer-offloading percentages. My CPU is just a Ryzen 5 2600X with 12 threads, and I get about 4-5 tokens per second, which is not fast but also not so bad on such hardware. I'm also using the 4-bit Q4_K_S quant. What do you think?
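
For comparing setups like this, a rough way to measure tokens per second with the same bindings (this assumes the `llm` object from the sketch above; the prompt and token cap are arbitrary):

```python
# Rough tokens-per-second measurement; reuses the `llm` object from above.
import time

start = time.time()
out = llm("Write a short story about a robot.", max_tokens=128)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]  # actual number of tokens produced
print(f"~{generated / elapsed:.1f} tokens/sec")
```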
