for faster GPU inference
#15
opened by harithushan
Can anyone provide code for faster GPU inference from a GPTQ model? For me it takes around 2 minutes to get a response.
I have the same problem :c
Using ExLlama or ExLlamaV2 should greatly help, as it's the fastest single-user inference repository so far, I believe. Also, llama.cpp might help if your GPU is really old or if you want to split the model across the CPU as well.