How much CPU RAM will it take to load the model and run inference?
How much CPU RAM will I need to run this model on Ubuntu with 8GB of RAM and 150GB of swap?
It depends on the quantization you use. At 16-bit precision, which is the format of the weights in this repository, you would need about 240-260GB. Some people have had good results with q2_k quantization, which is about 50GB on disk and uses about 55GB of RAM.

Even so, if you plan to use swap as the primary place for storing the weights during inference, you will have a really bad time. To generate a single token, essentially all of the weights have to be read once. If you use a SATA 6Gb/s SSD, which is likely given that you have 8GB of RAM, you are looking at read speeds of about 500MB/s. Reading the ~50GB of q2_k weights once would take about 100 seconds, so each token costs about 100 seconds, and a sentence of roughly 18 tokens would take about 1800 seconds, i.e. half an hour.
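The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: every weight is streamed from disk exactly once per token, page-cache hits and compute time are ignored, sizes use decimal units (1GB = 1000MB), and the 18-token sentence length is an assumed figure chosen to match the ~1800 second estimate. The function names are hypothetical helpers for illustration only.

```python
def seconds_per_token(weights_gb: float, read_mb_per_s: float) -> float:
    """Rough lower bound on time per token if all weights are read from disk once per token.

    Assumes decimal units (1 GB = 1000 MB) and ignores caching and compute time.
    """
    return weights_gb * 1000 / read_mb_per_s


def seconds_per_sentence(weights_gb: float, read_mb_per_s: float, tokens: int) -> float:
    """Time for a sentence of `tokens` tokens under the same disk-bound assumption."""
    return seconds_per_token(weights_gb, read_mb_per_s) * tokens


# ~50GB q2_k weights on a SATA SSD reading ~500MB/s; 18 tokens is an assumed sentence length
per_token = seconds_per_token(50, 500)
per_sentence = seconds_per_sentence(50, 500, 18)
print(f"{per_token:.0f} s/token, {per_sentence / 60:.0f} min for an 18-token sentence")
```

Plugging in the ~250GB 16-bit weights instead gives about 500 seconds per token on the same drive, which shows why quantization matters so much when the weights do not fit in RAM.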