
Model's answers are incoherent.

#1 opened by Kernel

I tried to run the model on an A6000 GPU using the official FastChat repo. I could not get a single coherent response; the model throws out garbage and starts hallucinating. Any advice on how to run the model?

I forgot to put the prompt template in the README for this repo:

### Human: prompt goes here
### Assistant:

This model really needs the right prompt template; otherwise it may return nothing, or a poor response. Try again with the template above. For anyone scripting this directly, a sketch of applying the template follows.
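Here is a minimal sketch of wrapping a question in that template with plain transformers. The repo id is a placeholder (substitute the actual model), and the generation settings are just reasonable defaults, not values confirmed in this thread:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

question = "What is the capital of France?"
# The model was fine-tuned on this exact template, so prompts must match it.
prompt = f"### Human: {question}\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Slice off the prompt tokens so only the assistant's reply is printed.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))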

Even with the prompt template, the answers are completely random for me as well. I'm running it through text-generation-webui, in case that makes a difference.

I updated text-generation-webui to the latest version, set the model type to llama, and set it to load in 8-bit, and now it's working! Sharing here in case others are hitting similar issues.
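For anyone loading the model outside the webui, a rough transformers equivalent of those settings is sketched below; it assumes the bitsandbytes package is installed, and the repo id is again a placeholder:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map="auto",  # place layers on the available GPU(s)
)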
