Is there a way to create a database for this model, since its token limit is very short?
I'm new to this discussion, but the model works pretty well on my PC.
However, I noticed that while running it would sometimes just shut down abruptly. After working with it a little more, I found that it was hitting the token limit and then automatically shutting down.
Is there a way to create a database it can automatically query, so that previous discussions are available as reference for the current conversation? Or is there a way to expand the token limit? I've already asked the model itself how to create such a database, but the code it gave me was full of bugs.
Overall, it works pretty well, so thumbs up!
@Dannyboy55 The token limit of this model is 2048 tokens, which is the same as for nearly all other LLaMA-based models.
The only way to actually expand the token limit is to use a model with a larger context window, but this significantly increases the computational requirements for inference.
A common way to artificially increase the effective context length is to check the length of the conversation so far and, once it approaches the limit, prompt the model to summarize the conversation and continue with the summary instead of the full transcript. An example of this can be found here.
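A minimal sketch of that pattern, assuming a Hugging Face tokenizer; the tokenizer path, the `generate` callable (standing in for whatever inference call you already use), and the 1500-token threshold are all placeholder assumptions:

```python
# Sketch of the summarize-and-replace pattern. `generate` stands in for
# whatever inference call you already use; the tokenizer path and the
# 1500-token threshold are placeholder assumptions, not from this model.
from typing import Callable
from transformers import AutoTokenizer

TOKEN_LIMIT = 2048
SUMMARIZE_AT = 1500  # assumed headroom so the reply still fits below 2048

tokenizer = AutoTokenizer.from_pretrained("path/to/this-model")  # placeholder

def maybe_summarize(history: str, generate: Callable[[str], str]) -> str:
    """Return the history unchanged, or a model-written summary of it
    once it grows close to the context limit."""
    if len(tokenizer.encode(history)) < SUMMARIZE_AT:
        return history
    # Ask the model itself to compress the conversation, then carry the
    # summary forward instead of the full transcript.
    summary = generate(
        "Summarize the following conversation, keeping every fact "
        "needed to continue it:\n\n" + history
    )
    return "Summary of the conversation so far: " + summary
```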
Another approach would be to archive past conversations in a vector database and retrieve them when a similar topic comes up again. Even then, you will never be able to pass more than 2048 tokens to the model at once.
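As a rough illustration of the retrieval idea, here is an in-memory sketch using sentence-transformers. A real setup would swap the plain lists for an actual vector database such as FAISS or Chroma; the embedding model name and `top_k` value are just assumptions:

```python
# In-memory retrieval sketch: embed past conversations, then fetch the
# most similar ones for a new message. Embedding model and top_k are
# illustrative choices, not requirements.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

archive: list[str] = []          # past conversations
vectors: list[np.ndarray] = []   # their embeddings

def store(conversation: str) -> None:
    archive.append(conversation)
    vectors.append(embedder.encode(conversation, normalize_embeddings=True))

def retrieve(query: str, top_k: int = 2) -> list[str]:
    q = embedder.encode(query, normalize_embeddings=True)
    scores = np.array([v @ q for v in vectors])  # cosine similarity
    best = scores.argsort()[::-1][:top_k]
    return [archive[i] for i in best]

# Retrieved snippets can be prepended to the prompt, but the total
# (snippets + new message + reply) still has to fit in 2048 tokens.
```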
If you want to generate code, you probably shouldn't be using this model; look into a fine-tuned code-generation model such as Salesforce/codegen instead.
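For example, the smallest mono-language CodeGen checkpoint can be run with transformers like this; the checkpoint choice and the prompt are only illustrative:

```python
# Load a CodeGen checkpoint and complete a code prompt. The 350M mono
# variant is picked here only because it is the smallest one.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "# Python function that connects to a SQLite database\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```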