Enforcing German model output
Hello,
Is there a way to force the model to answer questions in German? I am using this model as part of a RAG application with a knowledge base in German, and even though the user question and the retrieved context are both in German, the model sometimes still answers in English. I tried various things via the system prompt, up to repeating the instruction to answer in German after nearly every sentence of the prompt, which feels kinda crazy ;)
I admit I am still a beginner when it comes to prompt engineering, so any tips are appreciated.
This is my prompt for reference, let me know if anyone needs more information.
system_prompt = (
    # English gloss: "You are a helpful German-speaking AI assistant whose task
    # is to answer user questions about SEQIS Quality News articles in your
    # knowledge base in German. Use only the context from the knowledge base,
    # and no prior knowledge, to answer the user's questions. If you don't know
    # the answer, reply that you don't know. Keep the answer short, at most
    # 3 sentences. Be sure to answer in German."
    "Du bist ein hilfsbereiter deutschsprachiger KI-Assistent mit der Aufgabe, Fragen des Users zu SEQIS Quality News Artikeln in deiner Wissensdatenbank auf Deutsch zu beantworten. "
    "Verwende nur den Kontext aus der Wissensdatenbank und kein vorheriges Wissen, um die Fragen des Users zu beantworten. "
    "Wenn du die Antwort nicht weißt, antworte, dass du die Antwort nicht weißt. "
    "Halte die Antwort kurz, maximal 3 Sätze. Antworte unbedingt auf Deutsch.\n\n"
    "Kontext: {context}\n\n"
    "{history}"
)
Hey,
@razzfazz-io
your prompt doesn't look that problematic. It's definitely not optimal, but it shouldn't be the main cause of the problem.
Which model do you use exactly?
Quantized versions can tend to produce English output depending on the calibration dataset (e.g. if it consisted mainly of English data).
However, with a 70B model this should be almost impossible.
Which hyperparameters do you use?
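One thing you could try in the meantime: repeat the language instruction at the end of every user turn, right before generation, where models tend to follow instructions most reliably. A minimal sketch, assuming you build your messages with langchain's ChatPromptTemplate ({question} is just a placeholder for your pipeline's input variable):

from langchain_core.prompts import ChatPromptTemplate

# Keep the system prompt as-is, but end every user turn with an explicit
# German reminder so the instruction sits closest to the generation.
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{question}\n\nAntworte ausschließlich auf Deutsch."),
])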
The exact model I was using is the Q4_K_S quantization, from here: https://huggingface.co/redponike/Llama-3-SauerkrautLM-70b-Instruct-GGUF
As for hyperparameters, the only thing I touched was the temperature, but that was also just a few tries ago, when I ran out of ideas. The model was instantiated using langchain's ChatOllama class; the temperature during most of my testing was 0, but I also tried 0.5 to see if that changed anything.
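For reference, the setup looks roughly like this (the model tag is just a placeholder for whatever name the GGUF was imported into Ollama under):

from langchain_community.chat_models import ChatOllama

llm = ChatOllama(
    model="sauerkrautlm-70b-instruct:q4_k_s",  # placeholder Ollama model tag
    temperature=0,  # most testing at 0; also tried 0.5
)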
I also find it curious that the model sometimes seems to ignore the system prompt, or at least parts of it. For instance, the prompt instructs the model to answer in at most 3 sentences, but every so often it produces much longer output. The output is not wrong in those cases, but it doesn't follow the rules laid out in the system prompt.
I also looked at the input to the model in my LangSmith trace, but I couldn't spot anything obvious.
If you want to check the last run I did yourself here is the link to the trace: https://eu.smith.langchain.com/public/b2adc336-7182-45cb-932a-40c0f480a2e5/r
Edit: I also updated the prompt with the help of ChatGPT to hopefully get better results; still testing at the moment.
I did some more testing, and the Q4_K_M variant seems to do better. Still not perfect, though. Edit: I also tried the Llama 3.1 variants, same quant, and they seem better overall; they just struggle a little with the eval framework I am using (Giskard).
I am asking myself if the way I pass the retrieved documents from my knowledge base to the LLM is the problem. At the moment they go in as part of the system prompt, but maybe there is a better way to do that. I saw that langchain includes a helper to pass retrieved documents to a model, so I will look into it as soon as I can; see the sketch below.
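If I understand the docs correctly, the helper I mean is create_stuff_documents_chain. A minimal sketch of what I plan to try (retrieved_docs and {question} are placeholders from my pipeline, and the system text here is a simplified stand-in for my real prompt):

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# The chain formats the retrieved Documents into {context} itself, instead of
# me pasting their text into the system prompt by hand.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Beantworte die Frage nur mit dem folgenden Kontext, auf Deutsch.\n\n"
     "Kontext: {context}"),
    ("human", "{question}"),
])
chain = create_stuff_documents_chain(llm, prompt)
answer = chain.invoke({"context": retrieved_docs, "question": "..."})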