A non-Instruct LLM assistant is mostly useless. 🧐
Since it's mostly a model trained to complete text, when you ask it a question like "What to do during a stopover in Paris?", it can just keep piling more details onto your question instead of answering it. That's a perfectly valid way to continue text from its training corpus, but not a useful way to answer a question.
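To see this for yourself, here's a minimal sketch (my own illustration, not taken from the guide) that prompts a small base model. "gpt2" is just a convenient, freely available non-instruct model; outputs will vary from run to run.

```python
# Minimal sketch: a base (non-instruct) model just continues the text.
# "gpt2" is only an illustrative small base model; outputs will vary.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "What to do during a stopover in Paris?"
result = generator(prompt, max_new_tokens=60, do_sample=True)[0]["generated_text"]
print(result)
# You will often see the model add more question-like or forum-style text
# instead of giving a direct answer.
```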
➡️ So the post-training stage includes an important instruction-tuning step where you teach the model how to be useful: answer questions, be concise, be polite... RLHF is a well-known technique for this.
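As a rough illustration (the template and names below are my own assumptions, not Adaptive ML's recipe), the supervised part of instruction tuning boils down to fine-tuning on prompt/response pairs rendered into training text, so the model learns to produce an answer after a marker instead of continuing the question; RLHF then further optimizes those answers against human preferences.

```python
# Hypothetical example: turning question/answer pairs into training text
# for supervised instruction tuning. The "User:"/"Assistant:" template is
# illustrative; real models use their own chat templates and special tokens.
instruction_pairs = [
    {
        "prompt": "What to do during a stopover in Paris?",
        "response": "With a few free hours, you could visit Notre-Dame, "
                    "walk along the Seine, and grab a pastry near the Louvre.",
    },
]

def to_training_text(pair: dict) -> str:
    # The model is fine-tuned to generate the text after "Assistant:",
    # i.e. to answer, rather than just continue the user's question.
    return f"User: {pair['prompt']}\nAssistant: {pair['response']}"

for pair in instruction_pairs:
    print(to_training_text(pair))
```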
If you want to understand how this step works, the folks at Adaptive ML have written a great guide!
Read it here 👉 https://www.adaptive-ml.com/post/from-zero-to-ppo