@AlekseyKorshuk on Hugging Face: "If you have to choose one small base language model <=3B for ChatML Code…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

AlekseyKorshuk

posted an update Jan 26

Post

If you have to choose one small base language model <=3B for ChatML Code Assistant (SFT+DPO) to validate the approach on the dataset and tune hyperparams, so later retrain with a larger base model like Mistral/Mixtral, what model would you pick?
🧵

euclaise

Jan 26

•

edited Jan 26

StableLM 3B benchmarks the best, although StableLM 2 1.6B and Qwen 1.8B crush it in GSM8K (albeit with more restrictive licenses).

For small tests I usually use falcon-rw-1b - permissive license, 1.3B params.

MiniMA 2 might be worth trying too - it's pruned from LLaMA, so you get the advantage of being compatible with LLaMA-based frameworks (although I had issues trying to get it to run in vLLM)

sbrandeis

Jan 26

Maybe StarCoder 2 when it gets released!

In this post

AlekseyKorshuk Aleksey Korshuk
euclaise Jade
sbrandeis Simon Brandeis