Nice work
The openness claim caught my attention, so I added this instruction-tuned model to our openness leaderboard. Thanks for documenting the Dutch data (through GEITje) and the fine-tuning / DPO datasets. The current model lands about halfway up the openness leaderboard. Being based on Mistral inevitably hurts overall openness, as Mistral is notoriously closed in terms of pretraining data, documentation and scientific scrutiny. If there's a source code repo I missed, I'd be happy to link to it; on my first pass I wasn't able to locate any code.
(If you want to go for more openness, starting from AllenAI's OLMo 7B might be a good bet!)
@markding Training is fully reproducible with the recipe and the alignment-handbook, so yes, the training code is available: https://huggingface.co/BramVanroy/GEITje-7B-ultra#training-procedure
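For anyone who wants to reproduce the DPO step without the handbook's launcher, here is a minimal sketch using TRL (which the alignment-handbook builds on). This is not the actual GEITje-7B-ultra recipe: the starting checkpoint, the dataset id and the hyperparameters below are placeholders; follow the link above for the real configs.

```python
# Minimal DPO sketch with TRL; NOT the exact GEITje-7B-ultra training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed SFT starting checkpoint; see the model card's training procedure for the real one.
base = "BramVanroy/GEITje-7B-ultra-sft"

model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: any preference dataset with "prompt"/"chosen"/"rejected" columns works;
# substitute the Dutch feedback data referenced in the model card.
prefs = load_dataset("your-org/your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="geitje-7b-ultra-dpo",
    beta=0.1,                        # DPO regularisation strength (illustrative value)
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=prefs,
    processing_class=tokenizer,      # recent TRL versions; older releases used `tokenizer=`
)
trainer.train()
```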