Update README.md

README.md (changed)
```diff
@@ -78,12 +78,13 @@ The primary goal of this training was to demonstrate that with Spectrum CPT targ
 This method has an even more pronounced effect on larger models. It is feasible to teach a model a new language by training just a quarter of the available layers.
 
 The model has substantially improved German skills as demonstrated in RAG evaluations and numerous recognized benchmarks. In some English benchmarks, it even surpasses the Qwen2-1.5B-Instruct model.
+**Spectrum CPT can efficiently teach a new language to a large language model (LLM) while preserving the majority of its previously acquired knowledge.**
 
 Stay tuned for the next big models employing Spectrum CPT!
 
 **NOTE**
 
-For the demo,
+For the demo, the performance of the model is sufficient.
 For productive use, more German tokens can be trained on the SauerkrautLM-1.5b as required in order to teach the model even firmer German while only having a relative influence on the performance of the model (25% of the layers).
 The SauerkrautLM-1.5b offers an excellent starting point for this.
 
```
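The "train just a quarter of the available layers" idea above can be sketched in a few lines: freeze every parameter, then re-enable gradients on only a chosen subset of transformer blocks. This is a minimal illustration, not the actual Spectrum tooling — Spectrum selects layers by their signal-to-noise ratio, whereas the layer choice below (the top-most blocks) and the `ToyTransformer`/`freeze_all_but_fraction` names are placeholders invented for this sketch:

```python
# Sketch of layer-selective continued pretraining (Spectrum-style):
# freeze the whole model, then unfreeze ~25% of the blocks.
# NOTE: picking the top-most layers is a naive placeholder; Spectrum
# itself ranks layers by signal-to-noise ratio before choosing.
import torch.nn as nn


class ToyTransformer(nn.Module):
    """Stand-in for an LLM: a stack of identical blocks."""

    def __init__(self, n_layers: int = 28, d_model: int = 64):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_layers)
        )


def freeze_all_but_fraction(model: ToyTransformer, fraction: float = 0.25) -> int:
    """Freeze all parameters, then re-enable grads on the last `fraction` of layers."""
    for p in model.parameters():
        p.requires_grad = False
    n_train = max(1, int(len(model.layers) * fraction))
    for layer in list(model.layers)[-n_train:]:  # placeholder selection
        for p in layer.parameters():
            p.requires_grad = True
    return n_train


model = ToyTransformer()
n_train = freeze_all_but_fraction(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(n_train, round(trainable / total, 2))  # 7 0.25
```

Because only these layers receive gradient updates, the optimizer touches roughly a quarter of the weights — which is why, as the README argues, the model's existing (English) capabilities are largely preserved while the new language is learned.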