
palmer-004

palmer turbo

This model has a slightly different architecture and training style from previous palmer models:

  1. Training started with continual pretraining in which only the lm_head and embedding layers were tuned (see the sketch after this list).
  2. The base model was then trained on 75k instruction/response pairs and merged.
  3. The architecture is similar to the palmer series but with a smaller context size (8192).
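For reference, here is a minimal sketch of the layer-freezing setup in step 1, using the Hugging Face transformers library. The base checkpoint id is a guess inferred from the h2oai credit below, not something this card confirms; treat it as a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder: the card credits h2oai for the base model but does not name it.
BASE = "h2oai/h2o-danube3-500m-base"

model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)

# Freeze everything...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the input embeddings and the lm_head,
# mirroring the "lm_head + embedding layers were tuned" note above.
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
for param in model.get_output_embeddings().parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable:,}")
```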

In short, palmer is now half the size and twice the speed with almost the same overall performance, trading a notable improvement on MMLU and ARC Challenge for a drop on Winogrande. As of Wed 17 Jul, it beats all models ≤ 0.5b parameters on HellaSwag.

As with all palmer models, this one is biased to respond to questions without any specific prompt format; feel free to further fine-tune it for your specific use case.
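A minimal generation sketch with transformers, using a plain prompt and no chat template; the repo id appvoid/palmer-004 is an assumption, so substitute the actual one.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "appvoid/palmer-004"  # assumed repo id, replace if different

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO, torch_dtype=torch.float16, device_map="auto"
)

# No system prompt or template: the model answers plain questions directly.
prompt = "What is the tallest mountain on Earth?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```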

benchmarks

These are zero-shot evaluations performed on current state-of-the-art language models.

| Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
|---|---|---|---|---|---|---|
| smollm-360m | 0.2537 | 0.3626 | 0.5350 | 0.7116 | 0.5659 | 0.4858 |
| tinyllama | 0.2577 | 0.3029 | 0.5935 | 0.7329 | 0.5959 | 0.4966 |
| qwen2-0.5b | 0.4413 | 0.2892 | 0.4905 | 0.6931 | 0.5699 | 0.4968 |
| danube3-500m-chat (current sota) | 0.2554 | 0.3626 | 0.6072 | 0.7432 | 0.6140 | 0.5164 |
| palmer-004-turbo | 0.2736 | 0.3558 | 0.6179 | 0.7367 | 0.6117 | 0.5191 |
| palmer-004 | 0.2661 | 0.3490 | 0.6173 | 0.7481 | 0.6417 | 0.5244 |
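The card doesn't say which harness produced these numbers, but EleutherAI's lm-evaluation-harness (`pip install lm-eval`) can run the same five tasks zero-shot; a sketch, again assuming the appvoid/palmer-004 repo id, with the caveat that exact scores may differ from the table depending on the setup used:

```python
import lm_eval

# Zero-shot evaluation over the same five tasks as the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=appvoid/palmer-004,dtype=float16",
    tasks=["mmlu", "arc_challenge", "hellaswag", "piqa", "winogrande"],
    num_fewshot=0,
)
print(results["results"])
```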

thanks to

  • h2oai: performant base model provider
  • teknium: openhermes dataset provider
  • unsloth: training software and tooling