
distily_bench_obj_cross_v2.1

This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 299.6370
  • eval_frwikippl: 276510.875
  • eval_zhwikippl: 20158404.0
  • eval_tinystoriesppl: 10.3161
  • eval_loss: 1.2989
  • eval_runtime: 6.5099
  • eval_samples_per_second: 76.806
  • eval_steps_per_second: 9.678
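The `*ppl` metrics above are perplexities measured on different corpora (English, French, and Chinese Wikipedia, plus TinyStories). As a reminder of how perplexity relates to token-level loss, here is a minimal sketch (not part of Distily; perplexity is the exponential of the mean per-token negative log-likelihood):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/4 has perplexity 4.
print(perplexity([math.log(4)] * 10))
```

Lower perplexity means the model assigns higher probability to the evaluation text, which is why the TinyStories perplexity (10.3161) is so much lower than the out-of-domain Wikipedia perplexities.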

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=0, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • num_epochs: 1.0
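The `distillation_objective` above places all of its weight on a KL-divergence loss between teacher and student logits (the hidden-state and attention components have weight 0). A minimal pure-Python sketch of that per-token KL term follows; this is an illustration of the loss, not Distily's actual implementation:

```python
import math

def softmax(logits):
    """Convert a list of logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over one token position's vocabulary distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero divergence; mismatched logits give a positive loss.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kl_divergence([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Note also that `train_batch_size` (8) times `gradient_accumulation_steps` (8) gives the `total_train_batch_size` of 64.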

Resource Usage

Peak GPU Memory: 8.0568 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 28486.4824 | 70517.9062 | 6.3282 | 6.4922 | 77.016 | 9.704 | 14258.5928 | 39903.9922 |
| 500 | 0.6464 | 276.9369 | 227102.6875 | 1.3956 | 6.4832 | 77.123 | 9.717 | 11.3346 | 4592851.5 |
| 773 | 0.9994 | 299.6370 | 276510.875 | 1.2989 | 6.5099 | 76.806 | 9.678 | 10.3161 | 20158404.0 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0
Model size: 68.5M params (Safetensors, BF16)