metadata
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2
    results: []

distily_bench_obj_cross_v2

This student model was distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 5868.0605
  • eval_frwikippl: 32990.6758
  • eval_zhwikippl: 54785.7930
  • eval_tinystoriesppl: 2293.1941
  • eval_loss: 4.9180
  • eval_runtime: 13.0935
  • eval_samples_per_second: 76.374
  • eval_steps_per_second: 9.547
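
The card does not define the `*ppl` metrics. Their names suggest perplexities on English, French, and Chinese Wikipedia text and on TinyStories. The following is a minimal sketch assuming the standard definition, the exponential of the mean per-token negative log-likelihood. The function name and sample values are illustrative and are not taken from Distily.

```python
import math

def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood per token), with NLLs in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs, for illustration only (not from this model).
example_nlls = [2.0, 3.0, 4.0]
example_ppl = perplexity(example_nlls)  # equals math.exp(3.0)
```

Note that exp(4.9180) is roughly 137, which matches none of the reported perplexities. This is consistent with eval_loss being the distillation objective rather than a cross-entropy, though the card does not say so explicitly.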

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • num_epochs: 1.0
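
The distillation_objective above combines a KL term on the logits (weight 1) with an MSE term on the hidden states (weight 2.0); the attention term has weight 0 and contributes nothing. The sketch below shows one way such a weighted sum could be computed for a single token position in plain Python. The KL direction (teacher relative to student), the absence of a temperature, and the reduction over tokens and layers are assumptions, and all function names are hypothetical rather than Distily's API.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student); the direction is an assumption."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mse(teacher_hs, student_hs):
    """Mean squared error between two equally sized hidden-state vectors."""
    return sum((t - s) ** 2 for t, s in zip(teacher_hs, student_hs)) / len(teacher_hs)

def distillation_loss(t_logits, s_logits, t_hs, s_hs,
                      logits_weight=1.0, hs_weight=2.0):
    """Weighted sum using the weights from the hyperparameter list.

    The attention component (weight 0) is omitted.
    """
    return (logits_weight * kl_divergence(t_logits, s_logits)
            + hs_weight * mse(t_hs, s_hs))
```

With identical teacher and student logits the KL term vanishes, so the loss reduces to twice the hidden-state MSE.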

Resource Usage

Peak GPU Memory: 8.1729 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 34961.5352 | 67685.8906 | 6.4082 | 13.0484 | 76.638 | 9.58 | 22307.2852 | 64899.9219 |
| 1000 | 0.0808 | 5926.0762 | 32934.9414 | 4.9183 | 13.0602 | 76.568 | 9.571 | 2332.1941 | 55049.4961 |
| 2000 | 0.1616 | 5852.6240 | 32990.6758 | 4.9180 | 13.0475 | 76.643 | 9.58 | 2287.5139 | 54785.7930 |
| 3000 | 0.2424 | 5843.5669 | 32990.6758 | 4.9177 | 13.045 | 76.658 | 9.582 | 2281.4705 | 54785.7930 |
| 4000 | 0.3232 | 5878.9780 | 32990.6758 | 4.9180 | 13.0627 | 76.554 | 9.569 | 2303.0730 | 54815.0078 |
| 5000 | 0.4040 | 5868.0605 | 32990.6758 | 4.9180 | 13.0226 | 76.789 | 9.599 | 2295.0898 | 54815.0078 |
| 6000 | 0.4848 | 5867.1484 | 32990.6758 | 4.9180 | 13.0139 | 76.841 | 9.605 | 2291.6780 | 54785.7930 |
| 7000 | 0.5657 | 5869.8799 | 32990.6758 | 4.9177 | 13.0183 | 76.815 | 9.602 | 2297.7485 | 54815.0078 |
| 8000 | 0.6465 | 5868.0605 | 32990.6758 | 4.9180 | 13.084 | 76.429 | 9.554 | 2294.3315 | 54815.0078 |
| 9000 | 0.7273 | 5868.0605 | 32990.6758 | 4.9180 | 13.0935 | 76.374 | 9.547 | 2293.1941 | 54785.7930 |
| 10000 | 0.8081 | 5845.3784 | 32990.6758 | 4.9177 | 13.0045 | 76.896 | 9.612 | 2282.6021 | 54785.7930 |
| 11000 | 0.8889 | 5848.9976 | 32990.6758 | 4.9177 | 13.0015 | 76.914 | 9.614 | 2284.8682 | 54785.7930 |
| 12000 | 0.9697 | 5868.0605 | 32990.6758 | 4.9183 | 13.0386 | 76.696 | 9.587 | 2296.9883 | 54815.0078 |
| 12375 | 1.0 | 5868.0605 | 32990.6758 | 4.9183 | 13.0038 | 76.9 | 9.613 | 2296.9883 | 54815.0078 |
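
The step, epoch, runtime, and throughput columns can be cross-checked with simple arithmetic. The sketch below recomputes the epoch fraction from the step count and estimates the size of the evaluation set from runtime and throughput. The helper names are hypothetical, and the training-example count assumes a single device with no gradient accumulation, neither of which the card states.

```python
TOTAL_STEPS = 12375     # final step in the table, logged at epoch 1.0
TRAIN_BATCH_SIZE = 8    # train_batch_size from the hyperparameter list

def epoch_at(step, total_steps=TOTAL_STEPS):
    """Fraction of the single epoch completed at a given step, to 4 decimals."""
    return round(step / total_steps, 4)

def approx_count(runtime_s, per_second):
    """Estimate a total count from a runtime and a per-second rate."""
    return round(runtime_s * per_second)

# Step 9000 row: runtime 13.0935 s, 76.374 samples/s, 9.547 steps/s.
eval_samples = approx_count(13.0935, 76.374)
eval_steps = approx_count(13.0935, 9.547)

# Assumes one device and no gradient accumulation (not stated in the card).
train_examples = TOTAL_STEPS * TRAIN_BATCH_SIZE
```

Under these assumptions the evaluation set comes to about 1000 samples in 125 batches, which agrees with eval_batch_size: 8, and one epoch covers 99,000 training examples.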

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0