Cannot find weights for the classifier head (a Linear layer) in 'pytorch_model.bin'
#6 opened by ExicitingMe
I printed out all the state dict keys and didn't find any weights for the classifier head. I'm wondering why that is.
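For context, here is a minimal sketch of how these shapes can be dumped (assuming the checkpoint file has been downloaded locally), which prints the list below:

```python
import torch

# Load the raw state dict from the checkpoint file on CPU
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print every parameter name together with its shape
for name, tensor in state_dict.items():
    print(f"{name}: {list(tensor.shape)}")
```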
```
model.transformer.wte.weight: [50304, 2048]
model.transformer.blocks.0.attn_out.weight: [2048, 2048]
model.transformer.blocks.0.ff_out.weight: [2048, 8192]
model.transformer.blocks.0.att_proj.weight: [6144, 2048]
model.transformer.blocks.0.ff_proj.weight: [16384, 2048]
model.transformer.blocks.1.attn_out.weight: [2048, 2048]
model.transformer.blocks.1.ff_out.weight: [2048, 8192]
model.transformer.blocks.1.att_proj.weight: [6144, 2048]
model.transformer.blocks.1.ff_proj.weight: [16384, 2048]
model.transformer.blocks.2.attn_out.weight: [2048, 2048]
model.transformer.blocks.2.ff_out.weight: [2048, 8192]
model.transformer.blocks.2.att_proj.weight: [6144, 2048]
model.transformer.blocks.2.ff_proj.weight: [16384, 2048]
model.transformer.blocks.3.attn_out.weight: [2048, 2048]
model.transformer.blocks.3.ff_out.weight: [2048, 8192]
model.transformer.blocks.3.att_proj.weight: [6144, 2048]
model.transformer.blocks.3.ff_proj.weight: [16384, 2048]
model.transformer.blocks.4.attn_out.weight: [2048, 2048]
model.transformer.blocks.4.ff_out.weight: [2048, 8192]
model.transformer.blocks.4.att_proj.weight: [6144, 2048]
model.transformer.blocks.4.ff_proj.weight: [16384, 2048]
model.transformer.blocks.5.attn_out.weight: [2048, 2048]
model.transformer.blocks.5.ff_out.weight: [2048, 8192]
model.transformer.blocks.5.att_proj.weight: [6144, 2048]
model.transformer.blocks.5.ff_proj.weight: [16384, 2048]
model.transformer.blocks.6.attn_out.weight: [2048, 2048]
model.transformer.blocks.6.ff_out.weight: [2048, 8192]
model.transformer.blocks.6.att_proj.weight: [6144, 2048]
model.transformer.blocks.6.ff_proj.weight: [16384, 2048]
model.transformer.blocks.7.attn_out.weight: [2048, 2048]
model.transformer.blocks.7.ff_out.weight: [2048, 8192]
model.transformer.blocks.7.att_proj.weight: [6144, 2048]
model.transformer.blocks.7.ff_proj.weight: [16384, 2048]
model.transformer.blocks.8.attn_out.weight: [2048, 2048]
model.transformer.blocks.8.ff_out.weight: [2048, 8192]
model.transformer.blocks.8.att_proj.weight: [6144, 2048]
model.transformer.blocks.8.ff_proj.weight: [16384, 2048]
model.transformer.blocks.9.attn_out.weight: [2048, 2048]
model.transformer.blocks.9.ff_out.weight: [2048, 8192]
model.transformer.blocks.9.att_proj.weight: [6144, 2048]
model.transformer.blocks.9.ff_proj.weight: [16384, 2048]
model.transformer.blocks.10.attn_out.weight: [2048, 2048]
model.transformer.blocks.10.ff_out.weight: [2048, 8192]
model.transformer.blocks.10.att_proj.weight: [6144, 2048]
model.transformer.blocks.10.ff_proj.weight: [16384, 2048]
model.transformer.blocks.11.attn_out.weight: [2048, 2048]
model.transformer.blocks.11.ff_out.weight: [2048, 8192]
model.transformer.blocks.11.att_proj.weight: [6144, 2048]
model.transformer.blocks.11.ff_proj.weight: [16384, 2048]
model.transformer.blocks.12.attn_out.weight: [2048, 2048]
model.transformer.blocks.12.ff_out.weight: [2048, 8192]
model.transformer.blocks.12.att_proj.weight: [6144, 2048]
model.transformer.blocks.12.ff_proj.weight: [16384, 2048]
model.transformer.blocks.13.attn_out.weight: [2048, 2048]
model.transformer.blocks.13.ff_out.weight: [2048, 8192]
model.transformer.blocks.13.att_proj.weight: [6144, 2048]
model.transformer.blocks.13.ff_proj.weight: [16384, 2048]
model.transformer.blocks.14.attn_out.weight: [2048, 2048]
model.transformer.blocks.14.ff_out.weight: [2048, 8192]
model.transformer.blocks.14.att_proj.weight: [6144, 2048]
model.transformer.blocks.14.ff_proj.weight: [16384, 2048]
model.transformer.blocks.15.attn_out.weight: [2048, 2048]
model.transformer.blocks.15.ff_out.weight: [2048, 8192]
model.transformer.blocks.15.att_proj.weight: [6144, 2048]
model.transformer.blocks.15.ff_proj.weight: [16384, 2048]
```
The model uses weight tying: the classification head shares the embedding layer's weight, so no separate classifier tensor is stored in 'pytorch_model.bin'.
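A quick sketch to illustrate what that means at inference time (the checkpoint file name is from this repo; the dummy hidden state is purely illustrative):

```python
import torch

sd = torch.load("pytorch_model.bin", map_location="cpu")

# There is no classifier-head tensor in the file; the embedding matrix is reused.
wte = sd["model.transformer.wte.weight"]  # [50304, 2048]

# With weight tying, logits are computed by projecting the final hidden states
# back through the (transposed) embedding matrix:
hidden_states = torch.randn(1, 2048)      # a dummy final hidden state
logits = hidden_states @ wte.float().T    # [1, 50304], one score per vocab token
```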
Got it, thanks!
ExicitingMe changed discussion status to closed