wrong param name when using torch.load()
#9, opened by BwShen
Sometimes I have to use `torch.load()` to load the model params without the Hugging Face package. However, the param names are not the expected ones: e.g., `h.23.mlp.dense_4h_to_h.weight` should be `transformer.h.23.mlp.dense_4h_to_h.weight`, and `lm_head.weight` does not exist.

I guess this is related to #5 and #6, where the model architecture was changed but the params are still those of `BloomModel` instead of `BloomForCausalLM`.

Thanks for noting 🧐 Maybe @lewtun knows what the problem is? Should we change it back to `BloomForCausalLM`?
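
Until that is settled, a possible workaround is to rename the keys after loading. The sketch below is only an illustration, not something confirmed in this thread: the helper name `add_causal_lm_prefix` is hypothetical, and it assumes that the only differences are the missing `transformer.` prefix and a `lm_head.weight` that shares its weights with `word_embeddings.weight` (weight tying). If the checkpoint was saved with untied weights, pass `tie_lm_head=False`.

```python
from collections import OrderedDict

# Prefix that BloomForCausalLM-style checkpoints put in front of base-model keys.
PREFIX = "transformer."


def add_causal_lm_prefix(state_dict, tie_lm_head=True):
    """Return a copy of a BloomModel-style state dict with BloomForCausalLM-style keys.

    Hypothetical helper: works on any mapping of parameter name -> tensor.
    """
    remapped = OrderedDict()
    for name, tensor in state_dict.items():
        # Keys that are already in the target layout are kept unchanged.
        if name.startswith(PREFIX) or name.startswith("lm_head."):
            remapped[name] = tensor
        else:
            remapped[PREFIX + name] = tensor

    # Assumption: the output projection is tied to the input embeddings,
    # so the missing lm_head.weight can reuse the embedding tensor.
    embed_key = PREFIX + "word_embeddings.weight"
    if tie_lm_head and "lm_head.weight" not in remapped and embed_key in remapped:
        remapped["lm_head.weight"] = remapped[embed_key]
    return remapped


# Hypothetical usage (the file name is a placeholder, not taken from this repo):
# import torch
# state_dict = add_causal_lm_prefix(torch.load("pytorch_model.bin", map_location="cpu"))
```

The function returns a new mapping and leaves the loaded one untouched, so the original keys remain available for comparison.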