Why is mt0-large 1.3B while mt5-large is 780M?
#6 opened by tansq
Why does mt0-large have 1.3B parameters while mt5-large has 780M?
Where did you get the 780M figure from? The PyTorch weights file is the same size for both models, and the mT5 paper says the following:
"Following the original T5 recipe, we consider five model sizes: Small (≈ 300M parameters), Base (580M), Large (1.2B), XL (3.7B), and XXL (13B)."
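One way to settle a parameter-count question is to count the parameters directly rather than relying on a model card. Below is a minimal sketch of a counting helper using PyTorch's `Module.parameters()`; the tiny `nn.Linear` demo stands in for the real checkpoint, and the commented-out `AutoModel` lines are an assumption about how you would load mt5-large / mt0-large with the `transformers` library.

```python
import torch
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Sum the element counts of every parameter tensor in the model."""
    return sum(p.numel() for p in model.parameters())


# Demo on a tiny module: a Linear(10 -> 5) layer has
# 10 * 5 weights + 5 biases = 55 parameters.
demo = nn.Linear(10, 5)
print(count_parameters(demo))  # 55

# To check the models in question (downloads the checkpoints):
# from transformers import AutoModelForSeq2SeqLM
# mt5 = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")
# print(count_parameters(mt5))
```

Note that embedding-heavy multilingual models like mT5 can report different "sizes" depending on whether the count includes the (tied) embedding matrices, which is a common source of discrepancies between quoted figures.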