Fifty shapes of BLiMP: syntactic learning curves in LMs
Collection
Models analyzed in our 2024 MILLing paper: https://aclanthology.org/2024.clasp-1.7/
This autoregressive model belongs to a series of rather small language models trained on the BabyLM dataset:
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|---|---|---|---|---|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
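For reference, here is a minimal sketch of how such a checkpoint could be loaded with the `transformers` library and used for BLiMP-style minimal-pair scoring. The repo id is a placeholder assumption, not the actual Hub id of any of these models:

```python
# Minimal sketch: load a checkpoint and compare the log-probabilities
# of a BLiMP-style minimal pair. Substitute the real Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/baby_llama"  # hypothetical placeholder id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Summed log-probability of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # `out.loss` is the mean negative log-likelihood over the
    # ids.shape[1] - 1 predicted tokens, so undo the averaging.
    return -out.loss.item() * (ids.shape[1] - 1)

# The grammatical member of the pair should receive the higher score.
print(sentence_logprob("The cats meow.") > sentence_logprob("The cats meows."))
```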
If you use this model in your research, please cite the following publication:
@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian and
      Zarrie{\ss}, Sina",
    editor = "Qiu, Amy and
      Noble, Bill and
      Pagmar, David and
      Maraev, Vladislav and
      Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}