arxiv:2305.13673

Physics of Language Models: Part 1, Context-Free Grammar

Published on May 23, 2023
Authors: Zeyuan Allen-Zhu, Yuanzhi Li
Abstract

We design controlled experiments to study how generative language models, like GPT, learn context-free grammars (CFGs) -- diverse language systems with a tree-like structure capturing many aspects of natural languages, programs, and logics. CFGs are as hard as pushdown automata, and can be ambiguous, so verifying whether a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for difficult (long and ambiguous) CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and impressive diversity. More importantly, we delve into the physical principles behind how transformers learn CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (for example, placing tree-node information exactly on subtree boundaries), and that transformers learn to form "boundary-to-boundary" attention patterns resembling dynamic programming. We also cover some extensions of CFGs as well as the robustness of transformers to grammar mistakes. Overall, our research provides a comprehensive and empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms transformers use to capture the structure and rules of languages.
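The abstract notes that checking whether a string satisfies an ambiguous CFG requires dynamic programming. The sketch below is an illustrative CYK-style membership check in Python; it is not code from the paper, and the toy grammar (a small Chomsky-normal-form example with nonterminals S, A, B, C) and the function name cyk_accepts are hypothetical choices made for illustration.

```python
# Illustrative CYK dynamic-programming membership check for a CFG in
# Chomsky normal form. The toy grammar below is hypothetical and not
# taken from the paper; it only demonstrates the kind of check the
# abstract refers to.
BINARY_RULES = {            # lhs -> list of (right1, right2) productions
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A")],
    "B": [("C", "C")],
    "C": [("A", "B")],
}
TERMINAL_RULES = {"A": {"a"}, "B": {"b"}, "C": {"a"}}  # lhs -> terminals


def cyk_accepts(tokens, start="S"):
    """Return True iff `tokens` is derivable from `start`.

    table[length - 1][i] holds the nonterminals that derive the span
    tokens[i : i + length], filled in order of increasing span length.
    """
    n = len(tokens)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Spans of length 1: apply terminal rules.
    for i, tok in enumerate(tokens):
        for lhs, terminals in TERMINAL_RULES.items():
            if tok in terminals:
                table[0][i].add(lhs)
    # Longer spans: try every split into two adjacent sub-spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split - 1][i]
                right = table[length - split - 1][i + split]
                for lhs, rhss in BINARY_RULES.items():
                    if any(b in left and c in right for b, c in rhss):
                        table[length - 1][i].add(lhs)
    return start in table[n - 1][0]


print(cyk_accepts(list("baaba")))  # True: "baaba" is in the toy language
print(cyk_accepts(list("abba")))   # False
```

This check runs in cubic time in the string length, which is the kind of dynamic program the abstract's "boundary-to-boundary" attention pattern is said to resemble.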
