arxiv:2306.11644

Textbooks Are All You Need

Published on Jun 20, 2023

Submitted by akhaliq on Jun 21, 2023

#1 Paper of the day


Authors:

Suriya Gunasekar, Jyoti Aneja, Allie Del Giorno, Sivakanth Gopi, Gustavo de Rosa, Shital Shah, Harkirat Singh Behl, Ronen Eldan, Yuanzhi Li

Abstract

We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
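The abstract reports results as pass@1 but does not say how that number is computed. A common convention for HumanEval and MBPP is the unbiased pass@k estimator introduced alongside HumanEval: for each problem, draw n samples, count the c that pass all unit tests, and estimate pass@k = 1 - C(n-c, k) / C(n, k), then average over problems. The sketch below assumes that convention. The function names `pass_at_k` and `benchmark_pass_at_k` are illustrative, not taken from the paper or any evaluation harness. With greedy decoding (n = 1), pass@1 reduces to the fraction of problems solved.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative helper).

    n: number of generated samples for the problem
    c: number of those samples that pass all unit tests
    k: sampling budget being evaluated (k <= n)
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample is
    # C(n-c, k) / C(n, k); pass@k is the complement of that.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over a benchmark (hypothetical helper).

    results: one (n, c) pair per problem.
    """
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy greedy-decoding run on two problems: one solved, one not.
    print(benchmark_pass_at_k([(1, 1), (1, 0)], k=1))  # 0.5
```

For k = 1 the estimator simplifies to c / n per problem, which is why a single greedy completion per problem is enough to report pass@1.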


Community

Sigmally

Jun 21, 2023

Will the source code and the dataset be published?

acheong08

Jun 21, 2023

Any chance of the models being published?


Deadmon

Jun 21, 2023

I have been saying this from the start. I don't know why we were training LLMs on garbage conversations from humans.
We could train LLMs to be experts in kung fu and then install it in our brains with a Neuralink-style implant, like in The Matrix.