arxiv:2306.02707

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Published on Jun 5, 2023

Submitted by akhaliq on Jun 5, 2023

#1 Paper of the day


Authors:

Subhabrata Mukherjee,

Arindam Mitra,

Sahaj Agarwal,

Abstract

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs, to small-scale, homogeneous training data, and most notably a lack of rigorous evaluation, which results in overestimating the small models' capability, as they tend to learn to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca, a 13-billion-parameter model that learns to imitate the reasoning process of LFMs. (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy, to be published at https://aka.ms/orca-lm.) Orca learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (a 4-point gap with an optimized system message) on professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without chain-of-thought (CoT) prompting, while still trailing GPT-4. Our research indicates that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills.
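The abstract describes learning from teacher explanation traces in a "progressive" way, with ChatGPT assisting before GPT-4. The snippet below is a minimal, hypothetical sketch of how such data might be organized, assuming each training example is a (system message, user query, teacher response) record and assuming "progressive learning" means training on ChatGPT-generated examples before GPT-4-generated ones. The names `Example`, `build_curriculum`, `to_prompt`, `STAGE_ORDER`, and the `### System:` prompt template are all illustrative inventions, not the authors' code or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    system: str    # system message asking the teacher to explain its reasoning
    query: str     # user instruction or question
    response: str  # teacher's step-by-step explanation trace (the training target)
    teacher: str   # which model produced the response: "chatgpt" or "gpt-4"


# Assumed stage order: intermediate teacher (ChatGPT) first, then GPT-4.
STAGE_ORDER = ("chatgpt", "gpt-4")


def build_curriculum(examples: list[Example]) -> list[list[Example]]:
    """Group examples into training stages, intermediate teacher first."""
    stages: dict[str, list[Example]] = {name: [] for name in STAGE_ORDER}
    for ex in examples:
        if ex.teacher not in stages:
            raise ValueError(f"unknown teacher: {ex.teacher}")
        stages[ex.teacher].append(ex)
    return [stages[name] for name in STAGE_ORDER]


def to_prompt(ex: Example) -> tuple[str, str]:
    """Return (input text, target text); the student learns to produce the target.

    The template below is hypothetical and only marks where each field goes.
    """
    prompt = f"### System:\n{ex.system}\n\n### User:\n{ex.query}\n\n### Response:\n"
    return prompt, ex.response
```

A training loop would then iterate over `build_curriculum(data)` stage by stage, feeding `to_prompt(ex)` pairs to whatever fine-tuning routine is used. Keeping the system message as a separate field reflects the abstract's note that an optimized system message affects evaluation results.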


Community


PY007

Jun 6, 2023

Will the generated dataset be open-sourced?

ehartford

Jun 6, 2023

dataset and model?

takeraparterer

Jun 8, 2023

please give model 🥺🥺🥺🥺

samiaouad

Jun 9, 2023

This looks promising.

ehartford

Jun 10, 2023

If the dataset is released, my mission will be to fine-tune these base models on it:
LLaMA 7B, 33B, 65B
Falcon 7B, 40B
RedPajama 7B
OpenLLaMA 7B

yutsun

Jun 13, 2023

Release the ground plz~

Naugustogi

Jun 18, 2023

Why upload it to Hugging Face if it's not open source?

xiier

Jun 20, 2023

it's a fake project.

tarruda

Jun 20, 2023

Looking forward to the public release of the model weights. Any idea when it will happen? 15 days have passed since the paper was released.

DHUGH

Jun 21, 2023

what is the base model for Orca?

tommytao

Jun 26, 2023

Quoting xiier: "it's a fake project."

Why fake?

thisiskeithkwan

Jun 29, 2023

looking forward to the model

wassname

Jul 6, 2023

please gab model

DHUGH

Jul 7, 2023

Does anybody have any idea what an LM learns from SFT??? Why is SFT effective???

tommytao

Jul 7, 2023

SFT (supervised fine-tuning) aligns the model to respond well to the questions it receives (i.e., to answer them), drawing on knowledge it already acquired during self-supervised pretraining, rather than just predicting the next word.

PS: Like self-supervised learning, SFT can also teach the model new knowledge, but I suspect that a mismatch between the LLM's internal knowledge and the new knowledge in the SFT data may lead to behavior cloning and, unfortunately, to additional hallucinations.

Ref:
https://huyenchip.com/2023/05/02/rlhf.html#rlhf_and_hallucination
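To make the "learn to answer rather than just predict the next word" point concrete: one common way to build SFT examples is to concatenate prompt and response tokens but compute the loss only on the response tokens, so the model is trained to produce answers given questions. The sketch below shows that idea in plain Python. It is a generic illustration, not Orca's training code (the paper page above does not say how Orca computes its loss), and the function names are hypothetical. The value -100 follows the convention of PyTorch's `CrossEntropyLoss` default `ignore_index`, which Hugging Face models also use for labels.

```python
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch CrossEntropyLoss default)


def build_sft_labels(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask prompt positions out of the labels."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(token_logprobs: list[float], labels: list[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not ignored.

    token_logprobs[i] is the model's log-probability of labels[i] at position i,
    assumed already aligned (i.e. after the usual one-token shift).
    """
    kept = [-lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no response tokens to train on")
    return sum(kept) / len(kept)
```

For example, `build_sft_labels([1, 2, 3], [4, 5])` yields labels `[-100, -100, -100, 4, 5]`, so only the two response tokens contribute to `masked_nll`. Whether to mask the prompt is a design choice; some setups train on all tokens.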

danielpark

Aug 7, 2023


It has been two months since the paper was published. Does anyone know of any updates?
The code and full weights have not been released. Where are updates and announcements being posted?