synthetic-data - a leonardlin Collection

synthetic-data

updated May 21

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published Jan 29 • 48

Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published Apr 11 • 29

WizardLM: Empowering Large Language Models to Follow Complex Instructions
Paper • 2304.12244 • Published Apr 24, 2023 • 13

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published Feb 20 • 47

Textbooks Are All You Need
Paper • 2306.11644 • Published Jun 20, 2023 • 142

Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published Nov 18, 2023 • 70

Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Paper • 2306.02707 • Published Jun 5, 2023 • 46

WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published Jun 14, 2023 • 28

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published Oct 21, 2023 • 4