Collections including paper arxiv:2404.01954

Papers - Multilingual - Benchmarks

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Papers - Fine-tuning - PPO

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19
UltraFeedback: Boosting Language Models with High-quality Feedback

Paper • 2310.01377 • Published Oct 2, 2023 • 5
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Paper • 2305.14387 • Published May 22, 2023 • 1

Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 35
HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Papers - Text - Supervised Fine-tuning - Batch Grouping

Sequences of similar token length are grouped into the same mini-batch, which reduces padding and makes better use of the GPU. Mini-batches differ in sequence length, but each has the same maximum number of tokens.
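
A minimal sketch of length-grouped batching under a shared token budget, to make the note above concrete. The function name `group_by_length`, the `max_tokens` parameter, and the rule that a batch costs "number of sequences times the longest sequence" are assumptions for illustration, not details taken from the paper.

```python
from typing import List


def group_by_length(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices into mini-batches of similar token length.

    Assumption: a batch is padded to its longest sequence, so its cost is
    len(batch) * longest_length, and that cost may not exceed max_tokens.
    """
    # Sort indices by length so neighbours in the order have similar lengths.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"sample {idx} is longer than max_tokens")
        # Ascending order means the newest sample is the longest in the batch.
        padded_cost = (len(current) + 1) * lengths[idx]
        if current and padded_cost > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    lengths = [5, 100, 7, 90, 6, 95]
    for batch in group_by_length(lengths, max_tokens=200):
        print(batch, [lengths[i] for i in batch])
```

Short sequences end up in large batches and long sequences in small ones, so the number of sequences per batch varies while the token budget stays fixed.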

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Papers - Text - Supervised Fine-tuning

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Papers - Pre-training - Dynamic Context Length

For HyperCLOVA X, pre-training was split by context length: 90% at 4096 tokens and 10% at 32k tokens.
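
A rough sketch of how such a split could be expressed as a step-based schedule. The note above gives only the 90/10 proportions; the ordering (short context first, long context last), the reading of 32k as 32768, and the name `context_length_for_step` are assumptions for illustration.

```python
def context_length_for_step(
    step: int,
    total_steps: int,
    short_len: int = 4096,
    long_len: int = 32768,  # assumption: "32k" read as 32 * 1024
    short_fraction: float = 0.9,
) -> int:
    """Return the context length to use at a given training step.

    Assumption: the short context covers the first `short_fraction` of
    training and the long context covers the rest.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    boundary = round(short_fraction * total_steps)
    return short_len if step < boundary else long_len


if __name__ == "__main__":
    total = 1000
    for step in (0, 899, 900, 999):
        print(step, context_length_for_step(step, total))
```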

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Papers - Pre-training - In-filling - PSM and SPM ordering
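
For readers unfamiliar with the two orderings named in this collection title, here is a small sketch. A document is cut into prefix, middle, and suffix; PSM arranges them as prefix, suffix, middle, and SPM as suffix, prefix, middle, so the model always learns to generate the middle last. The sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the function names are hypothetical placeholders, and this is the simplest form of each ordering rather than the exact format used for HyperCLOVA X.

```python
from typing import Tuple

# Hypothetical sentinel strings; real tokenizers define their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def split_document(doc: str, start: int, end: int) -> Tuple[str, str, str]:
    """Cut a document into (prefix, middle, suffix) at two character offsets."""
    if not 0 <= start <= end <= len(doc):
        raise ValueError("offsets must satisfy 0 <= start <= end <= len(doc)")
    return doc[:start], doc[start:end], doc[end:]


def to_psm(prefix: str, middle: str, suffix: str) -> str:
    """Prefix-Suffix-Middle ordering: the middle is generated last."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def to_spm(prefix: str, middle: str, suffix: str) -> str:
    """Suffix-Prefix-Middle ordering: the suffix comes before the prefix."""
    return f"{SUF}{suffix}{PRE}{prefix}{MID}{middle}"


if __name__ == "__main__":
    parts = split_document("abcdefgh", 2, 5)
    print(to_psm(*parts))
    print(to_spm(*parts))
```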

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

non-english llm

RakutenAI-7B: Extending Large Language Models for Japanese

Paper • 2403.15484 • Published Mar 21 • 12
HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19

Model Stock: All we need is just a few fine-tuned models

Paper • 2403.19522 • Published Mar 28 • 10
HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19
Instruction Tuning with Human Curriculum

Paper • 2310.09518 • Published Oct 14, 2023 • 3

Papers - Reward Model - Bradley-Terry

https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture24.pdf
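
As a quick reminder of what the Bradley-Terry model says in the reward-modeling setting, here is its standard form. The notation is an assumption chosen for illustration: $r$ is a scalar reward function, $x$ a prompt, $y_w$ the preferred response, $y_l$ the rejected one, and $\sigma$ the logistic function.

```latex
P(y_w \succ y_l \mid x)
  = \frac{\exp\bigl(r(x, y_w)\bigr)}{\exp\bigl(r(x, y_w)\bigr) + \exp\bigl(r(x, y_l)\bigr)}
  = \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)

\mathcal{L}(r)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \Bigl[\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)\Bigr]
```

Only the difference between the two rewards matters, so adding a constant to $r$ leaves the preference probability unchanged.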

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 48
HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2 • 19
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11
Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15 • 82
