nayohan (Yohan Na)

upvoted a paper 18 days ago

Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations

Paper • 2310.13420 • Published Oct 20, 2023 • 2

upvoted 2 papers 26 days ago

Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published 27 days ago • 69

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published 28 days ago • 107

upvoted a collection 4 months ago

Korean-English Parallel Datasets (한국어-영어 병렬 데이터셋)

Collection

6 items • Updated Jul 17 • 3

upvoted 2 papers 4 months ago

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 33

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 73

upvoted 3 collections 4 months ago

upvoted 3 collections 5 months ago

Korean Pretraining Dataset

Collection

15 items • Updated Jul 22 • 10

Standard-format-preference-dataset

Collection

We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8 • 21

Domain Specific (Math, Code, etc)

Collection

23 items • Updated Aug 6 • 1

upvoted a paper 6 months ago

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 73

upvoted a paper 9 months ago

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Paper • 2401.01335 • Published Jan 2 • 64

upvoted 6 papers about 1 year ago

AlpaGasus: Training A Better Alpaca with Fewer Data

Paper • 2307.08701 • Published Jul 17, 2023 • 22

Large Language Models as Analogical Reasoners

Paper • 2310.01714 • Published Oct 3, 2023 • 15

Efficient Streaming Language Models with Attention Sinks

Paper • 2309.17453 • Published Sep 29, 2023 • 13

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 87

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 19

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

Paper • 2309.14327 • Published Sep 25, 2023 • 21

Collections upvoted 4 months ago: Text datasets with missing language information; Awesome feedback datasets; Translated (En->Ko) dataset
