Suzie Oh PRO

ohsuz

AI & ML interests

None yet

Organizations

ohsuz's activity

upvoted an article 14 days ago

Article

Navigating Korean LLM Research #1: Models

14 days ago

• 18

upvoted a collection 14 days ago

Skywork-Reward-Data-Collection

Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated 24 days ago • 8

upvoted a collection 22 days ago

MagpieLM

Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated Sep 22 • 15

upvoted 4 collections 3 months ago

Direct Preference Optimization Datasets

Datasets suitable for Direct Preference Optimization based on their column names • 1597 items • Updated Jul 10 • 2

Probably DPO datasets

A collection of datasets that probably support DPO • 146 items • Updated Jun 26 • 12

Datasets built with ⚗️ distilabel

This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 7 items • Updated Aug 6 • 10

DPO Datasets

27 items • Updated Aug 15 • 3

upvoted 4 collections 4 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9 • 9

Translated (En->Ko) dataset

Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Sep 20 • 3

Synthetic (text) Dataset Generation

Papers about synthetic dataset generation • 9 items • Updated Jun 21 • 6

synthetic-data-generation-demos

A collection of demos for various approaches to synthetic data generation • 4 items • Updated Jun 25 • 13

upvoted 2 collections 7 months ago

Model Merging

Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12 • 217

Korean Datasets I've released so far.

A collection of the Korean datasets I have uploaded so far. • 8 items • Updated May 24 • 16