Datasets - a kaizuberbuehler Collection

Datasets

updated Sep 20

Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published Apr 1 • 30

CosmicMan: A Text-to-Image Foundation Model for Humans
Paper • 2404.01294 • Published Apr 1 • 15

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Paper • 2406.08707 • Published Jun 13 • 15

DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published Jun 17 • 48

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Paper • 2406.08973 • Published Jun 13 • 85

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Paper • 2406.08418 • Published Jun 12 • 28

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published Jun 12 • 23

argilla/magpie-ultra-v0.1
Viewer • Updated about 1 month ago • 50k • 456 • 210

HuggingFaceFW/fineweb
Viewer • Updated Jul 16 • 46B • 373k • 1.74k

wikimedia/wikipedia
Viewer • Updated Jan 9 • 61.6M • 59.7k • 583

HuggingFaceTB/cosmopedia
Viewer • Updated Aug 12 • 31.1M • 10.7k • 561

bigcode/the-stack
Viewer • Updated Apr 13, 2023 • 546M • 8.05k • 736

teknium/OpenHermes-2.5
Viewer • Updated Apr 15 • 1M • 5.27k • 680

roneneldan/TinyStories
Viewer • Updated Aug 12 • 2.14M • 12.5k • 559

Vezora/Open-Critic-GPT
Viewer • Updated Jul 28 • 55.1k • 139 • 88

HuggingFaceFW/fineweb-edu
Viewer • Updated 29 days ago • 3B • 555k • 527

arcee-ai/The-Tome
Viewer • Updated Aug 15 • 1.75M • 233 • 78

mlabonne/FineTome-100k
Viewer • Updated Jul 29 • 100k • 9.26k • 114

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper • 2409.12568 • Published Sep 19 • 47