Alpaca Style Datasets (Collection): datasets that follow the Alpaca-style format, i.e. they have 'instruction', 'input', and 'output' columns • 3601 items • Updated about 11 hours ago
Probably function calling datasets (Collection): created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space • 39 items • Updated Jul 17
🪐 SmolLM (Collection): a series of smol LLMs with 135M, 360M, and 1.7B parameters. We release base and Instruct models, as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18
Wikipedia's Treasure Trove: Advancing Machine Learning with Diverse Data (Article) by frimelle • Jun 3
haiku (Collection): 🌸 synthetic datasets built to improve the ability of open language models to write haikus using DPO • 3 items • Updated Jun 21
Image dataset (Collection): 10 datasets showing how to configure and load image datasets • 10 items • Updated Aug 2
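The Alpaca Style Datasets collection groups datasets by their column layout: a dataset qualifies when it has 'instruction', 'input', and 'output' columns. As a rough sketch, assuming a dataset's column names are already available as a list of strings, that membership test could look like the Python below. The function `is_alpaca_style` and the example record are hypothetical illustrations, not part of any Hugging Face API.

```python
# Minimal sketch: check whether a dataset's column names match the
# Alpaca-style layout ('instruction', 'input', 'output').
# `is_alpaca_style` and `example_record` are hypothetical illustrations.

ALPACA_COLUMNS = {"instruction", "input", "output"}


def is_alpaca_style(column_names):
    """Return True if all three Alpaca-style columns are present.

    Extra columns are allowed; only the presence of the three is checked.
    """
    return ALPACA_COLUMNS.issubset(column_names)


# A hypothetical Alpaca-style record; 'input' may be an empty string
# when the instruction needs no extra context.
example_record = {
    "instruction": "Translate the sentence into French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}

if __name__ == "__main__":
    print(is_alpaca_style(example_record.keys()))     # True
    print(is_alpaca_style(["prompt", "completion"]))  # False
```

Using a subset check rather than strict equality lets datasets with additional columns (for example a pre-formatted 'text' field) still count as Alpaca style.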