Indic Datasets
List of text and voice datasets to train and fine-tune Indic LLMs.
ai4bharat/sangraha • Updated 16 days ago • 268M • 12.9k • 28
uonlp/CulturaX • Updated Jul 23 • 7.18B • 19.2k • 474
pary/hind_encorp • Updated Jan 18 • 225 • 1
PleIAs/YouTube-Commons • Updated Jun 26 • 1.02k • 315
Alignment Dataset
English and other model alignment datasets.
H-D-T/Buzz-8b-Large-v0.5 • Text Generation • Updated May 14 • 33 • 29
allenai/WildChat-1M • Updated 19 days ago • 838k • 1.45k • 279
nvidia/ChatQA-Training-Data • Updated Jun 4 • 442k • 1.2k • 159
nvidia/ChatRAG-Bench • Updated May 24 • 34.6k • 1.38k • 97