Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15 • 166
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20 • 66
Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • Jun 24 • 33
Experimenting with Automatic PII Detection on the Hub using Presidio • Jul 10 • 24
How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • By chilijung • May 31 • 11
Synthetic dataset generation techniques: generating custom sentence similarity data • By davanstrien • May 23 • 15
Using Llama3 and distilabel to build fine-tuning datasets • By dvilasuero • Jun 4 • 73