Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗Datasets: Data processing

asoria's activity

upvoted an article 2 days ago
upvoted 3 articles about 1 month ago

LoRA training scripts of the world, unite!

• 45

Improving Parquet Dedupe on Hugging Face Hub

• 30
upvoted 4 articles about 2 months ago

Introducing BERTopic Integration with Hugging Face Hub

• 7

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

• 166

Introducing the SQL Console on Datasets

• 18
upvoted an article 2 months ago

Fine-Tuning Gemma Models in Hugging Face

• 23
upvoted 2 articles 3 months ago

The 5 Most Under-Rated Tools on Hugging Face

• 85

SmolLM - blazingly fast and remarkably powerful

• 265
upvoted 4 articles 4 months ago

Docmatix - a huge dataset for Document Visual Question Answering

• 67

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

• 66

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

• 33

Experimenting with Automatic PII Detection on the Hub using Presidio

• 24
upvoted 2 articles 5 months ago

Announcing New Dataset Search Features

• 22

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung
• 11
upvoted 2 articles 6 months ago

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien
• 15

Synthetic data: save money, time and carbon with open source

• 50
upvoted an article 7 months ago

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

By dvilasuero
• 73