Libraries

The Datasets Hub supports several libraries in the open-source ecosystem. The huggingface_hub Python library makes it easy for these libraries to let you share your datasets on the Hub. We're happy to welcome to the Hub a set of open-source libraries that are pushing machine learning forward.
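
As a minimal sketch of the kind of sharing huggingface_hub enables, the function below creates a dataset repository if needed and uploads one local file to it, using HfApi.create_repo and HfApi.upload_file. It assumes you are already authenticated, for example with huggingface_hub.login() or the HF_TOKEN environment variable. The function name share_file and the repository names are hypothetical. The optional api argument lets you pass a stand-in object so the logic can be exercised without network access.

```python
from pathlib import Path
from typing import Any, Optional


def share_file(
    local_path: str,
    repo_id: str,
    path_in_repo: Optional[str] = None,
    api: Any = None,
) -> str:
    """Upload one local file to a dataset repository on the Hub (hypothetical helper).

    Returns the path the file has inside the repository.
    """
    if api is None:
        # Imported lazily so this module can be imported without huggingface_hub installed.
        from huggingface_hub import HfApi

        api = HfApi()

    # Default to the file name at the root of the repository.
    target = path_in_repo or Path(local_path).name

    # Create the dataset repository; exist_ok=True makes this a no-op if it already exists.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)

    # repo_type="dataset" is required: the default repo type is a model repository.
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=target,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return target
```

For example, share_file("data/train.csv", "your-username/my-dataset") would upload the file as train.csv at the root of the repository, while passing path_in_repo="raw/train.csv" places it in a subfolder.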

The table below summarizes the supported libraries and their level of integration.

| Library | Description | Download from Hub | Push to Hub |
|---|---|---|---|
| Argilla | Collaboration tool for AI engineers and domain experts who value high-quality data. | ✅ | ✅ |
| Dask | Parallel and distributed computing library that scales the existing Python and PyData ecosystem. | ✅ | ✅ |
| Datasets | 🤗 Datasets is a library for accessing and sharing datasets for audio, computer vision, and natural language processing (NLP). | ✅ | ✅ |
| Distilabel | A framework for synthetic data generation and AI feedback. | ✅ | ✅ |
| DuckDB | In-process SQL OLAP database management system. | ✅ | ✅ |
| FiftyOne | A library for curation and visualization of image, video, and 3D data. | ✅ | ✅ |
| Pandas | Python data analysis toolkit. | ✅ | ✅ |
| Polars | A DataFrame library built on top of an OLAP query engine. | ✅ | ✅ |
| Spark | Real-time, large-scale data processing tool for distributed environments. | ✅ | ✅ |
| WebDataset | Library for writing I/O pipelines for large datasets. | ✅ | ❌ |
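
Several of the libraries above read and write Hub files through paths of the form hf://datasets/<owner>/<name>/<path-in-repo>. As a sketch, the hypothetical helper hf_dataset_uri below builds such a path. The commented calls assume pandas with huggingface_hub installed (which registers the hf:// filesystem with fsspec), or recent versions of Polars and DuckDB, which accept hf:// paths directly; the repository and file names are placeholders. Percent-encoding slashes in a revision (for example refs/convert/parquet) is an assumption of this sketch, based on the form huggingface_hub's HfFileSystem uses for such revisions.

```python
from typing import Optional
from urllib.parse import quote


def hf_dataset_uri(repo_id: str, path_in_repo: str, revision: Optional[str] = None) -> str:
    """Build an hf:// path to a file inside a Hub dataset repository (hypothetical helper)."""
    # A revision (branch, tag, or commit) is appended to the repo id after "@".
    # Slashes in the revision are percent-encoded so they are not read as path separators.
    repo = repo_id if revision is None else f"{repo_id}@{quote(revision, safe='')}"
    return f"hf://datasets/{repo}/{path_in_repo.lstrip('/')}"


if __name__ == "__main__":
    uri = hf_dataset_uri("username/my_dataset", "data/train.parquet")
    print(uri)  # hf://datasets/username/my_dataset/data/train.parquet

    # The calls below need network access and an existing repository:
    # import pandas as pd;  df = pd.read_parquet(uri)
    # import polars as pl;  df = pl.read_parquet(uri)
    # import duckdb;        duckdb.sql(f"SELECT * FROM '{uri}' LIMIT 5").show()
    # Writing back with pandas additionally requires authentication:
    # df.to_parquet(hf_dataset_uri("username/my_dataset", "data/new.parquet"))
```

Building the path once and handing it to whichever library you use keeps the repository location in one place; each library then handles the actual download or upload.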