⚗️ Find reusable synthetic data pipeline code and corresponding datasets on the @huggingface Hub.
Find your pipeline and run it with:
$ distilabel pipeline run --config "hugging_face_dataset_url/pipeline.yaml"
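For scripting, the command above can be assembled from a dataset repo id. This is only a minimal sketch: it assumes the dataset keeps a `pipeline.yaml` at the root of its `main` branch and that the Hub's usual `resolve` URL layout applies. The helper names `pipeline_config_url` and `run_command` and the repo id `some-user/some-dataset` are hypothetical, made up for illustration.

```python
import shlex


def pipeline_config_url(repo_id: str, revision: str = "main",
                        filename: str = "pipeline.yaml") -> str:
    """Build a direct-download URL for a file in a Hub dataset repo.

    Assumption: the pipeline config sits at the repo root under `filename`.
    """
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{filename}"


def run_command(repo_id: str) -> list[str]:
    """Return the distilabel CLI invocation from the post as an argument list."""
    return ["distilabel", "pipeline", "run", "--config", pipeline_config_url(repo_id)]


if __name__ == "__main__":
    # Hypothetical repo id; replace it with a dataset found in the Space.
    print(shlex.join(run_command("some-user/some-dataset")))
```

The argument list can be passed to `subprocess.run` as-is, which avoids shell quoting problems.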
Some components I used
- Embedded dataset viewer https://huggingface.co/docs/hub/main/en/datasets-viewer-embed
- Hugging Face fsspec https://huggingface.co/docs/huggingface_hub/main/en/guides/hf_file_system
- distilabel https://distilabel.argilla.io/latest/
- Gradio leaderboard by Freddy Boulton freddyaboulton/gradio_leaderboard
- Gradio modal by Ali Abid
Space: davidberenstein1957/distilabel-synthetic-data-pipeline-explorer
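As a rough idea of how the Hugging Face fsspec component could be used to check whether a dataset ships a pipeline config, here is a small sketch. It is not the Space's actual code. It assumes the config is named `pipeline.yaml` and sits at the repo root; the helper `find_pipeline_config` is hypothetical. The `datasets/<repo_id>/<file>` path form is the one `HfFileSystem` uses for dataset repos.

```python
from typing import Optional


def find_pipeline_config(fs, repo_id: str,
                         filename: str = "pipeline.yaml") -> Optional[str]:
    """Return the fsspec path of the pipeline config if it exists, else None.

    `fs` is any fsspec-style filesystem exposing `exists(path)`,
    for example `huggingface_hub.HfFileSystem()`.
    Assumption: the config lives at the root of the dataset repo.
    """
    path = f"datasets/{repo_id}/{filename}"
    return path if fs.exists(path) else None
```

With `huggingface_hub` installed, calling `find_pipeline_config(HfFileSystem(), "some-user/some-dataset")` would query the Hub itself; the repo id here is a placeholder.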