davanstrien
posted an update Jun 17
✨ Meet Corpus Creator!

This Gradio app (davanstrien/corpus-creator) takes you from your local files to a Hugging Face Dataset via LlamaIndex.

The goal of the tool is to make it quicker and easier to turn local files you want to use for ML tasks into a Hugging Face Dataset. Perfect for building datasets for:
- synthetic data pipelines
- annotation
- RAG
- Other ML tasks that start from a HF dataset
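
For anyone curious what "local files to a dataset" looks like in practice, here is a rough sketch of the general idea. It is not the app's code: the app uses LlamaIndex for loading and chunking, while this uses only the Python standard library and a naive character-window splitter. The function names (`chunk_text`, `files_to_records`, `to_hf_dataset`) and the record fields are made up for illustration.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def files_to_records(folder, pattern="*.txt", chunk_size=500, overlap=50):
    """Read matching files under `folder` and return one record per chunk."""
    records = []
    for path in sorted(Path(folder).rglob(pattern)):
        # Replace undecodable bytes rather than failing on a single bad file.
        text = path.read_text(encoding="utf-8", errors="replace")
        for idx, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            records.append({"file": path.name, "chunk_id": idx, "text": chunk})
    return records


def to_hf_dataset(records):
    """Turn the records into a Hugging Face Dataset (needs `pip install datasets`)."""
    from datasets import Dataset  # imported lazily so the rest runs without it

    return Dataset.from_list(records)
```

From there, something like `to_hf_dataset(files_to_records("my_docs")).push_to_hub("your-username/my-corpus")` would upload the result, where the folder and repo id are placeholders and you need to be logged in to the Hub.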

I'll share something more substantial that uses this tomorrow 🤗