An NLP model that predicts the subreddit of a Reddit post from its title.
Training
DistilBERT is fine-tuned on subreddit-posts, a dataset of titles of the top 1000 posts from the top 250 subreddits.
For the steps used to build the model, see the model notebook in the GitHub repo, or open it in Colab.
Limitations and bias
- Since the model is trained only on the top 250 subreddits (for reference), it can only categorise titles into those subreddits.
- Some subreddits use a fixed format for post titles. For example, titles in r/todayilearned start with "TIL", so the model becomes biased towards the shortcut "TIL" → r/todayilearned. This can be mitigated by cleaning such subreddit-specific terms out of the dataset.
- In some subreddits, such as r/gifs, the title says little about the post itself, so the model performs poorly on them.
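The dataset-cleaning idea from the limitations above could look roughly like the sketch below. It assumes titles are plain Python strings. The prefix list `SUBREDDIT_PREFIXES` and the helper `strip_prefix` are hypothetical and not taken from the original notebook. Only "TIL" comes from this card; "ELI5" and "LPT" are assumed examples of similar title conventions.

```python
import re

# Hypothetical list of title prefixes tied to specific subreddits.
# "TIL" is from the model card; "ELI5" and "LPT" are assumed examples.
SUBREDDIT_PREFIXES = ["TIL", "ELI5", "LPT"]

# Match a known prefix at the very start of a title as a whole word
# (so "TILAPIA" is left alone), plus any trailing ":", "-" or whitespace.
_PREFIX_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(p) for p in SUBREDDIT_PREFIXES) + r")\b[\s:\-]*",
    flags=re.IGNORECASE,
)


def strip_prefix(title: str) -> str:
    """Remove a leading subreddit-specific tag from a post title."""
    return _PREFIX_RE.sub("", title, count=1).strip()


if __name__ == "__main__":
    titles = ["TIL that octopuses have three hearts", "TIL: honey never spoils"]
    print([strip_prefix(t) for t in titles])
```

Applying `strip_prefix` to every title before fine-tuning removes the literal shortcut; whether this changes accuracy on r/todayilearned would have to be checked empirically.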