haider0941/distilbert-base-educationl

Model Details

Model Name: DistilBERT for Educational Query Classification
Model Architecture: DistilBERT (base model: distilbert-base-uncased)
Language: English
Model Type: Transformer-based text classification model
License: Apache License 2.0

Overview

This model is a fine-tuned version of DistilBERT (distilbert-base-uncased) that classifies a query as either "educational" or "non-educational." It was trained on a labeled dataset of questions and statements, each tagged with one of those two labels.

Intended Use

Primary Use Case: This model is intended to classify text inputs into two categories: "educational" or "non-educational." It is useful for applications that need to filter out or prioritize educational content.
Potential Applications:
- Educational chatbots or virtual assistants
- Content moderation for educational platforms
- Automated tagging of educational content
- Filtering non-educational queries from educational websites or apps
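
The filtering use case above can be sketched as a small wrapper around any classifier. This is a minimal illustration under stated assumptions, not part of the released model: `filter_educational` and `keyword_stub` are hypothetical names, and the stub stands in for a real call to the fine-tuned model so the snippet runs without downloading weights. The label string "educational" is assumed to match the labels described in this card.

```python
from typing import Callable, Iterable, List


def filter_educational(
    queries: Iterable[str],
    classify: Callable[[str], str],
    keep_label: str = "educational",  # assumed label string, per this card
) -> List[str]:
    """Return only the queries that the classifier labels as `keep_label`."""
    return [q for q in queries if classify(q) == keep_label]


def keyword_stub(query: str) -> str:
    """Hypothetical stand-in for the model. NOT how the real model decides."""
    academic_terms = ("photosynthesis", "equation", "theorem")
    is_academic = any(term in query.lower() for term in academic_terms)
    return "educational" if is_academic else "non-educational"


queries = [
    "How does photosynthesis work?",
    "Where can I buy cheap sneakers?",
    "Solve the equation 2x + 3 = 7",
]
kept = filter_educational(queries, keyword_stub)
print(kept)
```

In a real application, `classify` would wrap the tokenizer and model call and return the predicted label string, which keeps the filtering logic independent of how predictions are produced.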

Training Data

Dataset: The model was fine-tuned on a custom dataset of queries, each labeled "educational" or "non-educational" according to its content.
Dataset Source: The dataset was manually curated to balance educational questions (covering various academic subjects) against non-educational questions (general queries unrelated to educational content).

Training Procedure

Framework: The model was trained using the Hugging Face Transformers library with PyTorch.
Fine-Tuning Parameters:
- Batch Size: 16
- Learning Rate: 5e-5
- Epochs: 3
- Optimizer: AdamW with weight decay
Hardware: Fine-tuning was performed on a single NVIDIA V100 GPU.
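
For readers who want to reproduce a comparable setup, the reported hyperparameters can be written as keyword arguments using the parameter names of `transformers.TrainingArguments`. This is a sketch, not the author's training script: the output path, the weight decay value (the card only says weight decay was used), and the dataset size in the step calculation are all assumptions. The step count also assumes a single GPU with no gradient accumulation.

```python
import math

# Hyperparameters reported in this card, keyed by the matching
# `transformers.TrainingArguments` parameter names.
training_kwargs = {
    "output_dir": "./distilbert-educational",  # hypothetical path
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,  # ASSUMPTION: the card does not state the value
}


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for a single-GPU run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# ASSUMPTION: 10,000 training examples; the real dataset size is not published.
steps = total_optimizer_steps(
    num_examples=10_000,
    batch_size=training_kwargs["per_device_train_batch_size"],
    epochs=training_kwargs["num_train_epochs"],
)
print(steps)
```

With Transformers installed, these values could be passed as `TrainingArguments(**training_kwargs)`; the `Trainer` class uses AdamW as its default optimizer, which matches the optimizer named above.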

Limitations and Bias

The model has the following limitations and potential biases:

Bias in Data: The model may reflect any biases present in the training data, particularly if certain topics or types of educational content are overrepresented or underrepresented.
Binary Classification: The model assigns every input to exactly one of two labels, "educational" or "non-educational." Nuanced or ambiguous queries are forced into one of the two classes and may be misclassified.
Not Suitable for Other Classifications: This model is specifically designed for educational vs. non-educational classification. It may not perform well on other types of classification tasks without further fine-tuning.
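
One common way to soften the binary-classification limitation is to treat low-confidence predictions as undecided rather than forcing a label. The sketch below assumes class probabilities are already available as a dictionary; the function name `label_or_uncertain`, the "uncertain" fallback, and the 0.8 threshold are illustrative choices, not part of the model, and a suitable threshold would need to be tuned on held-out data.

```python
from typing import Dict


def label_or_uncertain(probs: Dict[str, float], threshold: float = 0.8) -> str:
    """Return the top label, or "uncertain" if its probability is below threshold."""
    best_label = max(probs, key=probs.get)
    return best_label if probs[best_label] >= threshold else "uncertain"


# Hypothetical probabilities, for illustration only.
confident = label_or_uncertain({"educational": 0.97, "non-educational": 0.03})
ambiguous = label_or_uncertain({"educational": 0.55, "non-educational": 0.45})
print(confident)  # educational
print(ambiguous)  # uncertain
```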

How to Use

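Load the model with the Hugging Face Transformers library and classify a query:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haider0941/distilbert-base-educationl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Label names come from the model config (generic LABEL_0/LABEL_1 if unset)
predicted_class_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class_id])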
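
The model returns raw logits, one score per class. To read them as probabilities, a softmax can be applied. The following is a dependency-free sketch of that step; the example logits are made up, and with the real model the values would come from something like `outputs.logits[0].tolist()`.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [-1.2, 2.3]  # hypothetical values, not real model output
probs = softmax(example_logits)
print(probs)
```

Which index corresponds to "educational" depends on the label mapping stored in the model config, so it is worth checking `model.config.id2label` before interpreting the probabilities.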

Citation

If you use this model, please cite it as follows:

@misc{Haider0941_2024,
  title={Fine-Tuned DistilBERT for Educational Query Classification},
  author={Haider},
  year={2024},
  howpublished={\url{https://huggingface.co/haider0941/distilbert-base-educationl}},
}
