Multilabel Text Classification
SetFit supports multilabel classification, allowing multiple labels to be assigned to each instance.
If each instance only ever receives a single label, you usually do not need to specify a multi-target strategy; it is needed when an instance can be assigned multiple labels at once.
This guide will show you how to train and use multilabel SetFit models.
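Multilabel targets are commonly stored as multi-hot vectors: one 0/1 column per label, with a 1 for every label that applies. As a minimal sketch, assuming your raw data holds a list of label names per text, scikit-learn's MultiLabelBinarizer can produce that format. The texts, label names, and column order are hypothetical.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each text can carry zero, one, or several labels.
texts = [
    "The battery died after two days",
    "Great screen, but the battery is weak",
    "Shipping was fast",
]
label_sets = [["battery"], ["screen", "battery"], ["shipping"]]

# Fix the class order explicitly so the column meaning is predictable.
mlb = MultiLabelBinarizer(classes=["battery", "screen", "shipping"])
labels = mlb.fit_transform(label_sets).tolist()

for text, row in zip(texts, labels):
    print(row, text)
```

Each row of labels then lines up with one text; the second text gets two 1s because it mentions two aspects.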
Multilabel strategies
SetFit will initialise a multilabel classification head from sklearn. The following options are available for multi_target_strategy:
- "one-vs-rest": uses a OneVsRestClassifier head.
- "multi-output": uses a MultiOutputClassifier head.
- "classifier-chain": uses a ClassifierChain head.
See the scikit-learn documentation for multiclass and multioutput classification for more details.
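To see what the three scikit-learn wrappers do on their own, the sketch below fits each of them around a LogisticRegression on random vectors that stand in for sentence embeddings. This is plain scikit-learn, not SetFit code; in SetFit the inputs would be the embeddings produced by the fine-tuned body, and the data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

rng = np.random.default_rng(0)
# Random vectors stand in for sentence embeddings (synthetic data).
X = rng.normal(size=(60, 8))
# Three binary targets derived from the features, so every column has both classes.
Y = np.column_stack([
    X[:, 0] > 0,
    X[:, 1] + X[:, 2] > 0,
    X[:, 3] < 0,
]).astype(int)

heads = {
    # One independent binary classifier per label.
    "one-vs-rest": OneVsRestClassifier(LogisticRegression()),
    # Also one classifier per target column, fitted independently.
    "multi-output": MultiOutputClassifier(LogisticRegression()),
    # Each classifier also sees the predictions for earlier labels in the chain.
    "classifier-chain": ClassifierChain(LogisticRegression()),
}

predictions = {}
for name, head in heads.items():
    head.fit(X, Y)
    predictions[name] = head.predict(X[:5])
    print(name, predictions[name].shape)
```

All three return one row per input and one 0/1 column per label; only the classifier chain lets later labels depend on earlier ones.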
Initializing SetFit models with multilabel strategies
Using the default LogisticRegression head, we can apply multi-target strategies like so:
from setfit import SetFitModel

model_id = "BAAI/bge-small-en-v1.5"  # example Sentence Transformers checkpoint
model = SetFitModel.from_pretrained(
    model_id,
    multi_target_strategy="multi-output",  # one LogisticRegression per label
)
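Predictions from a multilabel model are assumed here to come back as a multi-hot array, one row per input and one column per label, as the scikit-learn multilabel heads produce. A minimal sketch for mapping such rows back to label names follows; the label names and the prediction array are hypothetical stand-ins for what model.predict might return.

```python
import numpy as np

# Hypothetical label names, in the same column order used for training.
label_names = ["battery", "screen", "shipping"]

# Stand-in for a multi-hot prediction array (one row per input text).
preds = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
])

# Keep every label whose column is 1; a row of zeros yields an empty list.
decoded = [[name for name, flag in zip(label_names, row) if flag] for row in preds]
print(decoded)
# [['battery'], ['battery', 'screen'], []]
```

Note that an input can legitimately receive no labels at all, which never happens with single-label classification.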
With a differentiable head it looks like so:
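from setfit import SetFitModel

model_id = "BAAI/bge-small-en-v1.5"  # example Sentence Transformers checkpoint
num_classes = 3  # hypothetical: number of labels in your dataset
model = SetFitModel.from_pretrained(
    model_id,
    multi_target_strategy="one-vs-rest",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)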
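Conceptually, a multilabel differentiable head scores every label independently with a sigmoid instead of normalising across labels with a softmax, and is trained with binary cross-entropy. The NumPy sketch below illustrates that idea only; it is not SetFit's PyTorch head, and the embeddings, weights, and 0.5 threshold are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_head(embeddings, weight, bias, threshold=0.5):
    """Score every label independently and keep those above the threshold."""
    logits = embeddings @ weight.T + bias  # shape (n_samples, num_classes)
    probs = sigmoid(logits)
    return (probs >= threshold).astype(int), probs

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all samples and labels."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

# Hypothetical 2-dim embeddings and a linear head with num_classes = 3 outputs.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weight = np.array([[2.0, -1.0], [-1.0, 2.0], [-2.0, -2.0]])
bias = np.zeros(3)

labels, probs = multilabel_head(embeddings, weight, bias)
loss = bce_loss(probs, labels)
print(labels.tolist())
# [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
```

Because each output is thresholded on its own, the third input receives two labels, which a softmax head could not express directly.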