SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model, with a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer (both stages are sketched below).
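
For intuition, here is a minimal, hypothetical sketch of those two stages done by hand with sentence-transformers and scikit-learn. The pairs and labels below are illustrative; in practice the SetFit library constructs the contrastive pairs and runs both stages for you.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sklearn.linear_model import LogisticRegression

body = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Stage 1: contrastive fine-tuning. Same-class pairs are labeled 1.0,
# cross-class pairs 0.0, and the body is tuned with CosineSimilarityLoss.
pairs = [
    InputExample(texts=["Ok, I am done answering this question",
                        "I think I've covered everything I needed to for this question"],
                 label=1.0),
    InputExample(texts=["Ok, I am done answering this question",
                        "I think I will go to Disneyland."],
                 label=0.0),
]
loader = DataLoader(pairs, shuffle=True, batch_size=2)
body.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(body))], epochs=1)

# Stage 2: fit a LogisticRegression head on embeddings from the tuned body.
texts = ["Ok, I am done answering this question", "I think I will go to Disneyland."]
labels = ["end_question", "none"]
head = LogisticRegression().fit(body.encode(texts), labels)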

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-small-en-v1.5
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 4

Model Sources

  • Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
  • Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
  • Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)

Model Labels

Each label is listed with example texts:
none
  • 'I’ll need to think it over to elaborate on this question.'
  • 'I think I will go to Disneyland.'
  • 'I missed part of that; could you please rephrase it for me?'
wrapup_question
  • "That's all for now in regards to this question"
  • "Do you have any other issues you'd like me to address?"
  • 'Do you have any other questions related to this topic?'
end_question
  • "let's do some other more meaningful question"
  • "I think I've covered everything I needed to for this question"
  • 'Ok, I am done answering this question'
next_question
  • 'Can you please provide me a different question?'
  • "I've given that question a lot of thought. What's next?"
  • "I hope I answered your question to your satisfaction. What's the next one?"

Evaluation

Metrics

Label    Accuracy
all      0.9054
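
As a hedged sketch, an accuracy figure like this can be reproduced on a held-out split along the following lines (the two test examples here are placeholders, not the actual evaluation set):

from setfit import SetFitModel
from sklearn.metrics import accuracy_score

model = SetFitModel.from_pretrained("nksk/Intent_bge-small-en-v1.5_v5.0")

# Placeholder held-out examples; the real evaluation split is not published here.
test_texts = [
    "Do you have any other issues you'd like me to address?",
    "Can you please provide me a different question?",
]
test_labels = ["wrapup_question", "next_question"]

preds = model.predict(test_texts)
print(accuracy_score(test_labels, list(preds)))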

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference:

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("nksk/Intent_bge-small-en-v1.5_v5.0")
# Run inference
preds = model("Let me revisit something you mentioned earlier.")
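
The model also accepts a batch of texts, and per-class probabilities from the LogisticRegression head can be inspected with predict_proba (as exposed by SetFit 1.1.0):

# Batch inference: one predicted label per input text
preds = model([
    "Do you have any other questions related to this topic?",
    "Can you please provide me a different question?",
])
# Per-class probabilities from the LogisticRegression head
probs = model.predict_proba(["Ok, I am done answering this question"])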

Training Details

Training Set Metrics

Training set    Min    Median     Max
Word count      1      38.7075    1048

Label              Training Sample Count
end_question       56
next_question      30
none               157
wrapup_question    51

Training Hyperparameters

(For paired values below, the first entry applies to the embedding fine-tuning phase and the second to the classifier training phase.)

  • batch_size: (32, 16)
  • num_epochs: (3, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.0005
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: True
  • use_amp: True
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
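
These bullets mirror the fields of setfit.TrainingArguments, so a hedged sketch of reproducing this setup looks like the following (the two-example train_dataset is a placeholder for the real data, which has 294 labeled examples):

from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder training data; see the label counts above for the real distribution.
train_dataset = Dataset.from_dict({
    "text": ["Ok, I am done answering this question",
             "I think I will go to Disneyland."],
    "label": ["end_question", "none"],
})

args = TrainingArguments(
    batch_size=(32, 16),               # (embedding phase, classifier phase)
    num_epochs=(3, 10),
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.0005,
    sampling_strategy="oversampling",
    end_to_end=True,
    use_amp=True,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
)

model = SetFitModel.from_pretrained("BAAI/bge-small-en-v1.5")
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()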

Training Results

Epoch Step Training Loss Validation Loss ("-" = not recorded)
0.0006 1 0.2718 -
0.0290 50 0.2554 -
0.0580 100 0.2373 -
0.0870 150 0.2127 -
0.1160 200 0.1728 -
0.1450 250 0.1301 -
0.1740 300 0.0944 -
0.2030 350 0.0591 -
0.2320 400 0.0393 -
0.2610 450 0.0217 -
0.2900 500 0.013 -
0.3190 550 0.0111 -
0.3480 600 0.006 -
0.3770 650 0.0047 -
0.4060 700 0.0035 -
0.4350 750 0.004 -
0.4640 800 0.0022 -
0.4930 850 0.0019 -
0.5220 900 0.0017 -
0.5510 950 0.0014 -
0.5800 1000 0.0013 -
0.6090 1050 0.0013 -
0.6381 1100 0.0012 -
0.6671 1150 0.0011 -
0.6961 1200 0.001 -
0.7251 1250 0.0009 -
0.7541 1300 0.0009 -
0.7831 1350 0.0009 -
0.8121 1400 0.0008 -
0.8411 1450 0.0008 -
0.8701 1500 0.0008 -
0.8991 1550 0.0007 -
0.9281 1600 0.0008 -
0.9571 1650 0.0007 -
0.9861 1700 0.0007 -
1.0151 1750 0.0007 -
1.0441 1800 0.0006 -
1.0731 1850 0.0006 -
1.1021 1900 0.0006 -
1.1311 1950 0.0006 -
1.1601 2000 0.0006 -
1.1891 2050 0.0006 -
1.2181 2100 0.0006 -
1.2471 2150 0.0006 -
1.2761 2200 0.0005 -
1.3051 2250 0.0005 -
1.3341 2300 0.0005 -
1.3631 2350 0.0005 -
1.3921 2400 0.0005 -
1.4211 2450 0.0005 -
1.4501 2500 0.0005 -
1.4791 2550 0.0005 -
1.5081 2600 0.0005 -
1.5371 2650 0.0004 -
1.5661 2700 0.0005 -
1.5951 2750 0.0005 -
1.6241 2800 0.0004 -
1.6531 2850 0.0004 -
1.6821 2900 0.0004 -
1.7111 2950 0.0004 -
1.7401 3000 0.0004 -
1.7691 3050 0.0004 -
1.7981 3100 0.0004 -
1.8271 3150 0.0004 -
1.8561 3200 0.0004 -
1.8852 3250 0.0004 -
1.9142 3300 0.0004 -
1.9432 3350 0.0004 -
1.9722 3400 0.0004 -
2.0012 3450 0.0004 -
2.0302 3500 0.0003 -
2.0592 3550 0.0004 -
2.0882 3600 0.0004 -
2.1172 3650 0.0004 -
2.1462 3700 0.0004 -
2.1752 3750 0.0004 -
2.2042 3800 0.0004 -
2.2332 3850 0.0003 -
2.2622 3900 0.0003 -
2.2912 3950 0.0003 -
2.3202 4000 0.0003 -
2.3492 4050 0.0003 -
2.3782 4100 0.0003 -
2.4072 4150 0.0003 -
2.4362 4200 0.0003 -
2.4652 4250 0.0003 -
2.4942 4300 0.0003 -
2.5232 4350 0.0003 -
2.5522 4400 0.0003 -
2.5812 4450 0.0003 -
2.6102 4500 0.0003 -
2.6392 4550 0.0003 -
2.6682 4600 0.0003 -
2.6972 4650 0.0003 -
2.7262 4700 0.0003 -
2.7552 4750 0.0003 -
2.7842 4800 0.0003 -
2.8132 4850 0.0003 -
2.8422 4900 0.0003 -
2.8712 4950 0.0003 -
2.9002 5000 0.0003 -
2.9292 5050 0.0003 -
2.9582 5100 0.0003 -
2.9872 5150 0.0003 -

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.1.0
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1

Citation

BibTeX

@article{tunstall2022efficient,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}