---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- loss:MatryoshkaLoss
- loss:CoSENTLoss
base_model: distilbert/distilbert-base-uncased
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: The gate is yellow.
  sentences:
  - The gate is blue.
  - The person is starting a fire.
  - A woman is bungee jumping.
- source_sentence: A plane in the sky.
  sentences:
  - Two airplanes in the sky.
  - A man is standing in the rain.
  - There are two men near a wall.
- source_sentence: A woman is reading.
  sentences:
  - A woman is writing something.
  - A woman is applying eye shadow.
  - A dog and a red ball in the air.
- source_sentence: A baby is laughing.
  sentences:
  - The baby laughed in his car seat.
  - Suicide bomber strikes in Syria
  - Bangladesh Islamist execution upheld
- source_sentence: A woman is dancing.
  sentences:
  - A woman is dancing in railway station.
  - The flag was moving in the air.
  - three dogs growling On one another
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 7.871164130493101
  energy_consumed: 0.020249867843471606
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.112
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: SentenceTransformer based on distilbert/distilbert-base-uncased
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 768
      type: sts-dev-768
    metrics:
    - type: pearson_cosine
      value: 0.8647737221000229
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8747521728687471
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8627734228763478
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8657556253211545
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.862712112144467
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8657615257280495
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7442745641899206
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7513830366520415
      name: Spearman Dot
    - type: pearson_max
      value: 0.8647737221000229
      name: Pearson Max
    - type: spearman_max
      value: 0.8747521728687471
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 512
      type: sts-dev-512
    metrics:
    - type: pearson_cosine
      value: 0.8628378541768764
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8741345340758229
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8619744745534216
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8651450292937584
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8622841683977804
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8653280682431165
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.746359236761633
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7540849763868891
      name: Spearman Dot
    - type: pearson_max
      value: 0.8628378541768764
      name: Pearson Max
    - type: spearman_max
      value: 0.8741345340758229
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 256
      type: sts-dev-256
    metrics:
    - type: pearson_cosine
      value: 0.8588975886507025
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8714341050301952
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8590790006287132
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8634123185807864
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8591861535833625
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8628587088112977
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7185871795192371
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7288595287151053
      name: Spearman Dot
    - type: pearson_max
      value: 0.8591861535833625
      name: Pearson Max
    - type: spearman_max
      value: 0.8714341050301952
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 128
      type: sts-dev-128
    metrics:
    - type: pearson_cosine
      value: 0.8528583626543365
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8687502864484896
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8509433708242649
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.857615159782176
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8531616082767298
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8580823134153918
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.697019210549756
      name: Pearson Dot
    - type: spearman_dot
      value: 0.705924438927243
      name: Spearman Dot
    - type: pearson_max
      value: 0.8531616082767298
      name: Pearson Max
    - type: spearman_max
      value: 0.8687502864484896
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 64
      type: sts-dev-64
    metrics:
    - type: pearson_cosine
      value: 0.8340115410608493
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.858682843519445
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8351566362279711
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8445869885309296
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.838674217877368
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8460894143343873
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6579249229659768
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6712615573330701
      name: Spearman Dot
    - type: pearson_max
      value: 0.838674217877368
      name: Pearson Max
    - type: spearman_max
      value: 0.858682843519445
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 768
      type: sts-test-768
    metrics:
    - type: pearson_cosine
      value: 0.833720870548252
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8469501140979906
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8484755252691695
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8470024066861298
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8492651445573072
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8475238481800537
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6701649984837568
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6526285131648061
      name: Spearman Dot
    - type: pearson_max
      value: 0.8492651445573072
      name: Pearson Max
    - type: spearman_max
      value: 0.8475238481800537
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 512
      type: sts-test-512
    metrics:
    - type: pearson_cosine
      value: 0.8325595554355977
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8467500241650668
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8474378528408064
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8462571021525837
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.848182316243596
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8466275072216626
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6736686039338646
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6572299516736647
      name: Spearman Dot
    - type: pearson_max
      value: 0.848182316243596
      name: Pearson Max
    - type: spearman_max
      value: 0.8467500241650668
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 256
      type: sts-test-256
    metrics:
    - type: pearson_cosine
      value: 0.8225923032714455
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8403145699624681
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8420998942805191
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8419520394692916
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8434867831513
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8428522494561291
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6230179114374444
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6061595939729718
      name: Spearman Dot
    - type: pearson_max
      value: 0.8434867831513
      name: Pearson Max
    - type: spearman_max
      value: 0.8428522494561291
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 128
      type: sts-test-128
    metrics:
    - type: pearson_cosine
      value: 0.8149976807930366
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8349547446101432
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8351661617446753
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8360899024374612
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8375785243041524
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8375574347771609
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.5958381414366161
      name: Pearson Dot
    - type: spearman_dot
      value: 0.5793444545861678
      name: Spearman Dot
    - type: pearson_max
      value: 0.8375785243041524
      name: Pearson Max
    - type: spearman_max
      value: 0.8375574347771609
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 64
      type: sts-test-64
    metrics:
    - type: pearson_cosine
      value: 0.7981336004264228
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8269913105115189
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8238799955007295
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8289121477853545
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8278657744625194
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8314643517951371
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.5206433480609991
      name: Pearson Dot
    - type: spearman_dot
      value: 0.5067194535547845
      name: Spearman Dot
    - type: pearson_max
      value: 0.8278657744625194
      name: Pearson Max
    - type: spearman_max
      value: 0.8314643517951371
      name: Spearman Max
---
# SentenceTransformer based on distilbert/distilbert-base-uncased
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
- **Language:** en
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/distilbert-base-uncased-sts-matryoshka")
# Run inference
sentences = [
    'A woman is dancing.',
    'A woman is dancing in railway station.',
    'The flag was moving in the air.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
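Because the model was trained with `MatryoshkaLoss` at dimensions 768, 512, 256, 128, and 64, an embedding can be shortened by keeping only its leading components. The following is a minimal sketch of that idea in plain Python. The vectors are random stand-ins rather than real model output, and the helper names `truncate` and `cosine` are illustrative, not library functions. In practice, recent versions of Sentence Transformers accept a `truncate_dim` argument when constructing `SentenceTransformer`, which should be preferred over manual slicing.

```python
import math
import random


def truncate(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in vectors; real ones would come from model.encode(...).
rng = random.Random(0)
emb_a = [rng.gauss(0.0, 1.0) for _ in range(768)]
emb_b = [a + 0.5 * rng.gauss(0.0, 1.0) for a in emb_a]  # noisy copy of emb_a

# Compare the pair at each dimension the model was trained on.
for dim in (768, 512, 256, 128, 64):
    score = cosine(truncate(emb_a, dim), truncate(emb_b, dim))
    print(dim, round(score, 4))
```

Re-normalizing after truncation is not needed for cosine similarity itself, but it keeps dot-product and Euclidean comparisons meaningful on the shortened vectors.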
## Evaluation
### Metrics
#### Semantic Similarity
* Dataset: `sts-dev-768`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8648 |
| **spearman_cosine** | **0.8748** |
| pearson_manhattan | 0.8628 |
| spearman_manhattan | 0.8658 |
| pearson_euclidean | 0.8627 |
| spearman_euclidean | 0.8658 |
| pearson_dot | 0.7443 |
| spearman_dot | 0.7514 |
| pearson_max | 0.8648 |
| spearman_max | 0.8748 |
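The metric names in the table above pair a correlation measure with a similarity function: `pearson_*` is the linear correlation between the model's similarity scores and the gold labels, while `spearman_*` correlates their ranks. The `*_max` rows appear to report the best value across the four similarity functions, which is consistent with the numbers shown. The sketch below computes both correlations in plain Python; the `gold` and `predicted` lists are invented toy values, and the helper names are illustrative rather than the evaluator's internals.

```python
import math


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data (invented): gold similarity labels and model cosine scores.
gold = [1.0, 0.76, 0.76, 0.2, 0.0]
predicted = [0.95, 0.80, 0.70, 0.40, 0.10]

print("pearson :", round(pearson(gold, predicted), 4))
print("spearman:", round(spearman(gold, predicted), 4))
```

Spearman only cares about ordering, so it is often the headline metric for similarity models, which is presumably why `spearman_cosine` is the row highlighted in bold.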
#### Semantic Similarity
* Dataset: `sts-dev-512`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8628 |
| **spearman_cosine** | **0.8741** |
| pearson_manhattan | 0.862 |
| spearman_manhattan | 0.8651 |
| pearson_euclidean | 0.8623 |
| spearman_euclidean | 0.8653 |
| pearson_dot | 0.7464 |
| spearman_dot | 0.7541 |
| pearson_max | 0.8628 |
| spearman_max | 0.8741 |
#### Semantic Similarity
* Dataset: `sts-dev-256`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8589 |
| **spearman_cosine** | **0.8714** |
| pearson_manhattan | 0.8591 |
| spearman_manhattan | 0.8634 |
| pearson_euclidean | 0.8592 |
| spearman_euclidean | 0.8629 |
| pearson_dot | 0.7186 |
| spearman_dot | 0.7289 |
| pearson_max | 0.8592 |
| spearman_max | 0.8714 |
#### Semantic Similarity
* Dataset: `sts-dev-128`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8529 |
| **spearman_cosine** | **0.8688** |
| pearson_manhattan | 0.8509 |
| spearman_manhattan | 0.8576 |
| pearson_euclidean | 0.8532 |
| spearman_euclidean | 0.8581 |
| pearson_dot | 0.697 |
| spearman_dot | 0.7059 |
| pearson_max | 0.8532 |
| spearman_max | 0.8688 |
#### Semantic Similarity
* Dataset: `sts-dev-64`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.834 |
| **spearman_cosine** | **0.8587** |
| pearson_manhattan | 0.8352 |
| spearman_manhattan | 0.8446 |
| pearson_euclidean | 0.8387 |
| spearman_euclidean | 0.8461 |
| pearson_dot | 0.6579 |
| spearman_dot | 0.6713 |
| pearson_max | 0.8387 |
| spearman_max | 0.8587 |
#### Semantic Similarity
* Dataset: `sts-test-768`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:----------|
| pearson_cosine | 0.8337 |
| **spearman_cosine** | **0.847** |
| pearson_manhattan | 0.8485 |
| spearman_manhattan | 0.847 |
| pearson_euclidean | 0.8493 |
| spearman_euclidean | 0.8475 |
| pearson_dot | 0.6702 |
| spearman_dot | 0.6526 |
| pearson_max | 0.8493 |
| spearman_max | 0.8475 |
#### Semantic Similarity
* Dataset: `sts-test-512`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8326 |
| **spearman_cosine** | **0.8468** |
| pearson_manhattan | 0.8474 |
| spearman_manhattan | 0.8463 |
| pearson_euclidean | 0.8482 |
| spearman_euclidean | 0.8466 |
| pearson_dot | 0.6737 |
| spearman_dot | 0.6572 |
| pearson_max | 0.8482 |
| spearman_max | 0.8468 |
#### Semantic Similarity
* Dataset: `sts-test-256`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8226 |
| **spearman_cosine** | **0.8403** |
| pearson_manhattan | 0.8421 |
| spearman_manhattan | 0.842 |
| pearson_euclidean | 0.8435 |
| spearman_euclidean | 0.8429 |
| pearson_dot | 0.623 |
| spearman_dot | 0.6062 |
| pearson_max | 0.8435 |
| spearman_max | 0.8429 |
#### Semantic Similarity
* Dataset: `sts-test-128`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:----------|
| pearson_cosine | 0.815 |
| **spearman_cosine** | **0.835** |
| pearson_manhattan | 0.8352 |
| spearman_manhattan | 0.8361 |
| pearson_euclidean | 0.8376 |
| spearman_euclidean | 0.8376 |
| pearson_dot | 0.5958 |
| spearman_dot | 0.5793 |
| pearson_max | 0.8376 |
| spearman_max | 0.8376 |
#### Semantic Similarity
* Dataset: `sts-test-64`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:----------|
| pearson_cosine | 0.7981 |
| **spearman_cosine** | **0.827** |
| pearson_manhattan | 0.8239 |
| spearman_manhattan | 0.8289 |
| pearson_euclidean | 0.8279 |
| spearman_euclidean | 0.8315 |
| pearson_dot | 0.5206 |
| spearman_dot | 0.5067 |
| pearson_max | 0.8279 |
| spearman_max | 0.8315 |
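To make the size and quality trade-off across the tables above easier to read, the short calculation below expresses each test-set `spearman_cosine` score as a fraction of the 768-dimensional score. The numbers are the rounded values from the `sts-test-*` tables; the variable names are illustrative.

```python
# spearman_cosine on the STS test set, copied (rounded) from the tables above.
test_spearman_cosine = {
    768: 0.8470,
    512: 0.8468,
    256: 0.8403,
    128: 0.8350,
    64: 0.8270,
}

full_dim = 768
full_score = test_spearman_cosine[full_dim]

# Fraction of the full-size score retained at each truncated dimension.
retention = {}
for dim, score in test_spearman_cosine.items():
    retention[dim] = score / full_score
    print(f"{dim:>3} dims: {full_dim / dim:.1f}x smaller, retains {retention[dim]:.1%}")
```

On these figures, the 64-dimensional embeddings are 12 times smaller than the full ones while keeping roughly 97.6% of the test `spearman_cosine` score.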
## Training Details
### Training Dataset
#### sentence-transformers/stsb
* Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
* Size: 5,749 training samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | score |
|:--------|:----------|:----------|:------|
| type | string | string | float |
* Samples:
| sentence1 | sentence2 | score |
|:----------|:----------|:------|
| A plane is taking off. | An air plane is taking off. | 1.0 |
| A man is playing a large flute. | A man is playing a flute. | 0.76 |
| A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76 |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "CoSENTLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
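The parameters above describe a wrapper loss: the base loss (`CoSENTLoss`) is evaluated on the embeddings truncated to each entry of `matryoshka_dims`, and the results are combined using `matryoshka_weights`. The following plain-Python sketch illustrates that structure on toy data. It is a simplified reading of the two losses, not the library's implementation: the 8-dimensional vectors and the dimensions `[8, 4, 2]` are invented for brevity, `scale=20.0` is assumed to match the library default, and a real implementation would use a numerically stable log-sum-exp over tensors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosent_loss(pairs, scores, scale=20.0):
    """CoSENT-style loss: log(1 + sum of exp(scale * (sim_low - sim_high)))
    over all pair combinations where the gold score of `high` exceeds `low`."""
    sims = [cosine(a, b) for a, b in pairs]
    total = 1.0  # the "1 +" term inside the logarithm
    for i, gold_i in enumerate(scores):
        for j, gold_j in enumerate(scores):
            if gold_i > gold_j:
                total += math.exp(scale * (sims[j] - sims[i]))
    return math.log(total)


def matryoshka_loss(pairs, scores, dims, weights):
    """Weighted sum of the base loss over embeddings truncated to each dim."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated = [(a[:dim], b[:dim]) for a, b in pairs]
        loss += weight * cosent_loss(truncated, scores)
    return loss


# Toy batch (invented): one similar pair and one dissimilar pair.
anchor = [1.0, 0.5, 0.2, 0.1, 0.3, 0.0, 0.4, 0.2]
pairs = [
    (anchor, [0.9, 0.6, 0.1, 0.2, 0.3, 0.1, 0.4, 0.1]),
    (anchor, [-0.2, 0.9, -0.5, 0.4, 0.0, 0.7, -0.3, 0.1]),
]
scores = [1.0, 0.2]

print(matryoshka_loss(pairs, scores, dims=[8, 4, 2], weights=[1, 1, 1]))
```

With equal weights, every truncation level contributes equally to the gradient, which pushes the most useful information toward the leading dimensions.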
### Evaluation Dataset
#### sentence-transformers/stsb
* Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
* Size: 1,500 evaluation samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | score |
|:--------|:----------|:----------|:------|
| type | string | string | float |
* Samples:
| sentence1 | sentence2 | score |
|:----------|:----------|:------|
| A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0 |
| A young child is riding a horse. | A child is riding a horse. | 0.95 |
| A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0 |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "CoSENTLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 4
- `warmup_ratio`: 0.1
- `fp16`: True
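
As a rough illustration of what these settings imply for the training schedule, the calculation below derives the number of optimizer steps and warmup steps from the dataset size stated earlier (5,749 training samples). It assumes a single device and no gradient accumulation, and that warmup steps are the total step count times `warmup_ratio`, rounded up; these assumptions are not stated in this card.

```python
import math

train_samples = 5749   # training set size stated in this card
batch_size = 16        # per_device_train_batch_size
epochs = 4             # num_train_epochs
warmup_ratio = 0.1     # warmup_ratio

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
# 360 1440 144
```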
#### All Hyperparameters