SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
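
The three modules correspond to a BERT encoder, mean pooling over token embeddings, and L2 normalization. A minimal sketch of the same pipeline written directly against the Transformers library (illustrative only; the Sentence Transformers usage below is the supported path):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("srikarvar/multilingual-e5-small-pairclass-4")
model = AutoModel.from_pretrained("srikarvar/multilingual-e5-small-pairclass-4")
model.eval()

batch = tokenizer(
    ["What is the capital of Australia?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (0): Transformer

# (1): mean pooling over non-padding tokens
mask = batch["attention_mask"].unsqueeze(-1).float()
mean_pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
# (2): Normalize() -> unit-length vectors, so dot product == cosine similarity
embeddings = F.normalize(mean_pooled, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 384])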

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/multilingual-e5-small-pairclass-4")
# Run inference
sentences = [
    'What is the melting point of ice at sea level?',
    'What is the boiling point of water at sea level?',
    'Can you recommend a good restaurant nearby?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
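
Since the model was tuned for pair classification, the cosine score can be thresholded to decide whether two sentences are paraphrases. A small sketch, using the dev-set cosine_accuracy_threshold of 0.7887 reported under Evaluation (the threshold is dataset-specific, not a universal constant):

# Classify a pair by thresholding its cosine similarity
emb1 = model.encode("What is the melting point of ice at sea level?")
emb2 = model.encode("What is the boiling point of water at sea level?")
score = model.similarity(emb1, emb2).item()
# Pairs scoring above the tuned threshold are predicted as paraphrases
print(score, score > 0.7887)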

Evaluation

Metrics

Binary Classification (dataset: pair-class-dev)

Metric Value
cosine_accuracy 0.93
cosine_accuracy_threshold 0.7887
cosine_f1 0.9238
cosine_f1_threshold 0.782
cosine_precision 0.8957
cosine_recall 0.9537
cosine_ap 0.9603
dot_accuracy 0.93
dot_accuracy_threshold 0.7887
dot_f1 0.9238
dot_f1_threshold 0.782
dot_precision 0.8957
dot_recall 0.9537
dot_ap 0.9603
manhattan_accuracy 0.9218
manhattan_accuracy_threshold 9.9367
manhattan_f1 0.9148
manhattan_f1_threshold 10.3162
manhattan_precision 0.887
manhattan_recall 0.9444
manhattan_ap 0.9579
euclidean_accuracy 0.93
euclidean_accuracy_threshold 0.6501
euclidean_f1 0.9238
euclidean_f1_threshold 0.6603
euclidean_precision 0.8957
euclidean_recall 0.9537
euclidean_ap 0.9603
max_accuracy 0.93
max_accuracy_threshold 9.9367
max_f1 0.9238
max_f1_threshold 10.3162
max_precision 0.8957
max_recall 0.9537
max_ap 0.9603

Binary Classification (dataset: pair-class-test)

Metric Value
cosine_accuracy 0.93
cosine_accuracy_threshold 0.7887
cosine_f1 0.9238
cosine_f1_threshold 0.782
cosine_precision 0.8957
cosine_recall 0.9537
cosine_ap 0.9603
dot_accuracy 0.93
dot_accuracy_threshold 0.7887
dot_f1 0.9238
dot_f1_threshold 0.782
dot_precision 0.8957
dot_recall 0.9537
dot_ap 0.9603
manhattan_accuracy 0.9218
manhattan_accuracy_threshold 9.9367
manhattan_f1 0.9148
manhattan_f1_threshold 10.3162
manhattan_precision 0.887
manhattan_recall 0.9444
manhattan_ap 0.9579
euclidean_accuracy 0.93
euclidean_accuracy_threshold 0.6501
euclidean_f1 0.9238
euclidean_f1_threshold 0.6603
euclidean_precision 0.8957
euclidean_recall 0.9537
euclidean_ap 0.9603
max_accuracy 0.93
max_accuracy_threshold 9.9367
max_f1 0.9238
max_f1_threshold 10.3162
max_precision 0.8957
max_recall 0.9537
max_ap 0.9603
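
These tables are produced by the binary-classification evaluator in Sentence Transformers. A hedged sketch of how such metrics can be recomputed, using placeholder pairs taken from the sample rows below (the actual dev/test splits are not published with this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("srikarvar/multilingual-e5-small-pairclass-4")
# Placeholder pairs; substitute the real evaluation split here
evaluator = BinaryClassificationEvaluator(
    sentences1=["Gravity discoverer", "What is the capital of Australia?"],
    sentences2=["Who discovered gravity?", "What is the capital of New Zealand?"],
    labels=[1, 0],  # 1 = paraphrase/duplicate, 0 = different
    name="pair-class-dev",
)
print(evaluator(model))  # accuracy/F1/precision/recall/AP per similarity function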

Training Details

Training Dataset

Unnamed Dataset

  • Size: 971 training samples
  • Columns: sentence2, sentence1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence2: string; min: 4 tokens, mean: 10.12 tokens, max: 22 tokens
    • sentence1: string; min: 6 tokens, mean: 10.82 tokens, max: 22 tokens
    • label: int; 0: ~48.61%, 1: ~51.39%
  • Samples (sentence2 | sentence1 | label):
    • Total number of bones in an adult human body | How many bones are in the human body? | 1
    • What is the largest river in North America? | What is the largest lake in North America? | 0
    • What is the capital of Australia? | What is the capital of New Zealand? | 0
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

Unnamed Dataset

  • Size: 243 evaluation samples
  • Columns: sentence2, sentence1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence2: string; min: 4 tokens, mean: 10.09 tokens, max: 20 tokens
    • sentence1: string; min: 6 tokens, mean: 10.55 tokens, max: 22 tokens
    • label: int; 0: ~55.56%, 1: ~44.44%
  • Samples (sentence2 | sentence1 | label):
    • What are the various forms of renewable energy? | What are the different types of renewable energy? | 1
    • Gravity discoverer | Who discovered gravity? | 1
    • Can you help me write this report? | Can you help me understand this report? | 0
  • Loss: OnlineContrastiveLoss (see the training sketch below)
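
A minimal sketch of how the datasets and loss fit together; the pairs are illustrative toy rows copied from the samples above, not the actual 971/243-pair splits:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Toy train/eval splits; column names and order mirror the card
# (sentence2, sentence1, label)
train_dataset = Dataset.from_dict({
    "sentence2": ["Total number of bones in an adult human body",
                  "What is the largest river in North America?"],
    "sentence1": ["How many bones are in the human body?",
                  "What is the largest lake in North America?"],
    "label": [1, 0],
})
eval_dataset = Dataset.from_dict({
    "sentence2": ["What are the various forms of renewable energy?",
                  "Can you help me write this report?"],
    "sentence1": ["What are the different types of renewable energy?",
                  "Can you help me understand this report?"],
    "label": [1, 0],
})

# OnlineContrastiveLoss keeps only the hard positives (far apart) and hard
# negatives (close together) in each batch before applying the contrastive objective
loss = losses.OnlineContrastiveLoss(model)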

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • learning_rate: 3e-06
  • weight_decay: 0.01
  • num_train_epochs: 15
  • lr_scheduler_type: reduce_lr_on_plateau
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
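
A hedged reconstruction of this configuration with the Sentence Transformers v3 trainer API; output_dir is a placeholder, save_strategy is an assumption (it must match eval_strategy when load_best_model_at_end is enabled), and model, train_dataset, eval_dataset, and loss are reused from the dataset sketch above:

from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    learning_rate=3e-6,
    weight_decay=0.01,
    num_train_epochs=15,
    lr_scheduler_type="reduce_lr_on_plateau",
    warmup_ratio=0.1,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()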

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 3e-06
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: reduce_lr_on_plateau
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | Validation Loss | pair-class-dev_max_ap | pair-class-test_max_ap
0 | 0 | - | - | 0.6426 | -
0.6452 | 10 | 4.7075 | - | - | -
0.9677 | 15 | - | 3.1481 | 0.7843 | -
1.2903 | 20 | 3.431 | - | - | -
1.9355 | 30 | 3.4054 | - | - | -
2.0 | 31 | - | 2.1820 | 0.8692 | -
2.5806 | 40 | 2.2735 | - | - | -
2.9677 | 46 | - | 1.8185 | 0.9078 | -
3.2258 | 50 | 2.3159 | - | - | -
3.8710 | 60 | 2.1466 | - | - | -
4.0 | 62 | - | 1.5769 | 0.9252 | -
4.5161 | 70 | 1.6873 | - | - | -
4.9677 | 77 | - | 1.4342 | 0.9310 | -
5.1613 | 80 | 1.5927 | - | - | -
5.8065 | 90 | 1.4184 | - | - | -
6.0 | 93 | - | 1.3544 | 0.9357 | -
6.4516 | 100 | 1.333 | - | - | -
6.9677 | 108 | - | 1.2630 | 0.9402 | -
7.0968 | 110 | 1.089 | - | - | -
7.7419 | 120 | 1.0947 | - | - | -
8.0 | 124 | - | 1.2120 | 0.9444 | -
8.3871 | 130 | 0.8118 | - | - | -
8.9677 | 139 | - | 1.1641 | 0.9454 | -
9.0323 | 140 | 1.0237 | - | - | -
9.6774 | 150 | 0.8406 | - | - | -
10.0 | 155 | - | 1.0481 | 0.9464 | -
10.3226 | 160 | 0.7081 | - | - | -
10.9677 | 170 | 0.7397 | 0.9324 | 0.9509 | -
11.6129 | 180 | 0.5604 | - | - | -
12.0 | 186 | - | 0.8386 | 0.9556 | -
12.2581 | 190 | 0.5841 | - | - | -
12.9032 | 200 | 0.5463 | - | - | -
12.9677 | 201 | - | 0.7930 | 0.9577 | -
13.5484 | 210 | 0.4599 | - | - | -
14.0 | 217 | - | 0.7564 | 0.9599 | -
14.1935 | 220 | 0.2437 | - | - | -
14.5161 | 225 | - | 0.7522 | 0.9603 | 0.9603
  • The final row (epoch 14.5161, step 225) denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
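
To approximate this environment, the listed package versions can be pinned at install time:

pip install sentence-transformers==3.0.1 transformers==4.41.2 datasets==2.19.1 accelerate==0.32.1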

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}