
gte_hun

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the train dataset, a set of Hungarian anchor/positive/negative triplets. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train
  • Language: hu
  • License: apache-2.0


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
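The last two modules can be mimicked on toy data to see what they do. The Transformer emits one vector per token. Pooling with pooling_mode_cls_token=True keeps only the first token's vector (`<s>` in XLM-RoBERTa). Normalize() rescales that vector to unit L2 length. The following is a minimal NumPy sketch, not the library code. Random arrays stand in for real XLMRobertaModel hidden states (an assumption: no model is loaded). The function name `cls_pool_and_normalize` is hypothetical.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """(batch, seq_len, dim) token vectors -> (batch, dim) unit-length sentence vectors."""
    # Pooling with pooling_mode_cls_token=True: keep the first token of each sequence
    cls = token_embeddings[:, 0, :]
    # Normalize(): divide each vector by its L2 norm (guarding against division by zero)
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
# Hypothetical stand-in for transformer output: 3 sentences, 12 tokens, 1024-dim hidden states
fake_hidden = rng.normal(size=(3, 12, 1024))
emb = cls_pool_and_normalize(fake_hidden)
print(emb.shape)  # (3, 1024)

# On unit vectors, the dot product and cosine similarity are the same number
dot = emb @ emb.T
lengths = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(lengths, lengths)
print(np.allclose(dot, cos))  # True
```

This equivalence is why the card's cosine similarity function and a plain dot product rank candidates identically for this model.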

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

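from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("karsar/bge-m3-hu")
# Run inference (English glosses of the Hungarian inputs are given in the comments)
sentences = [
    'Az emberek alszanak.',  # "The people are sleeping."
    'Egy apa és a fia ölelgeti alvás közben.',  # "A father and his son hug while sleeping."
    # "A group of people sit in an open, plaza-like area, with large bushes behind them and a row of
    # Victorian-style buildings, many of which are made indiscernible by heavy blur on the right of the picture."
    'Egy csoport ember ül egy nyitott, térszerű területen, mögötte nagy bokrok és egy sor viktoriánus stílusú épület, melyek közül sokat a kép jobb oldalán lévő erős elmosódás tesz kivehetetlenné.',
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings (cosine similarity, per the model config)
similarities = model.similarity(embeddings, embeddings)  # returns a torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])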
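The model ends with Normalize(), so the similarity matrix above contains cosine similarities. Ranking a corpus for a query then reduces to sorting one row of such a matrix. The following is a minimal semantic-search sketch in NumPy. The random vectors are placeholders (an assumption) for what `model.encode` would return, so the snippet runs without downloading the model. The helper name `top_k` is hypothetical.

```python
import numpy as np

def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus_index, cosine_score) pairs for the k corpus entries most similar to the query."""
    # Normalize defensively; this model's encode() output should already be unit-length
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q                      # one cosine score per corpus entry
    order = np.argsort(-scores)[:k]     # highest scores first
    return [(int(i), float(scores[i])) for i in order]

rng = np.random.default_rng(42)
corpus = rng.normal(size=(5, 1024))               # placeholder for model.encode(corpus_sentences)
query = corpus[3] + 0.1 * rng.normal(size=1024)   # placeholder query lying close to corpus entry 3
print(top_k(query, corpus))                       # entry 3 should come first
```

For real workloads, sentence-transformers also provides `sentence_transformers.util.semantic_search`, which performs this kind of ranking in batches.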

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.979
dot_accuracy 0.021
manhattan_accuracy 0.9804
euclidean_accuracy 0.979
max_accuracy 0.9804

Triplet

Metric Value
cosine_accuracy 0.979
dot_accuracy 0.021
manhattan_accuracy 0.9804
euclidean_accuracy 0.979
max_accuracy 0.9804
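These are triplet accuracies: the share of (anchor, positive, negative) triplets in which the anchor is closer to its positive than to its negative under a given measure. max_accuracy is the best of the four measures.

The embeddings are unit-normalized, so dot product and cosine similarity coincide. A correctly computed dot_accuracy should therefore equal cosine_accuracy. The reported 0.021 is exactly 1 - 0.979, which suggests (unverified) that the dot-product comparison was inverted when these numbers were generated.

The following toy NumPy sketch assumes the usual "similarity higher / distance lower" definition. It is not the TripletEvaluator source, and ties count as failures here.

```python
import numpy as np

def triplet_accuracy(anchors, positives, negatives, metric="cosine"):
    """Share of triplets whose anchor is closer to its positive than to its negative."""
    if metric == "cosine":
        def cos(x, y):
            return np.sum(x * y, axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
        correct = cos(anchors, positives) > cos(anchors, negatives)
    elif metric == "dot":
        # Higher dot product means more similar
        correct = np.sum(anchors * positives, axis=1) > np.sum(anchors * negatives, axis=1)
    elif metric == "euclidean":
        # Smaller distance means more similar
        correct = np.linalg.norm(anchors - positives, axis=1) < np.linalg.norm(anchors - negatives, axis=1)
    elif metric == "manhattan":
        correct = np.abs(anchors - positives).sum(axis=1) < np.abs(anchors - negatives).sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return float(np.mean(correct))

# Four toy 2-D triplets; the last one is deliberately "wrong" (its negative is nearer to the anchor)
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
p = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 1.1], [-1.0, 1.0]])
n = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0], [1.0, -0.9]])
for m in ("cosine", "dot", "euclidean", "manhattan"):
    print(m, triplet_accuracy(a, p, n, metric=m))  # 0.75 for every metric on this toy data
```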

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 200,000 training samples
  • Columns: anchor, positive, and negative
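  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 7 tokens, mean 11.73 tokens, max 56 tokens
    • positive: min 6 tokens, mean 15.24 tokens, max 47 tokens
    • negative: min 7 tokens, mean 16.07 tokens, max 53 tokens
  • Samples (Hungarian, with English glosses):
    1. anchor: Egy lóháton ülő ember átugrik egy lerombolt repülőgép felett. ("A person on a horse jumps over a wrecked airplane.")
       positive: Egy ember a szabadban, lóháton. ("A person is outdoors, on a horse.")
       negative: Egy ember egy étteremben van, és omlettet rendel. ("A person is at a restaurant, ordering an omelette.")
    2. anchor: Gyerekek mosolyogva és integetett a kamera ("Children smiling and waving at the camera")
       positive: Gyermekek vannak jelen ("There are children present")
       negative: A gyerekek homlokot rántanak ("The children are frowning")
    3. anchor: Egy fiú ugrál a gördeszkát a közepén egy piros híd. ("A boy is jumping on a skateboard in the middle of a red bridge.")
       positive: A fiú gördeszkás trükköt csinál. ("The boy does a skateboarding trick.")
       negative: A fiú korcsolyázik a járdán. ("The boy is skating on the sidewalk.")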
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
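MultipleNegativesRankingLoss scores each anchor against every positive and every hard negative in the batch. Anchor i must then pick out its own positive i. The following minimal NumPy sketch computes the forward pass with the parameters above (scale 20.0, cosine similarity). It assumes the candidates are ordered as all positives followed by all negatives, which matches how the sentence-transformers implementation concatenates them as far as we know. It is an illustration, not the library code. The function name `mnrl_loss` is hypothetical.

```python
import numpy as np

def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """In-batch softmax loss: anchor i must select positive i among all positives and hard negatives."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = normalize(anchors)                                           # (B, dim)
    candidates = normalize(np.concatenate([positives, negatives]))   # (2B, dim)
    scores = scale * (a @ candidates.T)                              # (B, 2B) scaled cosine similarities
    # Row-wise log-softmax, shifted by the row maximum for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i sits in column i (its own positive)
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 64))
positives = anchors + 0.05 * rng.normal(size=(4, 64))  # near-copies of their anchors
negatives = rng.normal(size=(4, 64))                   # unrelated stand-ins for hard negatives
print(mnrl_loss(anchors, positives, negatives))        # close to 0: the positives are easy to pick
```

The scale of 20 sharpens the softmax. Cosine scores in [-1, 1] become logits in [-20, 20], so small similarity gaps still produce confident predictions.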
    

Evaluation Dataset

train

  • Dataset: train
  • Size: 5,000 evaluation samples
  • Columns: anchor, positive, and negative
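  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 7 tokens, mean 11.73 tokens, max 56 tokens
    • positive: min 6 tokens, mean 15.24 tokens, max 47 tokens
    • negative: min 7 tokens, mean 16.07 tokens, max 53 tokens
  • Samples (Hungarian, with English glosses):
    1. anchor: Egy lóháton ülő ember átugrik egy lerombolt repülőgép felett. ("A person on a horse jumps over a wrecked airplane.")
       positive: Egy ember a szabadban, lóháton. ("A person is outdoors, on a horse.")
       negative: Egy ember egy étteremben van, és omlettet rendel. ("A person is at a restaurant, ordering an omelette.")
    2. anchor: Gyerekek mosolyogva és integetett a kamera ("Children smiling and waving at the camera")
       positive: Gyermekek vannak jelen ("There are children present")
       negative: A gyerekek homlokot rántanak ("The children are frowning")
    3. anchor: Egy fiú ugrál a gördeszkát a közepén egy piros híd. ("A boy is jumping on a skateboard in the middle of a red bridge.")
       positive: A fiú gördeszkás trükköt csinál. ("The boy does a skateboarding trick.")
       negative: A fiú korcsolyázik a járdán. ("The boy is skating on the sidewalk.")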
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
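batch_sampler: no_duplicates matters for an in-batch-negatives loss. If the same text appeared twice in one batch, one copy would be scored as a negative for the other. The following is a simplified, hypothetical greedy version of such a sampler. The library's own NoDuplicatesBatchSampler may differ in details such as shuffling and how leftover rows are handled. The English example rows are placeholders, not samples from the dataset.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group row indices so that no text string occurs twice within a batch.

    rows: list of (anchor, positive, negative) string tuples.
    Rows that would introduce a duplicate are deferred to a later batch.
    """
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred  # the first deferred row always fits an empty batch, so this terminates
    return batches

rows = [
    ("A man sleeps.", "A person rests.", "A man runs."),
    ("A man sleeps.", "Someone is asleep.", "Nobody sleeps."),  # repeats row 0's anchor
    ("A boy skates.", "A boy does a trick.", "A boy swims."),
    ("A dog runs.", "An animal moves.", "A dog sleeps."),
]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 2], [1, 3]]
```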

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
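The scheduler settings above are learning_rate 5e-05, lr_scheduler_type linear, warmup_ratio 0.1, and one epoch. Combined with the dataset size, 200,000 samples / batch size 16 = 12,500 optimizer steps. This assumes a single device with gradient_accumulation_steps 1, which is consistent with the final step 12500 in the training log. The first 10% of those steps, i.e. 1,250, are warmup.

The following small sketch implements the linear warmup-then-decay rule as we understand the transformers linear schedule to behave. The helper name `linear_warmup_lr` is hypothetical.

```python
import math

def linear_warmup_lr(step, total_steps, warmup_ratio=0.1, peak_lr=5e-05):
    """Learning rate at a given optimizer step: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

samples, batch_size, epochs = 200_000, 16, 1
total_steps = math.ceil(samples / batch_size) * epochs  # 12500
for step in (0, 625, 1250, 6875, 12500):
    # Start of warmup, mid-warmup, peak, mid-decay, end of training
    print(step, linear_warmup_lr(step, total_steps))
```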

Training Logs

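Each row has six columns: epoch, optimizer step, training loss, validation loss on the "train" evaluation set, all-nli-dev max accuracy, and all-nli-test max accuracy. A "-" means the value was not logged at that step.
Epoch Step Training_Loss Validation_Loss all-nli-dev_max_accuracy all-nli-test_max_accuracy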
0 0 - - 0.7176 -
0.008 100 1.0753 - - -
0.016 200 0.7611 - - -
0.024 300 1.0113 - - -
0.032 400 0.6224 - - -
0.04 500 0.8465 0.6159 0.8938 -
0.048 600 0.7761 - - -
0.056 700 0.8738 - - -
0.064 800 0.9393 - - -
0.072 900 0.9743 - - -
0.08 1000 0.8445 0.4556 0.8916 -
0.088 1100 0.7237 - - -
0.096 1200 0.8064 - - -
0.104 1300 0.607 - - -
0.112 1400 0.7632 - - -
0.12 1500 0.7477 1.6880 0.6748 -
0.128 1600 1.018 - - -
0.136 1700 0.9046 - - -
0.144 1800 0.728 - - -
0.152 1900 0.7219 - - -
0.16 2000 0.632 0.6459 0.8622 -
0.168 2100 0.6067 - - -
0.176 2200 0.7267 - - -
0.184 2300 0.781 - - -
0.192 2400 0.662 - - -
0.2 2500 0.6192 1.0124 0.8328 -
0.208 2600 0.7943 - - -
0.216 2700 0.8762 - - -
0.224 2800 0.7913 - - -
0.232 2900 0.8049 - - -
0.24 3000 0.858 0.6378 0.8046 -
0.248 3100 0.679 - - -
0.256 3200 0.7213 - - -
0.264 3300 0.6028 - - -
0.272 3400 0.5778 - - -
0.28 3500 0.5434 0.6784 0.8496 -
0.288 3600 0.6726 - - -
0.296 3700 0.7347 - - -
0.304 3800 0.8413 - - -
0.312 3900 0.7993 - - -
0.32 4000 0.8899 0.7732 0.8092 -
0.328 4100 1.1505 - - -
0.336 4200 0.8871 - - -
0.344 4300 0.8423 - - -
0.352 4400 0.8288 - - -
0.36 4500 0.6728 0.6341 0.8436 -
0.368 4600 0.7534 - - -
0.376 4700 0.8276 - - -
0.384 4800 0.7677 - - -
0.392 4900 0.588 - - -
0.4 5000 0.7742 0.4389 0.8808 -
0.408 5100 0.6782 - - -
0.416 5200 0.6688 - - -
0.424 5300 0.5579 - - -
0.432 5400 0.6891 - - -
0.44 5500 0.5764 0.4192 0.902 -
0.448 5600 0.6152 - - -
0.456 5700 0.6864 - - -
0.464 5800 0.6429 - - -
0.472 5900 0.9379 - - -
0.48 6000 0.7607 0.4744 0.8736 -
0.488 6100 0.819 - - -
0.496 6200 0.6316 - - -
0.504 6300 0.8175 - - -
0.512 6400 0.8485 - - -
0.52 6500 0.5374 0.4860 0.916 -
0.528 6600 0.781 - - -
0.536 6700 0.7722 - - -
0.544 6800 0.7281 - - -
0.552 6900 0.8453 - - -
0.56 7000 0.8541 0.2612 0.9322 -
0.568 7100 0.9698 - - -
0.576 7200 0.7184 - - -
0.584 7300 0.699 - - -
0.592 7400 0.5574 - - -
0.6 7500 0.5374 0.1939 0.9472 -
0.608 7600 0.6485 - - -
0.616 7700 0.5177 - - -
0.624 7800 0.814 - - -
0.632 7900 0.6442 - - -
0.64 8000 0.5301 0.1192 0.9616 -
0.648 8100 0.4948 - - -
0.656 8200 0.426 - - -
0.664 8300 0.4781 - - -
0.672 8400 0.4188 - - -
0.68 8500 0.5695 0.1523 0.9492 -
0.688 8600 0.3895 - - -
0.696 8700 0.5041 - - -
0.704 8800 0.7599 - - -
0.712 8900 0.5893 - - -
0.72 9000 0.6678 0.1363 0.9588 -
0.728 9100 0.5917 - - -
0.736 9200 0.6201 - - -
0.744 9300 0.5072 - - -
0.752 9400 0.4233 - - -
0.76 9500 0.396 0.2490 0.937 -
0.768 9600 0.3699 - - -
0.776 9700 0.3734 - - -
0.784 9800 0.4145 - - -
0.792 9900 0.4422 - - -
0.8 10000 0.4427 0.1394 0.9634 -
0.808 10100 0.678 - - -
0.816 10200 0.6771 - - -
0.824 10300 0.8249 - - -
0.832 10400 0.5003 - - -
0.84 10500 0.5586 0.1006 0.9726 -
0.848 10600 0.4649 - - -
0.856 10700 0.5322 - - -
0.864 10800 0.4837 - - -
0.872 10900 0.5717 - - -
0.88 11000 0.4403 0.1009 0.9688 -
0.888 11100 0.5044 - - -
0.896 11200 0.4771 - - -
0.904 11300 0.4426 - - -
0.912 11400 0.3705 - - -
0.92 11500 0.4445 0.0992 0.978 -
0.928 11600 0.3707 - - -
0.936 11700 0.4322 - - -
0.944 11800 0.4619 - - -
0.952 11900 0.4772 - - -
0.96 12000 0.5756 0.0950 0.9804 -
0.968 12100 0.5649 - - -
0.976 12200 0.5037 - - -
0.984 12300 0.0317 - - -
0.992 12400 0.0001 - - -
1.0 12500 0.0001 0.0948 0.9804 0.9804

Framework Versions

  • Python: 3.11.8
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.0
  • PyTorch: 2.3.0.post101
  • Accelerate: 0.33.0
  • Datasets: 2.18.0
  • Tokenizers: 0.19.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 568M params (Safetensors, tensor type F32)