BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
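
The three modules above amount to: run the BERT encoder, take the hidden state of the first ([CLS]) token as the sentence vector, then scale it to unit length. The following is a minimal pure-Python sketch of the last two steps. The 4-dimensional token vectors and the helper names `cls_pool` and `l2_normalize` are made up for illustration and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]

# Hypothetical encoder output for a 3-token input, with 4 dimensions instead of 768.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 2.0, 0.5, 0.1],
    [0.2, 0.3, 0.9, 1.1],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two embeddings reduces to their dot product.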

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("NickyNicky/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'For the fiscal year ended August 26, 2023, we reported net sales of $17.5 billion compared with $16.3 billion for the year ended August 27, 2022, a 7.4% increase from fiscal 2022. This growth was driven primarily by a domestic same store sales increase of 3.4% and net sales of $327.8 million from new domestic and international stores.',
    "What drove the 7.4% increase in AutoZone's net sales for fiscal 2023 compared to fiscal 2022?",
    "What percentage of HP's external U.S. hires in fiscal year 2023 were racially or ethnically diverse?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
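
Because the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64, the leading components of an embedding are meant to be usable on their own. The sketch below shows the usual recipe on toy data: keep the first `dim` components, re-normalize, then compare. The 8-dimensional vectors and the helper `truncate_and_normalize` are hypothetical. With the real model, recent Sentence Transformers releases should also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which is worth checking against the installed version.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0.0 else head

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for two model outputs (8 dimensions instead of 768).
full_a = [0.6, 0.3, 0.1, 0.2, 0.5, 0.4, 0.2, 0.2]
full_b = [0.5, 0.4, 0.0, 0.3, 0.1, 0.6, 0.3, 0.1]

for dim in (8, 4):
    short_a = truncate_and_normalize(full_a, dim)
    short_b = truncate_and_normalize(full_b, dim)
    score = dot(short_a, short_b)
    print(dim, round(score, 4))
```

Re-normalizing after truncation matters: a prefix of a unit vector is generally shorter than unit length, so a plain dot product on raw prefixes would no longer be a cosine similarity.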

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.6986
cosine_accuracy@3 0.8271
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.8986
cosine_precision@1 0.6986
cosine_precision@3 0.2757
cosine_precision@5 0.1726
cosine_precision@10 0.0899
cosine_recall@1 0.6986
cosine_recall@3 0.8271
cosine_recall@5 0.8629
cosine_recall@10 0.8986
cosine_ndcg@10 0.8024
cosine_mrr@10 0.7713
cosine_map@100 0.7759
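
In these tables recall@k equals accuracy@k, which suggests that each query has exactly one relevant passage. Under that assumption, precision@k is simply accuracy@k divided by k. The sketch below computes the metrics on toy rankings and checks the relationship against the first table. The function `retrieval_metrics` and the document ids are illustrative and are not the evaluator used to produce the reported numbers.

```python
def retrieval_metrics(ranked_ids_per_query, relevant_id_per_query, k):
    """Accuracy@k, precision@k, recall@k and MRR@k, assuming ONE relevant document per query."""
    n = len(ranked_ids_per_query)
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked_ids, relevant_id in zip(ranked_ids_per_query, relevant_id_per_query):
        top_k = ranked_ids[:k]
        if relevant_id in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(relevant_id) + 1)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results can be relevant
        "recall": accuracy,         # the single relevant document is either found or not
        "mrr": reciprocal_rank_sum / n,
    }

# Toy rankings for four queries (hypothetical ids).
rankings = [
    ["d1", "d7", "d3"],
    ["d4", "d2", "d9"],
    ["d5", "d6", "d8"],
    ["d3", "d1", "d2"],
]
relevant = ["d1", "d2", "d0", "d2"]

metrics = retrieval_metrics(rankings, relevant, k=3)
print(metrics)

# Relationship check with the reported values: accuracy@3 = 0.8271, precision@3 = 0.2757.
precision_at_3 = round(0.8271 / 3, 4)
print(precision_at_3)
```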

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.69
cosine_accuracy@3 0.8271
cosine_accuracy@5 0.86
cosine_accuracy@10 0.9029
cosine_precision@1 0.69
cosine_precision@3 0.2757
cosine_precision@5 0.172
cosine_precision@10 0.0903
cosine_recall@1 0.69
cosine_recall@3 0.8271
cosine_recall@5 0.86
cosine_recall@10 0.9029
cosine_ndcg@10 0.7999
cosine_mrr@10 0.7666
cosine_map@100 0.7707

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.6957
cosine_accuracy@3 0.8229
cosine_accuracy@5 0.86
cosine_accuracy@10 0.8914
cosine_precision@1 0.6957
cosine_precision@3 0.2743
cosine_precision@5 0.172
cosine_precision@10 0.0891
cosine_recall@1 0.6957
cosine_recall@3 0.8229
cosine_recall@5 0.86
cosine_recall@10 0.8914
cosine_ndcg@10 0.7975
cosine_mrr@10 0.767
cosine_map@100 0.7718

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.6871
cosine_accuracy@3 0.8129
cosine_accuracy@5 0.8457
cosine_accuracy@10 0.8857
cosine_precision@1 0.6871
cosine_precision@3 0.271
cosine_precision@5 0.1691
cosine_precision@10 0.0886
cosine_recall@1 0.6871
cosine_recall@3 0.8129
cosine_recall@5 0.8457
cosine_recall@10 0.8857
cosine_ndcg@10 0.7877
cosine_mrr@10 0.7562
cosine_map@100 0.761

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.6329
cosine_accuracy@3 0.7771
cosine_accuracy@5 0.8171
cosine_accuracy@10 0.8571
cosine_precision@1 0.6329
cosine_precision@3 0.259
cosine_precision@5 0.1634
cosine_precision@10 0.0857
cosine_recall@1 0.6329
cosine_recall@3 0.7771
cosine_recall@5 0.8171
cosine_recall@10 0.8571
cosine_ndcg@10 0.7483
cosine_mrr@10 0.7131
cosine_map@100 0.719

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    positive  string  2 tokens  46.19 tokens  371 tokens
    anchor    string  2 tokens  20.39 tokens  46 tokens
  • Samples:
    positive: Cash used in financing activities in fiscal 2022 was primarily attributable to settlement of stock-based awards.
    anchor: Why was there a net outflow of cash in financing activities in fiscal 2022?

    positive: Certain vendors have been impacted by volatility in the supply chain financing market.
    anchor: How have certain vendors been impacted in the supply chain financing market?

    positive: In the consolidated financial statements for Visa, the net cash provided by operating activities amounted to 20,755 units in the most recent period, 18,849 units in the previous period, and 15,227 units in the period before that.
    anchor: How much net cash did Visa's operating activities generate in the most recent period according to the financial statements?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
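
To make the configuration above concrete, here is a rough pure-Python sketch of what it describes: the in-batch ranking loss is computed on each truncated prefix of the embeddings and the results are summed with the given weights. This is an illustration and not the library's implementation. The similarity scale of 20 and the use of cosine similarity are assumed to match the MultipleNegativesRankingLoss defaults, and the toy data uses 8 dimensions with prefixes 8, 4 and 2 in place of 768 down to 64.

```python
import math

def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over scaled cosine similarities.

    positives[i] is the target for anchors[i]; every other positive in the
    batch acts as a negative. `scale` is an assumed default.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(anchor, p)) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss evaluated on each embedding prefix."""
    return sum(
        w * in_batch_ranking_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs.
anchors = [
    [1.0, 0.2, 0.0, 0.1, 0.3, 0.0, 0.1, 0.0],
    [0.0, 1.0, 0.3, 0.0, 0.0, 0.2, 0.0, 0.1],
]
positives = [
    [0.9, 0.1, 0.1, 0.0, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.2, 0.1, 0.0, 0.3, 0.1, 0.0],
]

loss = matryoshka_loss(anchors, positives, dims=[8, 4, 2], weights=[1, 1, 1])
print(loss)
```

Setting `n_dims_per_step` to -1 corresponds to using every listed dimension at each step, which is what the sum over all `dims` mirrors here.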
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
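
The combination of `learning_rate: 2e-05`, `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` describes a linear warmup followed by a cosine decay. The sketch below evaluates that shape at a few steps. It assumes 48 optimizer steps in total (the last step in the training logs) and a warmup length of ceil(0.1 × 48) = 5 steps. The function `lr_at_step` is a hypothetical helper written for this illustration, not a library call.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 48  # assumption: 4 epochs at 12 optimizer steps each
for step in (0, 5, 24, 48):
    print(step, lr_at_step(step, total_steps))
```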

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8122 10 1.5643 - - - - -
0.9746 12 - 0.7349 0.7494 0.7524 0.6987 0.7569
1.6244 20 0.6756 - - - - -
1.9492 24 - 0.7555 0.7659 0.7683 0.7190 0.7700
2.4365 30 0.4561 - - - - -
2.9239 36 - 0.7592 0.7698 0.7698 0.7184 0.7741
3.2487 40 0.3645 - - - - -
3.8985 48 - 0.7610 0.7718 0.7707 0.7190 0.7759
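
The fractional epochs in the log follow from the dataset size and batch settings. With 6,300 samples and a per-device batch size of 32 there are 197 batches per epoch, and each optimizer step consumes 16 of them (gradient accumulation). The short calculation below reproduces the Epoch column, assuming training ran on a single device, which is consistent with the logged values.

```python
import math

train_samples = 6300
per_device_batch = 32
grad_accum = 16

# Assumption: one device, so batches per epoch depend only on the per-device batch size.
batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 197
samples_per_step = per_device_batch * grad_accum                 # 512

def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return round(step * grad_accum / batches_per_epoch, 4)

for step in (10, 12, 20, 24, 30, 36, 40, 48):
    print(step, epoch_at_step(step))
```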

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)