SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
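
The module list above amounts to three steps: run the BERT encoder, take the [CLS] token's vector (pooling_mode_cls_token is True), and scale that vector to unit length. The sketch below mimics the last two steps in plain Python on a toy 4-dimensional input. The names cls_pool and l2_normalize are illustrative and not part of the library, and the real model operates on 384-dimensional tensors.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy "transformer output": 3 tokens with 4 dimensions each (hypothetical values).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]
sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the cosine similarity between two embeddings equals their dot product.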

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("smokxy/bge_small_method1_triplet")
# Run inference
sentences = [
    'How many goats are mentioned?',
    "'| Name           | No                        | Small             | Big    |\\n|----------------|---------------------------|-------------------|--------|\\n| Bulls          |                           |                   |        |\\n| Cows           |                           |                   |        |\\n| Buffaloes      |                           |                   |        |\\n| Sheep          |                           |                   |        |\\n| Goat           |                           |                   |        |\\n| Poultry        |                           |                   |        |\\n|                |                           |                   |        |\\n| Scarcity:      |                           |                   |        |\\n| Item           | How many months in a year | Purchasing? (Y/N) |        |\\n| Food grains    |                           |                   |        |\\n| Wage  work     |                           |                   |        |\\n| Fodder         |                           |                   |        |\\n| Firewood       |                           |                   |        |\\n| others         |                           |                   |        |\\n|                |                           |                   |        |\\n| Assets:        |                           |                   |        |\\n| Asset          | Type of Asset             | Value (Rs)        |        |\\n| House          | Kutcha / Pacca            |                   |        |\\n|                |                           | Drinking water    |        |\\n| source         |                           |                   |        |\\n|                |                           | Bicycle / Two     |        |\\n| Wheeler / Four |                           |                   |        |\\n| wheeler        |                           |                   |        |\\n| Refrigerator   |                           |                   |        |\\n| TV             |                           |                   |        |\\n| Others         |                           |                   |        |\\n|                |                           |                   |        |\\n|                |                           |                   |        |'",
    "'Livestock /Poultry: Under present weather condition, keep animals under shade or in sheds during noon hours, provide plenty of cool water mixed with minerals for drinking and shower the animals with cold water two to three times in a day. Do Vaccination for Haemorrhageic septicemia (H.S.) disease and Blue Quarter (B.Q.) in animals. Udder of milking animals must be properly cleaned with zinc oxide or boric powder. Also give deworming tablet to younger animals. For control of ticks and mites spray Deltamethrin or Amitral 2 ml/liter of water. Spray sanitizers or phenyl in the animal shed to avoid flies and mosquitoes. Give stored fodder with mineral mixture.'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
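
In the example above, the first sentence is a question and the other two are candidate passages, so row 0 of the similarity matrix scores each passage against the question. A minimal sketch of turning such scores into a ranking follows. It uses small hypothetical unit vectors in place of model.encode output so that it runs without downloading the model, and rank_by_similarity is an illustrative helper, not a library function.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_similarity(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical unit-length vectors standing in for real 384-dimensional embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]

ranking = rank_by_similarity(query, passages)
print(ranking)  # [(2, 0.8), (1, 0.6), (0, 0.0)]
```

With real embeddings, the query vector would be the first row of model.encode(sentences) and the passage vectors the remaining rows.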

Training Details

Training Dataset

Unnamed Dataset

  • Size: 35,015 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    column      type    min         mean           max
    sentence_0  string  7 tokens    15.71 tokens   53 tokens
    sentence_1  string  116 tokens  279.16 tokens  512 tokens
    sentence_2  string  116 tokens  316.8 tokens   512 tokens
  • Samples:
    Sample 1
      sentence_0: Which organizations were visited during the study?
      sentence_1: ' S No
    Sample 2
      sentence_0: What are the benefits of the model for the State Government?
      sentence_1: '1.16 The model of the PO with federations and farmer clubs at the ground level with Kiosks/outlets for inputs and services can be considered as sustainable for the following reasons: a. It is a farmer organization from top to bottom, providing essential services to the farming community. Hence, the acceptance level in the farming community is high. b. With renewed focus on financial inclusion, farmer clubs can act as BCs of banks, facilitated through their federations. This will ensure that farmers get both banking services and agricultural inputs and related information at one place. c. Production, processing and consumption of certified seed is carried out to meet the requirement of the farmers, creating readymade market locally. The benefits to the farmers include timely availability of high quality seed at a reasonable cost. d. The State Government is channelizing its payments/subsidies under 7-8 schemes through these kiosks making them more popular among the farmers. e. The operating margin at the PO level is quite satisfactory. In its first year, it has surpassed the break-even level, even earning a little surplus. The State Government supported the farmers for cultivating seed and the PO for processing the seed. The demand for quality seed is more than what the farmers could produce at present. Therefore, there is scope for growth. There is potential for the activity to become viable, even without subsidy. f. For the State Government, there is a farmer organization which is able to take care of the input needs of the farmers. State Government can converge some of its agricultural schemes through the PO/ farmer federation and reach the unreached. Agricultural extension services can be provided using this institutional arrangement at a lower cost to a large number of farmers.'
      sentence_2: 'In order to promote the forest and minor forest produce by the tribal communities, intensive efforts will be made by the implementing agencies to prioritize formation and promotion of FPOs in the notified tribal areas in the country. The benefits of quality input, technology, credit and value addition and processing as well as better market access should reach the tribal community and North-East Region through the Scheme in co-operation with Tribal Affairs Ministry, DONER and North Eastern Council (NEC). 4.9 Existing FPOs will also be allowed to avail relevant benefits, if not earlier availed in any scheme of Government of India, such as Credit Guarantee Fund and advisory services from National Project Management Agency (NPMA) under the Scheme. The FPOs which are already registered but have not been provided funds under any other schemes and have not yet started operation will also be covered under the Scheme.'
    Sample 3
      sentence_0: What is the purpose of the Kisan Credit Card (KCC) scheme?
      sentence_1: 'The Kisan Credit Card (KCC) scheme was introduced in 1998 for issue of Kisan Credit Cards to farmers on the basis of their holdings for uniform adoption by the banks so that farmers may use them to readily purchase agriculture inputs such as seeds, fertilizers, pesticides etc. and draw cash for their production needs. The scheme was further extended for the investment credit requirement of farmers viz. allied and non-farm activities in the year 2004. The scheme was further revisited in 2012 by a working Group under the Chairmanship of Shri T. M. Bhasin, CMD, Indian Bank with a view to simplify the scheme and facilitate issue of Electronic Kisan Credit Cards. The scheme provides broad guidelines to banks for operationalizing the KCC scheme. Implementing banks will have the discretion to adopt the same to suit institution/location specific requirements.'
      sentence_2: 'First installment due on (date) : ii). Last Installment due on (date) : 6. b). Cash Credit : Limit: Drawing Power: Outstanding: Comments on Irregularity ( if any): Any adverse comments on the unit by inspecting official in last inspection report: 7. A. Cost of Project (as accepted by sanctioning authority)(In Rs. Lakh) B. Means of Finance (as accepted by sanctioning authority)(In Rs. Lakh) Give component wise details a. Term loan of Bank: b. Promoter Equity c. Unsecured loan : d. Others if any Total Total 8. A. Forward Linkages: B. Backward Linkages with Small/Marginal farmers: 1 No. of members: 2 Details of Primary and Collateral Securities taken by the bank (if any) 3 a. Primary Securities b. Collateral Securities 4 5 6 (Please enclose details separately) 9 NameoftheConsortium(ifany)associatedwithCreditFacilitywithcompleteaddress,contac t details and email: 9 a) Address (*with pin-code) : 9 b) Contact Details : 9 c) Email Address : Request of Branch head for Credit Guarantee:- In view of the above information, we request Credit Guarantee Cover against Credit Facility of Rs.....................(in Rupees ) to FPO(copy of sanction letter along with appraisal/process note of competent authority is enclosed for your perusal and record ). Further we confirm that : 1. The KYC norms in respect of the Promoters have been complied by us. 2. Techno-feasibility and economic viability aspect of the project has been taken care of by the sanctioning authority and the branch. 3. On quarterly basis, bank will apprise the ........................(Name of Implementing Agency)about progress of unit, recovery of bank's dues and present status of account to........................(Name of Implementing Agency) 4. We undertake to abide by the Terms & Conditions of the Scheme.'
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
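
For one (sentence_0, sentence_1, sentence_2) triplet, a triplet loss with the Euclidean metric and margin 5 is max(d(anchor, positive) - d(anchor, negative) + 5, 0). The sketch below evaluates that formula in plain Python on hypothetical 2-dimensional unit vectors. The helper names are illustrative, and it assumes the Normalize() layer is active during training, so that every embedding has unit length.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [1.0, 0.0]

# Best case for unit vectors: positive equals the anchor, negative is opposite.
best = triplet_loss(anchor, [1.0, 0.0], [-1.0, 0.0])
print(best)  # 3.0

# Positive and negative equally far from the anchor: the loss equals the margin.
tie = triplet_loss(anchor, [0.0, 1.0], [0.0, -1.0])
print(tie)  # 5.0
```

Two unit vectors are never more than 2 apart, so under that assumption the per-triplet loss cannot drop below 5 - 2 = 3. This is consistent with the logged training loss staying between roughly 4.2 and 4.7 rather than approaching zero.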
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
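
The settings learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0 describe a learning rate that starts at its peak and decays linearly to zero over training. A rough sketch of that schedule is given below. The function linear_lr is illustrative rather than the Transformers implementation, and the total step count assumes a single device, which the card does not state.

```python
import math

def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

# 35,015 samples, batch size 16, 3 epochs, no gradient accumulation
# (single-device training is an assumption).
total_steps = math.ceil(35015 / 16) * 3
print(total_steps)  # 6567

for step in (0, 2189, 4378, total_steps):
    print(step, linear_lr(step, total_steps))
```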

Training Logs

Epoch Step Training Loss
0.2284 500 4.7424
0.4568 1000 4.5923
0.6852 1500 4.5216
0.9137 2000 4.4782
1.1421 2500 4.4073
1.3705 3000 4.3671
1.5989 3500 4.3421
1.8273 4000 4.3207
2.0557 4500 4.3103
2.2841 5000 4.2805
2.5126 5500 4.2757
2.7410 6000 4.2483
2.9694 6500 4.273
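
The Epoch column can be reproduced from the dataset size and batch size reported above. The short check below assumes single-device training with gradient_accumulation_steps: 1 and dataloader_drop_last: False, so one epoch is ceil(35015 / 16) optimizer steps. The helper epoch_at is illustrative.

```python
import math

num_samples = 35015
batch_size = 16

# The last partial batch is kept (dataloader_drop_last: False), hence ceil.
steps_per_epoch = math.ceil(num_samples / batch_size)

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)  # 2189
print(epoch_at(500))    # 0.2284
print(epoch_at(6500))   # 2.9694
```

Both values match the first and last rows of the table, which suggests the run covered 3 × 2,189 = 6,567 steps in total.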

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.1.2
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model Size

  • Parameters: 33.4M
  • Tensor type: F32 (Safetensors)