SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
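The architecture above pools with the [CLS] token (pooling_mode_cls_token is True) and then applies Normalize(). As a rough illustration of those two steps, here is a minimal pure-Python sketch on toy data. The helper names cls_pool and l2_normalize and the 4-dimensional toy vectors are assumptions made for this example; they are not part of the Sentence Transformers API, and the real model works with 384-dimensional vectors.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical token embeddings for one sentence: 3 tokens, 4 dimensions each.
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(toy_tokens))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.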
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("smokxy/bge_small_method1_triplet")
# Run inference
sentences = [
'How many goats are mentioned?',
"'| Name | No | Small | Big |\\n|----------------|---------------------------|-------------------|--------|\\n| Bulls | | | |\\n| Cows | | | |\\n| Buffaloes | | | |\\n| Sheep | | | |\\n| Goat | | | |\\n| Poultry | | | |\\n| | | | |\\n| Scarcity: | | | |\\n| Item | How many months in a year | Purchasing? (Y/N) | |\\n| Food grains | | | |\\n| Wage work | | | |\\n| Fodder | | | |\\n| Firewood | | | |\\n| others | | | |\\n| | | | |\\n| Assets: | | | |\\n| Asset | Type of Asset | Value (Rs) | |\\n| House | Kutcha / Pacca | | |\\n| | | Drinking water | |\\n| source | | | |\\n| | | Bicycle / Two | |\\n| Wheeler / Four | | | |\\n| wheeler | | | |\\n| Refrigerator | | | |\\n| TV | | | |\\n| Others | | | |\\n| | | | |\\n| | | | |'",
"'Livestock /Poultry: Under present weather condition, keep animals under shade or in sheds during noon hours, provide plenty of cool water mixed with minerals for drinking and shower the animals with cold water two to three times in a day. Do Vaccination for Haemorrhageic septicemia (H.S.) disease and Blue Quarter (B.Q.) in animals. Udder of milking animals must be properly cleaned with zinc oxide or boric powder. Also give deworming tablet to younger animals. For control of ticks and mites spray Deltamethrin or Amitral 2 ml/liter of water. Spray sanitizers or phenyl in the animal shed to avoid flies and mosquitoes. Give stored fodder with mineral mixture.'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
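In the usage snippet above, the first sentence is a question and the other two are passages, so one natural follow-up is to rank passages by their similarity to the question. The sketch below shows that ranking step on hypothetical 2-dimensional unit vectors instead of real model output, so it runs without downloading anything. The helper rank_by_similarity is an illustrative name, not a library function; with real embeddings, the scores would come from the matrix returned by model.similarity.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical unit-length vectors standing in for one query and three passages.
query = [1.0, 0.0]
docs = [[0.6, 0.8], [0.0, 1.0], [0.8, 0.6]]

ranking = rank_by_similarity(query, docs)
print(ranking)  # [(2, 0.8), (0, 0.6), (1, 0.0)]
```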
Training Details
Training Dataset
Unnamed Dataset
- Size: 35,015 training samples
- Columns: sentence_0, sentence_1, and sentence_2
- Approximate statistics based on the first 1000 samples:
Statistic | sentence_0 | sentence_1 | sentence_2 |
---|---|---|---|
type | string | string | string |
min | 7 tokens | 116 tokens | 116 tokens |
mean | 15.71 tokens | 279.16 tokens | 316.8 tokens |
max | 53 tokens | 512 tokens | 512 tokens |
- Samples:
sentence_0 | sentence_1 | sentence_2 |
Which organizations were visited during the study?
'
S No
What are the benefits of the model for the State Government?
'1.16 The model of the PO with federations and farmer clubs at the ground level with Kiosks/outlets for inputs and services can be considered as sustainable for the following reasons: a. It is a farmer organization from top to bottom, providing essential services to the farming community. Hence, the acceptance level in the farming community is high. b. With renewed focus on financial inclusion, farmer clubs can act as BCs of banks, facilitated through their federations. This will ensure that farmers get both banking services and agricultural inputs and related information at one place. c. Production, processing and consumption of certified seed is carried out to meet the requirement of the farmers, creating readymade market locally. The benefits to the farmers include timely availability of high quality seed at a reasonable cost. d. The State Government is channelizing its payments/subsidies under 7-8 schemes through these kiosks making them more popular among the farmers. e. The operating margin at the PO level is quite satisfactory. In its first year, it has surpassed the break-even level, even earning a little surplus. The State Government supported the farmers for cultivating seed and the PO for processing the seed. The demand for quality seed is more than what the farmers could produce at present. Therefore, there is scope for growth. There is potential for the activity to become viable, even without subsidy. f. For the State Government, there is a farmer organization which is able to take care of the input needs of the farmers. State Government can converge some of its agricultural schemes through the PO/ farmer federation and reach the unreached. Agricultural extension services can be provided using this institutional arrangement at a lower cost to a large number of farmers.'
'In order to promote the forest and minor forest produce by the tribal communities, intensive efforts will be made by the implementing agencies to prioritize formation and promotion of FPOs in the notified tribal areas in the country. The benefits of quality input, technology, credit and value addition and processing as well as better market access should reach the tribal community and North-East Region through the Scheme in co-operation with Tribal Affairs Ministry, DONER and North Eastern Council (NEC). 4.9 Existing FPOs will also be allowed to avail relevant benefits, if not earlier availed in any scheme of Government of India, such as Credit Guarantee Fund and advisory services from National Project Management Agency (NPMA) under the Scheme. The FPOs which are already registered but have not been provided funds under any other schemes and have not yet started operation will also be covered under the Scheme.'
What is the purpose of the Kisan Credit Card (KCC) scheme?
'The Kisan Credit Card (KCC) scheme was introduced in 1998 for issue of Kisan Credit Cards to farmers on the basis of their holdings for uniform adoption by the banks so that farmers may use them to readily purchase agriculture inputs such as seeds, fertilizers, pesticides etc. and draw cash for their production needs. The scheme was further extended for the investment credit requirement of farmers viz. allied and non-farm activities in the year 2004. The scheme was further revisited in 2012 by a working Group under the Chairmanship of Shri T. M. Bhasin, CMD, Indian Bank with a view to simplify the scheme and facilitate issue of Electronic Kisan Credit Cards. The scheme provides broad guidelines to banks for operationalizing the KCC scheme. Implementing banks will have the discretion to adopt the same to suit institution/location specific requirements.'
'First installment due on (date) : ii). Last Installment due on (date) : 6. b). Cash Credit : Limit: Drawing Power: Outstanding: Comments on Irregularity ( if any): Any adverse comments on the unit by inspecting official in last inspection report: 7. A. Cost of Project (as accepted by sanctioning authority)(In Rs. Lakh) B. Means of Finance (as accepted by sanctioning authority)(In Rs. Lakh) Give component wise details a. Term loan of Bank: b. Promoter Equity c. Unsecured loan : d. Others if any Total Total 8. A. Forward Linkages: B. Backward Linkages with Small/Marginal farmers: 1 No. of members: 2 Details of Primary and Collateral Securities taken by the bank (if any) 3 a. Primary Securities b. Collateral Securities 4 5 6 (Please enclose details separately) 9 NameoftheConsortium(ifany)associatedwithCreditFacilitywithcompleteaddress,contac t details and email: 9 a) Address (*with pin-code) : 9 b) Contact Details : 9 c) Email Address : Request of Branch head for Credit Guarantee:- In view of the above information, we request Credit Guarantee Cover against Credit Facility of Rs.....................(in Rupees ) to FPO(copy of sanction letter along with appraisal/process note of competent authority is enclosed for your perusal and record ). Further we confirm that : 1. The KYC norms in respect of the Promoters have been complied by us. 2. Techno-feasibility and economic viability aspect of the project has been taken care of by the sanctioning authority and the branch. 3. On quarterly basis, bank will apprise the ........................(Name of Implementing Agency)about progress of unit, recovery of bank's dues and present status of account to........................(Name of Implementing Agency) 4. We undertake to abide by the Terms & Conditions of the Scheme.'
- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
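To make the loss parameters concrete: a triplet loss with Euclidean distance penalizes a triplet unless the anchor is closer to the positive than to the negative by at least the margin, i.e. max(d(a, p) - d(a, n) + margin, 0). The sketch below is a pure-Python illustration of that formula for a single triplet with the margin of 5 listed above. It is not the library's implementation, and the toy vectors are made up for the example.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Loss for one triplet: max(d(a, p) - d(a, n) + margin, 0)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Hypothetical unit-length vectors: the positive coincides with the anchor
# and the negative points in the opposite direction.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]
negative = [-1.0, 0.0]

best_case = triplet_loss(anchor, positive, negative)
print(best_case)  # 3.0
```

If the loss is computed on the unit-length outputs of the Normalize() layer, which the architecture suggests but the card does not state, every distance lies between 0 and 2, so with a margin of 5 the loss for a triplet cannot fall below 3. That would be consistent with the logged training loss staying above 4.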
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- fp16: True
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.2284 | 500 | 4.7424 |
0.4568 | 1000 | 4.5923 |
0.6852 | 1500 | 4.5216 |
0.9137 | 2000 | 4.4782 |
1.1421 | 2500 | 4.4073 |
1.3705 | 3000 | 4.3671 |
1.5989 | 3500 | 4.3421 |
1.8273 | 4000 | 4.3207 |
2.0557 | 4500 | 4.3103 |
2.2841 | 5000 | 4.2805 |
2.5126 | 5500 | 4.2757 |
2.7410 | 6000 | 4.2483 |
2.9694 | 6500 | 4.273 |
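The Epoch column in the table above can be reproduced from figures given elsewhere in this card: 35,015 training samples, a per-device batch size of 16, gradient_accumulation_steps of 1, dataloader_drop_last set to False, and 3 epochs. The short calculation below assumes training ran on a single device, which the card does not state; the variable and function names are illustrative.

```python
import math

num_samples = 35_015   # training set size from the card
batch_size = 16        # per_device_train_batch_size
num_epochs = 3         # num_train_epochs

# Assumption: one device, so each optimizer step consumes one batch of 16.
# With dataloader_drop_last False, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after the given number of steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 2189
print(total_steps)      # 6567
print(epoch_at(500))    # 0.2284
print(epoch_at(6500))   # 2.9694
```

Under that assumption, the computed values match the first and last rows of the log, and the full run would end at step 6567, shortly after the last logged step.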
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.1.2
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}