
base_model: Snowflake/snowflake-arctic-embed-l
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:3430
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      What are some illustrative cases that show the implementation of the AI
      Bill of Rights?
    sentences:
      - >-
        SECTION TITLE

        APPENDIX

        Listening to the American People 

        The White House Office of Science and Technology Policy (OSTP) led a
        yearlong process to seek and distill 

        input from people across the country – from impacted communities to
        industry stakeholders to 

        technology developers to other experts across fields and sectors, as
        well as policymakers across the Federal 

        government – on the issue of algorithmic and data-driven harms and
        potential remedies. Through panel 

        discussions, public listening sessions, private meetings, a formal
        request for information, and input to a 

        publicly accessible and widely-publicized email address, people across
        the United States spoke up about 

        both the promises and potential harms of these technologies, and played
        a central role in shaping the 

        Blueprint for an AI Bill of Rights. 

        Panel Discussions to Inform the Blueprint for An AI Bill of Rights 

        OSTP co-hosted a series of six panel discussions in collaboration with
        the Center for American Progress,
      - >-
        existing human performance considered as a performance baseline for the
        algorithm to meet pre-deployment, 

        and as a lifecycle minimum performance standard. Decision possibilities
        resulting from performance testing 

        should include the possibility of not deploying the system. 

        Risk identification and mitigation. Before deployment, and in a
        proactive and ongoing manner, potential risks of the automated system
        should be identified and mitigated.
        Identified risks should focus on the 

        potential for meaningful impact on people’s rights, opportunities, or
        access and include those to impacted 

        communities that may not be direct users of the automated system, risks
        resulting from purposeful misuse of 

        the system, and other concerns identified via the consultation process.
        Assessment and, where possible, measurement of the impact of risks
        should be included and balanced such
        that high impact risks receive attention
      - >-
        confidence that their rights, opportunities, and access as well as their
        expectations about technologies are respected. 

        3

        HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE: 

        This section provides real-life examples of how these guiding principles
        can become reality, through laws, policies, and practices. 

        It describes practical technical and sociotechnical approaches to
        protecting rights, opportunities, and access. 

        The examples provided are not critiques or endorsements, but rather are
        offered as illustrative cases to help 

        provide a concrete vision for actualizing the Blueprint for an AI Bill
        of Rights. Effectively implementing these 

        processes require the cooperation of and collaboration among industry,
        civil society, researchers, policymakers, 

        technologists, and the public. 

        14
  - source_sentence: What are the potential impacts of automated systems on data privacy?
    sentences:
      - >-
        https://arxiv.org/pdf/2305.17493v2 

        Smith, A. et al. (2023) Hallucination or Confabulation? Neuroanatomy as
        metaphor in Large Language 

        Models. PLOS Digital Health. 

        https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000388 

        Soice, E. et al. (2023) Can large language models democratize access to
        dual-use biotechnology? arXiv. 

        https://arxiv.org/abs/2306.03809 

        Solaiman, I. et al. (2023) The Gradient of Generative AI Release:
        Methods and Considerations. arXiv. 

        https://arxiv.org/abs/2302.04844 

        Staab, R. et al. (2023) Beyond Memorization: Violating Privacy via
        Inference With Large Language 

        Models. arXiv. https://arxiv.org/pdf/2310.07298 

        Stanford, S. et al. (2023) Whose Opinions Do Language Models Reﬂect?
        arXiv. 

        https://arxiv.org/pdf/2303.17548 

        Strubell, E. et al. (2019) Energy and Policy Considerations for Deep
        Learning in NLP. arXiv. 

        https://arxiv.org/pdf/1906.02243 

        The White House (2016) Circular No. A-130, Managing Information as a
        Strategic Resource.
      - >-
        and data that are considered sensitive are understood to change over
        time based on societal norms and context. 

        36
      - |-
        yet foreseeable, uses or impacts of automated systems. You should be 
        protected from inappropriate or irrelevant data use in the design,
        development, and deployment of automated systems, and from the
        compounded harm of its reuse. Independent evaluation and reporting
        that confirms that the system is safe and effective, including
        reporting of steps taken to mitigate potential harms, should be
        performed and the results made public whenever possible. 
        15
  - source_sentence: What is the AI Bill of Rights?
    sentences:
      - |-
        BLUEPRINT FOR AN 
        AI BILL OF 
        RIGHTS 
        MAKING AUTOMATED 
        SYSTEMS WORK FOR 
        THE AMERICAN PEOPLE 
        OCTOBER 2022
      - >-
        APPENDIX

        •

        Julia Simon-Mishel, Supervising Attorney, Philadelphia Legal Assistance

        •

        Dr. Zachary Mahafza, Research & Data Analyst, Southern Poverty Law
        Center

        •

        J. Khadijah Abdurahman, Tech Impact Network Research Fellow, AI Now
        Institute, UCLA C2I1, and

        UWA Law School

        Panelists separately described the increasing scope of technology use in
        providing for social welfare, including 

        in fraud detection, digital ID systems, and other methods focused on
        improving efficiency and reducing cost. 

        However, various panelists individually cautioned that these systems may
        reduce burden for government 

        agencies by increasing the burden and agency of people using and
        interacting with these technologies. 

        Additionally, these systems can produce feedback loops and compounded
        harm, collecting data from 

        communities and using it to reinforce inequality. Various panelists
        suggested that these harms could be 

        mitigated by ensuring community input at the beginning of the design
        process, providing ways to opt out of
      - >-
        safe, secure, and resilient; (e) understandable; (f) responsible and
        traceable; (g) regularly monitored; (h) transparent; and, (i)
        accountable. The Blueprint for an AI Bill of Rights is consistent with
        the Executive Order. 

        Affected agencies across the federal government have released AI use
        case inventories13 and are implementing 

        plans to bring those AI systems into compliance with the Executive Order
        or retire them. 

        The law and policy landscape for motor vehicles shows that strong safety
        regulations—and 

        measures to address harms when they occur—can enhance innovation in the
        context of complex technologies. Cars, like automated digital systems,
        comprise a complex collection of components. 

        The National Highway Traffic Safety Administration,14 through its
        rigorous standards and independent 

        evaluation, helps make sure vehicles on our roads are safe without
        limiting manufacturers’ ability to 

        innovate.15 At the same time, rules of the road are implemented locally
        to impose contextually appropriate
  - source_sentence: >-
      What are the best practices for benchmarking AI system security and
      resilience?
    sentences:
      - >-
        NOTICE & 

        EXPLANATION 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        An automated system should provide demonstrably clear, timely,
        understandable, and accessible notice of use, and 

        explanations as to how and why a decision was made or an action was
        taken by the system. These expectations are 

        explained below. 

        Provide clear, timely, understandable, and accessible notice of use and
        explanations 

        Generally accessible plain language documentation. The entity
        responsible for using the automated 

        system should ensure that documentation describing the overall system
        (including any human components) is 

        public and easy to find. The documentation should describe, in plain
        language, how the system works and how
      - >-
        content performance and impact, and work in collaboration with AI
        Actors 

        experienced in user research and experience. 

        Human-AI Conﬁguration 

        MG-4.1-004 Implement active learning techniques to identify instances
        where the model fails 

        or produces unexpected outputs. 

        Confabulation 

        MG-4.1-005 

        Share transparency reports with internal and external stakeholders that
        detail 

        steps taken to update the GAI system to enhance transparency and 

        accountability. 

        Human-AI Conﬁguration; Harmful 

        Bias and Homogenization 

        MG-4.1-006 

        Track dataset modiﬁcations for provenance by monitoring data deletions, 

        rectiﬁcation requests, and other changes that may impact the
        veriﬁability of 

        content origins. 

        Information Integrity
      - >-
        33 

        MEASURE 2.7: AI system security and resilience – as identiﬁed in the MAP
        function – are evaluated and documented. 

        Action ID 

        Suggested Action 

        GAI Risks 

        MS-2.7-001 

        Apply established security measures to: Assess likelihood and magnitude
        of 

        vulnerabilities and threats such as backdoors, compromised dependencies,
        data 

        breaches, eavesdropping, man-in-the-middle attacks, reverse
        engineering, 

        autonomous agents, model theft or exposure of model weights, AI
        inference, 

        bypass, extraction, and other baseline security concerns. 

        Data Privacy; Information Integrity; 

        Information Security; Value Chain 

        and Component Integration 

        MS-2.7-002 

        Benchmark GAI system security and resilience related to content
        provenance 

        against industry standards and best practices. Compare GAI system
        security 

        features and content provenance methods against industry
        state-of-the-art. 

        Information Integrity; Information 

        Security 

        MS-2.7-003 

        Conduct user surveys to gather user satisfaction with the AI-generated
        content
  - source_sentence: >-
      How should risks or trustworthiness characteristics that cannot be
      measured be documented?
    sentences:
      - >-
        MEASURE 1.1: Approaches and metrics for measurement of AI risks
        enumerated during the MAP function are selected for 

        implementation starting with the most signiﬁcant AI risks. The risks or
        trustworthiness characteristics that will not – or cannot – be 

        measured are properly documented. 

        Action ID 

        Suggested Action 

        GAI Risks 

        MS-1.1-001 Employ methods to trace the origin and modiﬁcations of
        digital content. 

        Information Integrity 

        MS-1.1-002 

        Integrate tools designed to analyze content provenance and detect data 

        anomalies, verify the authenticity of digital signatures, and identify
        patterns 

        associated with misinformation or manipulation. 

        Information Integrity 

        MS-1.1-003 

        Disaggregate evaluation metrics by demographic factors to identify any 

        discrepancies in how content provenance mechanisms work across diverse 

        populations. 

        Information Integrity; Harmful 

        Bias and Homogenization 

        MS-1.1-004 Develop a suite of metrics to evaluate structured public
        feedback exercises
      - >-
        AI technology can produce varied outputs in multiple modalities and
        present many classes of user 

        interfaces. This leads to a broader set of AI Actors interacting with
        GAI systems for widely diﬀering 

        applications and contexts of use. These can include data labeling and
        preparation, development of GAI 

        models, content moderation, code generation and review, text generation
        and editing, image and video 

        generation, summarization, search, and chat. These activities can take
        place within organizational 

        settings or in the public domain. 

        Organizations can restrict AI applications that cause harm, exceed
        stated risk tolerances, or that conﬂict 

        with their tolerances or values. Governance tools and protocols that are
        applied to other types of AI 

        systems can be applied to GAI systems. These plans and actions include: 

        • Accessibility and reasonable 

        accommodations 

        • AI actor credentials and qualiﬁcations  

        • Alignment to organizational values 

        • Auditing and assessment 

        • Change-management controls
      - >-
        existing human performance considered as a performance baseline for the
        algorithm to meet pre-deployment, 

        and as a lifecycle minimum performance standard. Decision possibilities
        resulting from performance testing 

        should include the possibility of not deploying the system. 

        Risk identification and mitigation. Before deployment, and in a
        proactive and ongoing manner, potential risks of the automated system
        should be identified and mitigated.
        Identified risks should focus on the 

        potential for meaningful impact on people’s rights, opportunities, or
        access and include those to impacted 

        communities that may not be direct users of the automated system, risks
        resulting from purposeful misuse of 

        the system, and other concerns identified via the consultation process.
        Assessment and, where possible, measurement of the impact of risks
        should be included and balanced such
        that high impact risks receive attention
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.2807017543859649
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4649122807017544
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5350877192982456
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7192982456140351
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.2807017543859649
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.15497076023391812
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.10701754385964912
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0719298245614035
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2807017543859649
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4649122807017544
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5350877192982456
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7192982456140351
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4797086283187805
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.40644667223614606
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.423567506926962
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.2807017543859649
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.4649122807017544
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.5350877192982456
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.7192982456140351
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.2807017543859649
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.15497076023391812
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.10701754385964912
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.0719298245614035
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.2807017543859649
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.4649122807017544
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.5350877192982456
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.7192982456140351
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.4797086283187805
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.40644667223614606
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.423567506926962
            name: Dot Map@100

SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Snowflake/snowflake-arctic-embed-l
Maximum Sequence Length: 512 tokens
Output Dimensionality: 1024 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
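The architecture above ends with CLS-token pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. As a rough illustration, not the library's implementation, the pure-Python sketch below applies those two steps to made-up token vectors. It also shows why the cosine_* and dot_* rows in the evaluation table coincide: once vectors have unit length, the dot product equals cosine similarity. The helper names are hypothetical.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep the vector of the first token ([CLS]) and ignore the rest."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length, as a Normalize() step does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy per-token outputs for two inputs (3 tokens x 4 dims); the real model emits 1024 dims.
tokens_a = [[3.0, 0.0, 4.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 2.0, 3.0, 4.0]]
tokens_b = [[0.0, 1.0, 1.0, 1.0], [5.0, 5.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# After normalization, dot product and cosine similarity agree (up to float rounding).
print(round(dot(emb_a, emb_b), 6), round(cosine(emb_a, emb_b), 6))
```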

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jeevanions/finetuned_arctic-embedd-l")
# Run inference
sentences = [
    'How should risks or trustworthiness characteristics that cannot be measured be documented?',
    'MEASURE 1.1: Approaches and metrics for measurement of AI risks enumerated during the MAP function are selected for \nimplementation starting with the most signiﬁcant AI risks. The risks or trustworthiness characteristics that will not – or cannot – be \nmeasured are properly documented. \nAction ID \nSuggested Action \nGAI Risks \nMS-1.1-001 Employ methods to trace the origin and modiﬁcations of digital content. \nInformation Integrity \nMS-1.1-002 \nIntegrate tools designed to analyze content provenance and detect data \nanomalies, verify the authenticity of digital signatures, and identify patterns \nassociated with misinformation or manipulation. \nInformation Integrity \nMS-1.1-003 \nDisaggregate evaluation metrics by demographic factors to identify any \ndiscrepancies in how content provenance mechanisms work across diverse \npopulations. \nInformation Integrity; Harmful \nBias and Homogenization \nMS-1.1-004 Develop a suite of metrics to evaluate structured public feedback exercises',
    'existing human performance considered as a performance baseline for the algorithm to meet pre-deployment, \nand as a lifecycle minimum performance standard. Decision possibilities resulting from performance testing \nshould include the possibility of not deploying the system. \nRisk identification and mitigation. Before deployment, and in a proactive and ongoing manner, poten\xad\ntial risks of the automated system should be identified and mitigated. Identified risks should focus on the \npotential for meaningful impact on people’s rights, opportunities, or access and include those to impacted \ncommunities that may not be direct users of the automated system, risks resulting from purposeful misuse of \nthe system, and other concerns identified via the consultation process. Assessment and, where possible, mea\xad\nsurement of the impact of risks should be included and balanced such that high impact risks receive attention',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
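`model.similarity` returns a matrix of pairwise scores; semantic search amounts to ranking corpus embeddings by their score against a query embedding. The sketch below shows only that ranking step, on toy unit vectors standing in for `model.encode` outputs. `top_k_search` is a hypothetical helper; for real use the library also provides `sentence_transformers.util.semantic_search`.

```python
import heapq

def top_k_search(query_emb, corpus_embs, k=3):
    """Return (index, score) pairs for the k corpus vectors with the highest dot product.

    Assumes every vector is already L2-normalized (this model ends with Normalize()),
    so the dot product equals cosine similarity.
    """
    scores = [(i, sum(q * d for q, d in zip(query_emb, doc))) for i, doc in enumerate(corpus_embs)]
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

# Toy unit vectors standing in for model.encode(...) outputs.
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.6, 0.8, 0.0],   # partially similar
    [1.0, 0.0, 0.0],   # same direction as the query
    [-1.0, 0.0, 0.0],  # opposite direction
]

for idx, score in top_k_search(query, corpus, k=2):
    print(idx, round(score, 3))
```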

Evaluation

Metrics

Information Retrieval

Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.2807
cosine_accuracy@3	0.4649
cosine_accuracy@5	0.5351
cosine_accuracy@10	0.7193
cosine_precision@1	0.2807
cosine_precision@3	0.155
cosine_precision@5	0.107
cosine_precision@10	0.0719
cosine_recall@1	0.2807
cosine_recall@3	0.4649
cosine_recall@5	0.5351
cosine_recall@10	0.7193
cosine_ndcg@10	0.4797
cosine_mrr@10	0.4064
cosine_map@100	0.4236
dot_accuracy@1	0.2807
dot_accuracy@3	0.4649
dot_accuracy@5	0.5351
dot_accuracy@10	0.7193
dot_precision@1	0.2807
dot_precision@3	0.155
dot_precision@5	0.107
dot_precision@10	0.0719
dot_recall@1	0.2807
dot_recall@3	0.4649
dot_recall@5	0.5351
dot_recall@10	0.7193
dot_ndcg@10	0.4797
dot_mrr@10	0.4064
dot_map@100	0.4236
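The metrics above follow the usual binary-relevance definitions. A minimal per-query sketch is shown below; the reported numbers are averages over all evaluation queries, MAP@100 is omitted for brevity, and `InformationRetrievalEvaluator` may differ in details such as tie handling. For a query with exactly one relevant passage, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k, which is the pattern visible in the table. `ir_metrics` and the document ids are hypothetical.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance retrieval metrics for a single query.

    ranked_ids: document ids ordered by descending similarity.
    relevant_ids: set of ids judged relevant for the query.
    """
    top_k = ranked_ids[:k]
    hits = [1 if doc in relevant_ids else 0 for doc in top_k]
    num_hits = sum(hits)

    accuracy = 1.0 if num_hits > 0 else 0.0
    precision = num_hits / k
    recall = num_hits / len(relevant_ids)

    # Reciprocal rank of the first relevant document within the top k (0 if none).
    mrr = next((1.0 / (rank + 1) for rank, h in enumerate(hits) if h), 0.0)

    # DCG with binary gains, normalized by the DCG of an ideal ranking.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr, "ndcg": ndcg}

# One query whose single relevant passage ("d7") is ranked third.
ranking = ["d2", "d9", "d7", "d1", "d4"]
print(ir_metrics(ranking, {"d7"}, k=1))
print(ir_metrics(ranking, {"d7"}, k=3))
```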

Training Details

Training Dataset

Unnamed Dataset

Size: 3,430 training samples
Columns: sentence_0 and sentence_1
Approximate statistics based on the first 1000 samples:
	sentence_0	sentence_1
type	string	string
details	min: 8 tokens mean: 17.71 tokens max: 36 tokens	min: 7 tokens mean: 172.72 tokens max: 356 tokens

Samples:

sentence_0	sentence_1
`What are the key steps to obtain input from stakeholder communities to identify unacceptable use in AI systems?`	15 GV-1.3-004 Obtain input from stakeholder communities to identify unacceptable use, in accordance with activities in the AI RMF Map function. CBRN Information or Capabilities; Obscene, Degrading, and/or Abusive Content; Harmful Bias and Homogenization; Dangerous, Violent, or Hateful Content GV-1.3-005 Maintain an updated hierarchy of identiﬁed and expected GAI risks connected to contexts of GAI model advancement and use, potentially including specialized risk levels for GAI systems that address issues such as model collapse and algorithmic monoculture. Harmful Bias and Homogenization GV-1.3-006 Reevaluate organizational risk tolerances to account for unacceptable negative risk (such as where signiﬁcant negative impacts are imminent, severe harms are actually occurring, or large-scale risks could occur); and broad GAI negative risks, including: Immature safety or risk cultures related to AI and GAI design, development and deployment, public information integrity risks, including impacts
`How can organizations maintain an updated hierarchy of identified and expected GAI risks?`	15 GV-1.3-004 Obtain input from stakeholder communities to identify unacceptable use, in accordance with activities in the AI RMF Map function. CBRN Information or Capabilities; Obscene, Degrading, and/or Abusive Content; Harmful Bias and Homogenization; Dangerous, Violent, or Hateful Content GV-1.3-005 Maintain an updated hierarchy of identiﬁed and expected GAI risks connected to contexts of GAI model advancement and use, potentially including specialized risk levels for GAI systems that address issues such as model collapse and algorithmic monoculture. Harmful Bias and Homogenization GV-1.3-006 Reevaluate organizational risk tolerances to account for unacceptable negative risk (such as where signiﬁcant negative impacts are imminent, severe harms are actually occurring, or large-scale risks could occur); and broad GAI negative risks, including: Immature safety or risk cultures related to AI and GAI design, development and deployment, public information integrity risks, including impacts
`What are some examples of unacceptable uses of AI as identified by stakeholder communities?`	15 GV-1.3-004 Obtain input from stakeholder communities to identify unacceptable use, in accordance with activities in the AI RMF Map function. CBRN Information or Capabilities; Obscene, Degrading, and/or Abusive Content; Harmful Bias and Homogenization; Dangerous, Violent, or Hateful Content GV-1.3-005 Maintain an updated hierarchy of identiﬁed and expected GAI risks connected to contexts of GAI model advancement and use, potentially including specialized risk levels for GAI systems that address issues such as model collapse and algorithmic monoculture. Harmful Bias and Homogenization GV-1.3-006 Reevaluate organizational risk tolerances to account for unacceptable negative risk (such as where signiﬁcant negative impacts are imminent, severe harms are actually occurring, or large-scale risks could occur); and broad GAI negative risks, including: Immature safety or risk cultures related to AI and GAI design, development and deployment, public information integrity risks, including impacts

Loss: MatryoshkaLoss with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
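With these parameters, MatryoshkaLoss applies MultipleNegativesRankingLoss to the first 768, 512, 256, 128, and 64 dimensions of each embedding and sums the per-dimension losses with the listed weights (all 1 here), so shorter prefixes of the output vector are trained to be usable on their own. The sketch below assumes you want to compare truncated embeddings: keep a prefix, then re-normalize it so the dot product is again a cosine similarity. `truncate_embedding` is a hypothetical helper and the 8-dimensional vectors are toy values; recent sentence-transformers releases can also truncate at load time via a `truncate_dim` argument.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding and rescale to unit length."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def dot(a, b):
    """Plain dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim vectors standing in for 1024-dim model outputs.
query = [0.5, 0.5, 0.1, -0.2, 0.3, 0.0, 0.1, 0.2]
doc = [0.4, 0.6, 0.0, -0.1, -0.3, 0.2, 0.0, 0.1]

# Compare at several prefix lengths (the real model was trained at 768/512/256/128/64).
for dim in (8, 4, 2):
    q, d = truncate_embedding(query, dim), truncate_embedding(doc, dim)
    print(dim, round(dot(q, d), 4))
```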

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
num_train_epochs: 5
multi_dataset_batch_sampler: round_robin
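MultipleNegativesRankingLoss treats the other positives in the same batch as negatives: for each anchor it computes a softmax cross-entropy over scaled cosine similarities to every positive in the batch (the library's default scale is 20). The pure-Python sketch below illustrates that idea and is not the library code; `mnrl_loss` is a hypothetical name. It also shows a consequence of `per_device_train_batch_size: 1`: with a single pair per batch, the softmax runs over one logit, so the loss is exactly 0 regardless of the embeddings, which is consistent with the 0.0 Training Loss values in the logs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss with in-batch negatives.

    positives[i] is the target for anchors[i]; every other positives[j] in the
    batch is treated as a negative. Returns the mean cross-entropy over anchors.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        # Cross-entropy for the correct (diagonal) candidate.
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.2, 0.8]]

print(mnrl_loss(anchors[:1], positives[:1]))  # batch of one pair: a single candidate
print(mnrl_loss(anchors, positives))          # batch of two pairs: one in-batch negative each
```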

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
eval_use_gather_object: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

Training Logs


Epoch	Step	Training Loss	cosine_map@100
0.0146	50	-	0.4134
0.0292	100	-	0.4134
0.0437	150	-	0.4134
0.0583	200	-	0.4134
0.0729	250	-	0.4134
0.0875	300	-	0.4134
0.1020	350	-	0.4134
0.1166	400	-	0.4134
0.1312	450	-	0.4134
0.1458	500	0.0	0.4134
0.1603	550	-	0.4134
0.1749	600	-	0.4134
0.1895	650	-	0.4134
0.2041	700	-	0.4134
0.2187	750	-	0.4134
0.2332	800	-	0.4134
0.2478	850	-	0.4134
0.2624	900	-	0.4134
0.2770	950	-	0.4134
0.2915	1000	0.0	0.4134
0.3061	1050	-	0.4134
0.3207	1100	-	0.4134
0.3353	1150	-	0.4134
0.3499	1200	-	0.4134
0.3644	1250	-	0.4134
0.3790	1300	-	0.4134
0.3936	1350	-	0.4134
0.4082	1400	-	0.4134
0.4227	1450	-	0.4134
0.4373	1500	0.0	0.4134
0.4519	1550	-	0.4134
0.4665	1600	-	0.4134
0.4810	1650	-	0.4134
0.4956	1700	-	0.4134
0.5102	1750	-	0.4134
0.5248	1800	-	0.4134
0.5394	1850	-	0.4134
0.5539	1900	-	0.4134
0.5685	1950	-	0.4134
0.5831	2000	0.0	0.4135
0.5977	2050	-	0.4135
0.6122	2100	-	0.4135
0.6268	2150	-	0.4135
0.6414	2200	-	0.4135
0.6560	2250	-	0.4135
0.6706	2300	-	0.4135
0.6851	2350	-	0.4135
0.6997	2400	-	0.4135
0.7143	2450	-	0.4134
0.7289	2500	0.0	0.4134
0.7434	2550	-	0.4134
0.7580	2600	-	0.4134
0.7726	2650	-	0.4134
0.7872	2700	-	0.4134
0.8017	2750	-	0.4134
0.8163	2800	-	0.4134
0.8309	2850	-	0.4135
0.8455	2900	-	0.4135
0.8601	2950	-	0.4135
0.8746	3000	0.0	0.4135
0.8892	3050	-	0.4135
0.9038	3100	-	0.4135
0.9184	3150	-	0.4135
0.9329	3200	-	0.4135
0.9475	3250	-	0.4135
0.9621	3300	-	0.4135
0.9767	3350	-	0.4135
0.9913	3400	-	0.4135
1.0	3430	-	0.4135
1.0058	3450	-	0.4135
1.0204	3500	0.0	0.4135
1.0350	3550	-	0.4135
1.0496	3600	-	0.4135
1.0641	3650	-	0.4135
1.0787	3700	-	0.4135
1.0933	3750	-	0.4135
1.1079	3800	-	0.4135
1.1224	3850	-	0.4135
1.1370	3900	-	0.4179
1.1516	3950	-	0.4179
1.1662	4000	0.0	0.4179
1.1808	4050	-	0.4179
1.1953	4100	-	0.4179
1.2099	4150	-	0.4179
1.2245	4200	-	0.4179
1.2391	4250	-	0.4179
1.2536	4300	-	0.4179
1.2682	4350	-	0.4179
1.2828	4400	-	0.4179
1.2974	4450	-	0.4179
1.3120	4500	0.0	0.4179
1.3265	4550	-	0.4179
1.3411	4600	-	0.4179
1.3557	4650	-	0.4179
1.3703	4700	-	0.4179
1.3848	4750	-	0.4179
1.3994	4800	-	0.4179
1.4140	4850	-	0.4179
1.4286	4900	-	0.4179
1.4431	4950	-	0.4179
1.4577	5000	0.0	0.4179
1.4723	5050	-	0.4179
1.4869	5100	-	0.4179
1.5015	5150	-	0.4179
1.5160	5200	-	0.4179
1.5306	5250	-	0.4179
1.5452	5300	-	0.4179
1.5598	5350	-	0.4179
1.5743	5400	-	0.4179
1.5889	5450	-	0.4179
1.6035	5500	0.0	0.4179
1.6181	5550	-	0.4179
1.6327	5600	-	0.4179
1.6472	5650	-	0.4179
1.6618	5700	-	0.4179
1.6764	5750	-	0.4179
1.6910	5800	-	0.4179
1.7055	5850	-	0.4179
1.7201	5900	-	0.4179
1.7347	5950	-	0.4179
1.7493	6000	0.0	0.4179
1.7638	6050	-	0.4179
1.7784	6100	-	0.4179
1.7930	6150	-	0.4179
1.8076	6200	-	0.4179
1.8222	6250	-	0.4179
1.8367	6300	-	0.4179
1.8513	6350	-	0.4179
1.8659	6400	-	0.4179
1.8805	6450	-	0.4179
1.8950	6500	0.0	0.4179
1.9096	6550	-	0.4179
1.9242	6600	-	0.4179
1.9388	6650	-	0.4179
1.9534	6700	-	0.4179
1.9679	6750	-	0.4179
1.9825	6800	-	0.4179
1.9971	6850	-	0.4179
2.0	6860	-	0.4179
2.0117	6900	-	0.4179
2.0262	6950	-	0.4179
2.0408	7000	0.0	0.4179
2.0554	7050	-	0.4179
2.0700	7100	-	0.4179
2.0845	7150	-	0.4179
2.0991	7200	-	0.4179
2.1137	7250	-	0.4179
2.1283	7300	-	0.4179
2.1429	7350	-	0.4179
2.1574	7400	-	0.4179
2.1720	7450	-	0.4179
2.1866	7500	0.0	0.4179
2.2012	7550	-	0.4179
2.2157	7600	-	0.4179
2.2303	7650	-	0.4179
2.2449	7700	-	0.4179
2.2595	7750	-	0.4179
2.2741	7800	-	0.4179
2.2886	7850	-	0.4179
2.3032	7900	-	0.4179
2.3178	7950	-	0.4179
2.3324	8000	0.0	0.4179
2.3469	8050	-	0.4179
2.3615	8100	-	0.4179
2.3761	8150	-	0.4179
2.3907	8200	-	0.4179
2.4052	8250	-	0.4179
2.4198	8300	-	0.4179
2.4344	8350	-	0.4179
2.4490	8400	-	0.4179
2.4636	8450	-	0.4179
2.4781	8500	0.0	0.4179
2.4927	8550	-	0.4179
2.5073	8600	-	0.4179
2.5219	8650	-	0.4179
2.5364	8700	-	0.4179
2.5510	8750	-	0.4179
2.5656	8800	-	0.4179
2.5802	8850	-	0.4179
2.5948	8900	-	0.4179
2.6093	8950	-	0.4179
2.6239	9000	0.0	0.4179
2.6385	9050	-	0.4179
2.6531	9100	-	0.4179
2.6676	9150	-	0.4179
2.6822	9200	-	0.4179
2.6968	9250	-	0.4223
2.7114	9300	-	0.4223
2.7259	9350	-	0.4223
2.7405	9400	-	0.4223
2.7551	9450	-	0.4223
2.7697	9500	0.0	0.4223
2.7843	9550	-	0.4223
2.7988	9600	-	0.4223
2.8134	9650	-	0.4223
2.8280	9700	-	0.4223
2.8426	9750	-	0.4223
2.8571	9800	-	0.4223
2.8717	9850	-	0.4223
2.8863	9900	-	0.4223
2.9009	9950	-	0.4223
2.9155	10000	0.0	0.4223
2.9300	10050	-	0.4223
2.9446	10100	-	0.4223
2.9592	10150	-	0.4223
2.9738	10200	-	0.4223
2.9883	10250	-	0.4223
3.0	10290	-	0.4223
3.0029	10300	-	0.4223
3.0175	10350	-	0.4223
3.0321	10400	-	0.4223
3.0466	10450	-	0.4223
3.0612	10500	0.0	0.4223
3.0758	10550	-	0.4223
3.0904	10600	-	0.4223
3.1050	10650	-	0.4223
3.1195	10700	-	0.4223
3.1341	10750	-	0.4223
3.1487	10800	-	0.4223
3.1633	10850	-	0.4223
3.1778	10900	-	0.4223
3.1924	10950	-	0.4223
3.2070	11000	0.0	0.4223
3.2216	11050	-	0.4223
3.2362	11100	-	0.4223
3.2507	11150	-	0.4223
3.2653	11200	-	0.4223
3.2799	11250	-	0.4223
3.2945	11300	-	0.4223
3.3090	11350	-	0.4223
3.3236	11400	-	0.4223
3.3382	11450	-	0.4223
3.3528	11500	0.0	0.4223
3.3673	11550	-	0.4223
3.3819	11600	-	0.4223
3.3965	11650	-	0.4223
3.4111	11700	-	0.4223
3.4257	11750	-	0.4223
3.4402	11800	-	0.4223
3.4548	11850	-	0.4223
3.4694	11900	-	0.4223
3.4840	11950	-	0.4223
3.4985	12000	0.0	0.4223
3.5131	12050	-	0.4223
3.5277	12100	-	0.4223
3.5423	12150	-	0.4223
3.5569	12200	-	0.4223
3.5714	12250	-	0.4223
3.5860	12300	-	0.4223
3.6006	12350	-	0.4223
3.6152	12400	-	0.4223
3.6297	12450	-	0.4223
3.6443	12500	0.0	0.4223
3.6589	12550	-	0.4223
3.6735	12600	-	0.4223
3.6880	12650	-	0.4223
3.7026	12700	-	0.4223
3.7172	12750	-	0.4223
3.7318	12800	-	0.4223
3.7464	12850	-	0.4223
3.7609	12900	-	0.4223
3.7755	12950	-	0.4223
3.7901	13000	0.0	0.4223
3.8047	13050	-	0.4223
3.8192	13100	-	0.4226
3.8338	13150	-	0.4226
3.8484	13200	-	0.4226
3.8630	13250	-	0.4226
3.8776	13300	-	0.4226
3.8921	13350	-	0.4226
3.9067	13400	-	0.4226
3.9213	13450	-	0.4226
3.9359	13500	0.0	0.4226
3.9504	13550	-	0.4226
3.9650	13600	-	0.4226
3.9796	13650	-	0.4226
3.9942	13700	-	0.4226
4.0	13720	-	0.4226
4.0087	13750	-	0.4226
4.0233	13800	-	0.4226
4.0379	13850	-	0.4226
4.0525	13900	-	0.4226
4.0671	13950	-	0.4226
4.0816	14000	0.0	0.4226
4.0962	14050	-	0.4226
4.1108	14100	-	0.4226
4.1254	14150	-	0.4226
4.1399	14200	-	0.4226
4.1545	14250	-	0.4226
4.1691	14300	-	0.4226
4.1837	14350	-	0.4226
4.1983	14400	-	0.4226
4.2128	14450	-	0.4226
4.2274	14500	0.0	0.4226
4.2420	14550	-	0.4226
4.2566	14600	-	0.4226
4.2711	14650	-	0.4226
4.2857	14700	-	0.4226
4.3003	14750	-	0.4226
4.3149	14800	-	0.4226
4.3294	14850	-	0.4226
4.3440	14900	-	0.4226
4.3586	14950	-	0.4226
4.3732	15000	0.0	0.4226
4.3878	15050	-	0.4226
4.4023	15100	-	0.4226
4.4169	15150	-	0.4226
4.4315	15200	-	0.4226
4.4461	15250	-	0.4226
4.4606	15300	-	0.4226
4.4752	15350	-	0.4226
4.4898	15400	-	0.4226
4.5044	15450	-	0.4226
4.5190	15500	0.0	0.4226
4.5335	15550	-	0.4226
4.5481	15600	-	0.4226
4.5627	15650	-	0.4226
4.5773	15700	-	0.4226
4.5918	15750	-	0.4226
4.6064	15800	-	0.4226
4.6210	15850	-	0.4226
4.6356	15900	-	0.4226
4.6501	15950	-	0.4226
4.6647	16000	0.0	0.4226
4.6793	16050	-	0.4226
4.6939	16100	-	0.4226
4.7085	16150	-	0.4226
4.7230	16200	-	0.4226
4.7376	16250	-	0.4226
4.7522	16300	-	0.4226
4.7668	16350	-	0.4226
4.7813	16400	-	0.4226
4.7959	16450	-	0.4226
4.8105	16500	0.0	0.4226
4.8251	16550	-	0.4226
4.8397	16600	-	0.4226
4.8542	16650	-	0.4226
4.8688	16700	-	0.4226
4.8834	16750	-	0.4226
4.8980	16800	-	0.4226
4.9125	16850	-	0.4226
4.9271	16900	-	0.4226
4.9417	16950	-	0.4226
4.9563	17000	0.0	0.4226
4.9708	17050	-	0.4226
4.9854	17100	-	0.4226
5.0	17150	-	0.4226
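
The log continues below; the epoch and step counters restart at this point.

Epoch	Step	Training Loss	cosine_map@100
0.0146	50	-	0.4226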
0.0292	100	-	0.4226
0.0437	150	-	0.4226
0.0583	200	-	0.4226
0.0729	250	-	0.4226
0.0875	300	-	0.4226
0.1020	350	-	0.4226
0.1166	400	-	0.4226
0.1312	450	-	0.4226
0.1458	500	0.0	0.4226
0.1603	550	-	0.4226
0.1749	600	-	0.4226
0.1895	650	-	0.4226
0.2041	700	-	0.4226
0.2187	750	-	0.4226
0.2332	800	-	0.4226
0.2478	850	-	0.4226
0.2624	900	-	0.4226
0.2770	950	-	0.4226
0.2915	1000	0.0	0.4227
0.3061	1050	-	0.4227
0.3207	1100	-	0.4227
0.3353	1150	-	0.4227
0.3499	1200	-	0.4227
0.3644	1250	-	0.4227
0.3790	1300	-	0.4227
0.3936	1350	-	0.4227
0.4082	1400	-	0.4227
0.4227	1450	-	0.4227
0.4373	1500	0.0	0.4227
0.4519	1550	-	0.4227
0.4665	1600	-	0.4227
0.4810	1650	-	0.4227
0.4956	1700	-	0.4227
0.5102	1750	-	0.4227
0.5248	1800	-	0.4227
0.5394	1850	-	0.4227
0.5539	1900	-	0.4227
0.5685	1950	-	0.4227
0.5831	2000	0.0	0.4227
0.5977	2050	-	0.4227
0.6122	2100	-	0.4227
0.6268	2150	-	0.4227
0.6414	2200	-	0.4227
0.6560	2250	-	0.4227
0.6706	2300	-	0.4227
0.6851	2350	-	0.4227
0.6997	2400	-	0.4227
0.7143	2450	-	0.4227
0.7289	2500	0.0	0.4227
0.7434	2550	-	0.4227
0.7580	2600	-	0.4227
0.7726	2650	-	0.4227
0.7872	2700	-	0.4227
0.8017	2750	-	0.4227
0.8163	2800	-	0.4227
0.8309	2850	-	0.4227
0.8455	2900	-	0.4227
0.8601	2950	-	0.4227
0.8746	3000	0.0	0.4227
0.8892	3050	-	0.4227
0.9038	3100	-	0.4227
0.9184	3150	-	0.4227
0.9329	3200	-	0.4227
0.9475	3250	-	0.4227
0.9621	3300	-	0.4227
0.9767	3350	-	0.4227
0.9913	3400	-	0.4227
1.0	3430	-	0.4227
1.0058	3450	-	0.4227
1.0204	3500	0.0	0.4227
1.0350	3550	-	0.4227
1.0496	3600	-	0.4227
1.0641	3650	-	0.4227
1.0787	3700	-	0.4227
1.0933	3750	-	0.4227
1.1079	3800	-	0.4227
1.1224	3850	-	0.4227
1.1370	3900	-	0.4227
1.1516	3950	-	0.4227
1.1662	4000	0.0	0.4227
1.1808	4050	-	0.4227
1.1953	4100	-	0.4227
1.2099	4150	-	0.4231
1.2245	4200	-	0.4231
1.2391	4250	-	0.4231
1.2536	4300	-	0.4231
1.2682	4350	-	0.4231
1.2828	4400	-	0.4231
1.2974	4450	-	0.4231
1.3120	4500	0.0	0.4231
1.3265	4550	-	0.4231
1.3411	4600	-	0.4231
1.3557	4650	-	0.4232
1.3703	4700	-	0.4232
1.3848	4750	-	0.4232
1.3994	4800	-	0.4232
1.4140	4850	-	0.4232
1.4286	4900	-	0.4232
1.4431	4950	-	0.4232
1.4577	5000	0.0	0.4232
1.4723	5050	-	0.4232
1.4869	5100	-	0.4232
1.5015	5150	-	0.4232
1.5160	5200	-	0.4232
1.5306	5250	-	0.4232
1.5452	5300	-	0.4233
1.5598	5350	-	0.4233
1.5743	5400	-	0.4233
1.5889	5450	-	0.4233
1.6035	5500	0.0	0.4233
1.6181	5550	-	0.4233
1.6327	5600	-	0.4233
1.6472	5650	-	0.4233
1.6618	5700	-	0.4233
1.6764	5750	-	0.4233
1.6910	5800	-	0.4233
1.7055	5850	-	0.4233
1.7201	5900	-	0.4233
1.7347	5950	-	0.4233
1.7493	6000	0.0	0.4233
1.7638	6050	-	0.4234
1.7784	6100	-	0.4234
1.7930	6150	-	0.4234
1.8076	6200	-	0.4234
1.8222	6250	-	0.4234
1.8367	6300	-	0.4234
1.8513	6350	-	0.4234
1.8659	6400	-	0.4234
1.8805	6450	-	0.4234
1.8950	6500	0.0	0.4234
1.9096	6550	-	0.4234
1.9242	6600	-	0.4234
1.9388	6650	-	0.4234
1.9534	6700	-	0.4234
1.9679	6750	-	0.4234
1.9825	6800	-	0.4234
1.9971	6850	-	0.4234
2.0	6860	-	0.4234
2.0117	6900	-	0.4234
2.0262	6950	-	0.4234
2.0408	7000	0.0	0.4234
2.0554	7050	-	0.4234
2.0700	7100	-	0.4234
2.0845	7150	-	0.4234
2.0991	7200	-	0.4234
2.1137	7250	-	0.4234
2.1283	7300	-	0.4234
2.1429	7350	-	0.4234
2.1574	7400	-	0.4234
2.1720	7450	-	0.4234
2.1866	7500	0.0	0.4234
2.2012	7550	-	0.4234
2.2157	7600	-	0.4234
2.2303	7650	-	0.4234
2.2449	7700	-	0.4234
2.2595	7750	-	0.4234
2.2741	7800	-	0.4234
2.2886	7850	-	0.4234
2.3032	7900	-	0.4234
2.3178	7950	-	0.4234
2.3324	8000	0.0	0.4234
2.3469	8050	-	0.4234
2.3615	8100	-	0.4234
2.3761	8150	-	0.4234
2.3907	8200	-	0.4234
2.4052	8250	-	0.4234
2.4198	8300	-	0.4234
2.4344	8350	-	0.4234
2.4490	8400	-	0.4234
2.4636	8450	-	0.4234
2.4781	8500	0.0	0.4234
2.4927	8550	-	0.4234
2.5073	8600	-	0.4234
2.5219	8650	-	0.4234
2.5364	8700	-	0.4234
2.5510	8750	-	0.4234
2.5656	8800	-	0.4234
2.5802	8850	-	0.4234
2.5948	8900	-	0.4234
2.6093	8950	-	0.4234
2.6239	9000	0.0	0.4234
2.6385	9050	-	0.4234
2.6531	9100	-	0.4234
2.6676	9150	-	0.4234
2.6822	9200	-	0.4234
2.6968	9250	-	0.4234
2.7114	9300	-	0.4234
2.7259	9350	-	0.4234
2.7405	9400	-	0.4234
2.7551	9450	-	0.4234
2.7697	9500	0.0	0.4234
2.7843	9550	-	0.4234
2.7988	9600	-	0.4234
2.8134	9650	-	0.4234
2.8280	9700	-	0.4234
2.8426	9750	-	0.4234
2.8571	9800	-	0.4234
2.8717	9850	-	0.4234
2.8863	9900	-	0.4234
2.9009	9950	-	0.4234
2.9155	10000	0.0	0.4234
2.9300	10050	-	0.4234
2.9446	10100	-	0.4234
2.9592	10150	-	0.4234
2.9738	10200	-	0.4234
2.9883	10250	-	0.4234
3.0	10290	-	0.4234
3.0029	10300	-	0.4234
3.0175	10350	-	0.4234
3.0321	10400	-	0.4234
3.0466	10450	-	0.4234
3.0612	10500	0.0	0.4234
3.0758	10550	-	0.4234
3.0904	10600	-	0.4234
3.1050	10650	-	0.4234
3.1195	10700	-	0.4234
3.1341	10750	-	0.4234
3.1487	10800	-	0.4234
3.1633	10850	-	0.4234
3.1778	10900	-	0.4234
3.1924	10950	-	0.4234
3.2070	11000	0.0	0.4234
3.2216	11050	-	0.4234
3.2362	11100	-	0.4234
3.2507	11150	-	0.4234
3.2653	11200	-	0.4234
3.2799	11250	-	0.4234
3.2945	11300	-	0.4234
3.3090	11350	-	0.4234
3.3236	11400	-	0.4234
3.3382	11450	-	0.4234
3.3528	11500	0.0	0.4234
3.3673	11550	-	0.4234
3.3819	11600	-	0.4234
3.3965	11650	-	0.4234
3.4111	11700	-	0.4234
3.4257	11750	-	0.4234
3.4402	11800	-	0.4234
3.4548	11850	-	0.4235
3.4694	11900	-	0.4235
3.4840	11950	-	0.4235
3.4985	12000	0.0	0.4235
3.5131	12050	-	0.4235
3.5277	12100	-	0.4235
3.5423	12150	-	0.4235
3.5569	12200	-	0.4235
3.5714	12250	-	0.4235
3.5860	12300	-	0.4235
3.6006	12350	-	0.4235
3.6152	12400	-	0.4235
3.6297	12450	-	0.4235
3.6443	12500	0.0	0.4235
3.6589	12550	-	0.4235
3.6735	12600	-	0.4235
3.6880	12650	-	0.4235
3.7026	12700	-	0.4235
3.7172	12750	-	0.4235
3.7318	12800	-	0.4235
3.7464	12850	-	0.4235
3.7609	12900	-	0.4235
3.7755	12950	-	0.4235
3.7901	13000	0.0	0.4235
3.8047	13050	-	0.4235
3.8192	13100	-	0.4235
3.8338	13150	-	0.4235
3.8484	13200	-	0.4235
3.8630	13250	-	0.4235
3.8776	13300	-	0.4235
3.8921	13350	-	0.4235
3.9067	13400	-	0.4235
3.9213	13450	-	0.4235
3.9359	13500	0.0	0.4235
3.9504	13550	-	0.4235
3.9650	13600	-	0.4235
3.9796	13650	-	0.4235
3.9942	13700	-	0.4235
4.0	13720	-	0.4235
4.0087	13750	-	0.4235
4.0233	13800	-	0.4235
4.0379	13850	-	0.4235
4.0525	13900	-	0.4235
4.0671	13950	-	0.4235
4.0816	14000	0.0	0.4236
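
Most rows in the log repeat the previous `cosine_map@100` value, which makes the small rise from 0.4134 to 0.4236 easy to miss. The helper below is a hypothetical sketch. It assumes the log is pasted as whitespace-separated text with the header shown above. It parses each row, starts a new run number whenever the epoch counter resets, and keeps only the rows where the metric changed.

```python
from typing import List, NamedTuple, Optional


class LogRow(NamedTuple):
    run: int               # increments whenever the epoch counter resets
    epoch: float
    step: int
    loss: Optional[float]  # None where the log shows "-" (no loss logged)
    map100: float          # cosine_map@100 reported by the evaluator


def parse_log(text: str) -> List[LogRow]:
    rows: List[LogRow] = []
    run, last_epoch = 0, None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "Epoch":  # skip blank and header lines
            continue
        epoch, step, loss, map100 = fields
        epoch_f = float(epoch)
        if last_epoch is not None and epoch_f < last_epoch:
            run += 1  # counters restarted: treat the rest as a new run
        last_epoch = epoch_f
        rows.append(LogRow(run, epoch_f, int(step),
                           None if loss == "-" else float(loss), float(map100)))
    return rows


def metric_changes(rows: List[LogRow]) -> List[LogRow]:
    """Keep the first row and every row whose metric differs from the row before it."""
    return [row for i, row in enumerate(rows)
            if i == 0 or row.map100 != rows[i - 1].map100]
```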

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.1.1
Transformers: 4.44.2
PyTorch: 2.4.1+cu121
Accelerate: 0.34.2
Datasets: 2.14.4
Tokenizers: 0.19.1
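
A standard-library sketch like the following can compare a local environment against the versions listed above. It assumes the usual PyPI distribution names, such as `sentence-transformers` and `torch`, and ignores local build tags such as `+cu121`. The `lookup` parameter is a hypothetical hook that lets the check run without the packages installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions copied from the Framework Versions list (PyTorch without "+cu121").
EXPECTED: Dict[str, str] = {
    "sentence-transformers": "3.1.1",
    "transformers": "4.44.2",
    "torch": "2.4.1",
    "accelerate": "0.34.2",
    "datasets": "2.14.4",
    "tokenizers": "0.19.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str] = EXPECTED,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for packages that differ or are missing."""
    out: Dict[str, Tuple[str, Optional[str]]] = {}
    for dist, want in expected.items():
        found = lookup(dist)
        # Drop a local build tag such as "+cu121" before comparing.
        if found is None or found.split("+")[0] != want:
            out[dist] = (want, found)
    return out


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(card lists 3.10.12)")
    for dist, (want, found) in version_mismatches().items():
        print(f"{dist}: expected {want}, found {found or 'not installed'}")
```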

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
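
In Matryoshka Representation Learning, as wrapped by `MatryoshkaLoss`, the same base loss is applied to several prefixes of each embedding and the results are summed, which keeps truncated embeddings useful. The sketch below is a hypothetical pure-Python illustration. `matryoshka_loss` truncates every vector to each size in `dims`, evaluates a caller-supplied base loss, and returns the weighted sum, with weights defaulting to 1. The toy `squared_distance_loss` is only a stand-in. In this model the base loss is `MultipleNegativesRankingLoss`, and the dimensions actually used are not listed in this section.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
BaseLoss = Callable[[List[Vector], List[Vector]], float]


def truncate(vectors: List[Vector], dim: int) -> List[Vector]:
    """Keep only the first `dim` components of every embedding."""
    return [v[:dim] for v in vectors]


def matryoshka_loss(
    base_loss: BaseLoss,
    anchors: List[Vector],
    positives: List[Vector],
    dims: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of base_loss evaluated on prefix-truncated embeddings."""
    if weights is None:
        weights = [1.0] * len(dims)
    if len(weights) != len(dims):
        raise ValueError("dims and weights must have the same length")
    return sum(w * base_loss(truncate(anchors, d), truncate(positives, d))
               for d, w in zip(dims, weights))


def squared_distance_loss(anchors: List[Vector], positives: List[Vector]) -> float:
    """Toy base loss (hypothetical): mean squared distance between paired vectors."""
    total = sum(sum((a - p) ** 2 for a, p in zip(av, pv))
                for av, pv in zip(anchors, positives))
    return total / len(anchors)
```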

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
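
The in-batch negatives objective from the paper above is the basis of Sentence Transformers' `MultipleNegativesRankingLoss`. Below is a minimal pure-Python sketch of it. It assumes cosine similarity scaled by 20, which matches the library's default, and the function names are hypothetical. For each anchor, the matching positive is the target and every other positive in the batch serves as a negative. The loss is the mean cross-entropy over the scaled scores.

With a single pair per batch there are no negatives, so the loss is exactly zero. The log above shows 3,430 steps per epoch, and the card is tagged `dataset_size:3430`. One plausible reading of the logged 0.0 training losses is therefore a batch of one pair per step, though the batch size is not shown in this section.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mnrl_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """In-batch negatives: anchor i should score positive i above every positive j != i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [scale * cosine(anchors[i], positives[j]) for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n
```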