metadata
base_model: Snowflake/snowflake-arctic-embed-l
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:3430
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      What are some illustrative cases that show the implementation of the AI
      Bill of Rights?
    sentences:
      - >-
        SECTION TITLE

        APPENDIX

        Listening to the American People 

        The White House Office of Science and Technology Policy (OSTP) led a
        yearlong process to seek and distill 

        input from people across the country  from impacted communities to
        industry stakeholders to 

        technology developers to other experts across fields and sectors, as
        well as policymakers across the Federal 

        government  on the issue of algorithmic and data-driven harms and
        potential remedies. Through panel 

        discussions, public listening sessions, private meetings, a formal
        request for information, and input to a 

        publicly accessible and widely-publicized email address, people across
        the United States spoke up about 

        both the promises and potential harms of these technologies, and played
        a central role in shaping the 

        Blueprint for an AI Bill of Rights. 

        Panel Discussions to Inform the Blueprint for An AI Bill of Rights 

        OSTP co-hosted a series of six panel discussions in collaboration with
        the Center for American Progress,
      - >-
        existing human performance considered as a performance baseline for the
        algorithm to meet pre-deployment, 

        and as a lifecycle minimum performance standard. Decision possibilities
        resulting from performance testing 

        should include the possibility of not deploying the system. 

        Risk identification and mitigation. Before deployment, and in a
        proactive and ongoing manner, poten­

        tial risks of the automated system should be identified and mitigated.
        Identified risks should focus on the 

        potential for meaningful impact on people’s rights, opportunities, or
        access and include those to impacted 

        communities that may not be direct users of the automated system, risks
        resulting from purposeful misuse of 

        the system, and other concerns identified via the consultation process.
        Assessment and, where possible, mea­

        surement of the impact of risks should be included and balanced such
        that high impact risks receive attention
      - >-
        confidence that their rights, opportunities, and access as well as their
        expectations about technologies are respected. 

        3

        HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE: 

        This section provides real-life examples of how these guiding principles
        can become reality, through laws, policies, and practices. 

        It describes practical technical and sociotechnical approaches to
        protecting rights, opportunities, and access. 

        The examples provided are not critiques or endorsements, but rather are
        offered as illustrative cases to help 

        provide a concrete vision for actualizing the Blueprint for an AI Bill
        of Rights. Effectively implementing these 

        processes require the cooperation of and collaboration among industry,
        civil society, researchers, policymakers, 

        technologists, and the public. 

        14
  - source_sentence: What are the potential impacts of automated systems on data privacy?
    sentences:
      - >-
        https://arxiv.org/pdf/2305.17493v2 

        Smith, A. et al. (2023) Hallucination or Confabulation? Neuroanatomy as
        metaphor in Large Language 

        Models. PLOS Digital Health. 

        https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000388 

        Soice, E. et al. (2023) Can large language models democratize access to
        dual-use biotechnology? arXiv. 

        https://arxiv.org/abs/2306.03809 

        Solaiman, I. et al. (2023) The Gradient of Generative AI Release:
        Methods and Considerations. arXiv. 

        https://arxiv.org/abs/2302.04844 

        Staab, R. et al. (2023) Beyond Memorization: Violating Privacy via
        Inference With Large Language 

        Models. arXiv. https://arxiv.org/pdf/2310.07298 

        Stanford, S. et al. (2023) Whose Opinions Do Language Models Reflect?
        arXiv. 

        https://arxiv.org/pdf/2303.17548 

        Strubell, E. et al. (2019) Energy and Policy Considerations for Deep
        Learning in NLP. arXiv. 

        https://arxiv.org/pdf/1906.02243 

        The White House (2016) Circular No. A-130, Managing Information as a
        Strategic Resource.
      - >-
        and data that are considered sensitive are understood to change over
        time based on societal norms and context. 

        36
      - |-
        yet foreseeable, uses or impacts of automated systems. You should be 
        protected from inappropriate or irrelevant data use in the design, de­
        velopment, and deployment of automated systems, and from the 
        compounded harm of its reuse. Independent evaluation and report­
        ing that confirms that the system is safe and effective, including re­
        porting of steps taken to mitigate potential harms, should be per­
        formed and the results made public whenever possible. 
        15
  - source_sentence: What is the AI Bill of Rights?
    sentences:
      - |-
        BLUEPRINT FOR AN 
        AI BILL OF 
        RIGHTS 
        MAKING AUTOMATED 
        SYSTEMS WORK FOR 
        THE AMERICAN PEOPLE 
        OCTOBER 2022
      - >-
        APPENDIX

        

        Julia Simon-Mishel, Supervising Attorney, Philadelphia Legal Assistance

        

        Dr. Zachary Mahafza, Research & Data Analyst, Southern Poverty Law
        Center

        

        J. Khadijah Abdurahman, Tech Impact Network Research Fellow, AI Now
        Institute, UCLA C2I1, and

        UWA Law School

        Panelists separately described the increasing scope of technology use in
        providing for social welfare, including 

        in fraud detection, digital ID systems, and other methods focused on
        improving efficiency and reducing cost. 

        However, various panelists individually cautioned that these systems may
        reduce burden for government 

        agencies by increasing the burden and agency of people using and
        interacting with these technologies. 

        Additionally, these systems can produce feedback loops and compounded
        harm, collecting data from 

        communities and using it to reinforce inequality. Various panelists
        suggested that these harms could be 

        mitigated by ensuring community input at the beginning of the design
        process, providing ways to opt out of
      - >-
        safe, secure, and resilient; (e) understandable; (f ) responsible and
        traceable; (g) regularly monitored; (h) transpar-

        ent; and, (i) accountable. The Blueprint for an AI Bill of Rights is
        consistent with the Executive Order. 

        Affected agencies across the federal government have released AI use
        case inventories13 and are implementing 

        plans to bring those AI systems into compliance with the Executive Order
        or retire them. 

        The law and policy landscape for motor vehicles shows that strong safety
        regulations—and 

        measures to address harms when they occur—can enhance innovation in the
        context of com-

        plex technologies. Cars, like automated digital systems, comprise a
        complex collection of components. 

        The National Highway Traffic Safety Administration,14 through its
        rigorous standards and independent 

        evaluation, helps make sure vehicles on our roads are safe without
        limiting manufacturers’ ability to 

        innovate.15 At the same time, rules of the road are implemented locally
        to impose contextually appropriate
  - source_sentence: >-
      What are the best practices for benchmarking AI system security and
      resilience?
    sentences:
      - >-
        NOTICE & 

        EXPLANATION 

        WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS

        The expectations for automated systems are meant to serve as a blueprint
        for the development of additional 

        technical standards and practices that are tailored for particular
        sectors and contexts. 

        An automated system should provide demonstrably clear, timely,
        understandable, and accessible notice of use, and 

        explanations as to how and why a decision was made or an action was
        taken by the system. These expectations are 

        explained below. 

        Provide clear, timely, understandable, and accessible notice of use and
        explanations ­

        Generally accessible plain language documentation. The entity
        responsible for using the automated 

        system should ensure that documentation describing the overall system
        (including any human components) is 

        public and easy to find. The documentation should describe, in plain
        language, how the system works and how
      - >-
        content performance and impact, and work in collaboration with AI
        Actors 

        experienced in user research and experience. 

        Human-AI Configuration 

        MG-4.1-004 Implement active learning techniques to identify instances
        where the model fails 

        or produces unexpected outputs. 

        Confabulation 

        MG-4.1-005 

        Share transparency reports with internal and external stakeholders that
        detail 

        steps taken to update the GAI system to enhance transparency and 

        accountability. 

        Human-AI Configuration; Harmful 

        Bias and Homogenization 

        MG-4.1-006 

        Track dataset modifications for provenance by monitoring data deletions, 

        rectification requests, and other changes that may impact the
        verifiability of 

        content origins. 

        Information Integrity
      - >-
        33 

        MEASURE 2.7: AI system security and resilience – as identified in the MAP
        function – are evaluated and documented. 

        Action ID 

        Suggested Action 

        GAI Risks 

        MS-2.7-001 

        Apply established security measures to: Assess likelihood and magnitude
        of 

        vulnerabilities and threats such as backdoors, compromised dependencies,
        data 

        breaches, eavesdropping, man-in-the-middle attacks, reverse
        engineering, 

        autonomous agents, model theft or exposure of model weights, AI
        inference, 

        bypass, extraction, and other baseline security concerns. 

        Data Privacy; Information Integrity; 

        Information Security; Value Chain 

        and Component Integration 

        MS-2.7-002 

        Benchmark GAI system security and resilience related to content
        provenance 

        against industry standards and best practices. Compare GAI system
        security 

        features and content provenance methods against industry
        state-of-the-art. 

        Information Integrity; Information 

        Security 

        MS-2.7-003 

        Conduct user surveys to gather user satisfaction with the AI-generated
        content
  - source_sentence: >-
      How should risks or trustworthiness characteristics that cannot be
      measured be documented?
    sentences:
      - >-
        MEASURE 1.1: Approaches and metrics for measurement of AI risks
        enumerated during the MAP function are selected for 

        implementation starting with the most significant AI risks. The risks or
        trustworthiness characteristics that will not – or cannot – be 

        measured are properly documented. 

        Action ID 

        Suggested Action 

        GAI Risks 

        MS-1.1-001 Employ methods to trace the origin and modifications of
        digital content. 

        Information Integrity 

        MS-1.1-002 

        Integrate tools designed to analyze content provenance and detect data 

        anomalies, verify the authenticity of digital signatures, and identify
        patterns 

        associated with misinformation or manipulation. 

        Information Integrity 

        MS-1.1-003 

        Disaggregate evaluation metrics by demographic factors to identify any 

        discrepancies in how content provenance mechanisms work across diverse 

        populations. 

        Information Integrity; Harmful 

        Bias and Homogenization 

        MS-1.1-004 Develop a suite of metrics to evaluate structured public
        feedback exercises
      - >-
        AI technology can produce varied outputs in multiple modalities and
        present many classes of user 

        interfaces. This leads to a broader set of AI Actors interacting with
        GAI systems for widely differing 

        applications and contexts of use. These can include data labeling and
        preparation, development of GAI 

        models, content moderation, code generation and review, text generation
        and editing, image and video 

        generation, summarization, search, and chat. These activities can take
        place within organizational 

        settings or in the public domain. 

        Organizations can restrict AI applications that cause harm, exceed
        stated risk tolerances, or that conflict 

        with their tolerances or values. Governance tools and protocols that are
        applied to other types of AI 

        systems can be applied to GAI systems. These plans and actions include: 

         Accessibility and reasonable 

        accommodations 

         AI actor credentials and qualifications  

         Alignment to organizational values 

         Auditing and assessment 

         Change-management controls
      - >-
        existing human performance considered as a performance baseline for the
        algorithm to meet pre-deployment, 

        and as a lifecycle minimum performance standard. Decision possibilities
        resulting from performance testing 

        should include the possibility of not deploying the system. 

        Risk identification and mitigation. Before deployment, and in a
        proactive and ongoing manner, poten­

        tial risks of the automated system should be identified and mitigated.
        Identified risks should focus on the 

        potential for meaningful impact on people’s rights, opportunities, or
        access and include those to impacted 

        communities that may not be direct users of the automated system, risks
        resulting from purposeful misuse of 

        the system, and other concerns identified via the consultation process.
        Assessment and, where possible, mea­

        surement of the impact of risks should be included and balanced such
        that high impact risks receive attention
model-index:
  - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.2807017543859649
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4649122807017544
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5350877192982456
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7192982456140351
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.2807017543859649
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.15497076023391812
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.10701754385964912
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0719298245614035
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2807017543859649
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4649122807017544
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5350877192982456
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7192982456140351
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4797086283187805
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.40644667223614606
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.423567506926962
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.2807017543859649
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.4649122807017544
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.5350877192982456
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.7192982456140351
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.2807017543859649
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.15497076023391812
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.10701754385964912
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.0719298245614035
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.2807017543859649
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.4649122807017544
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.5350877192982456
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.7192982456140351
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.4797086283187805
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.40644667223614606
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.423567506926962
            name: Dot Map@100

SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-l. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
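
The architecture above pools by taking the [CLS] token (`pooling_mode_cls_token: True`) and then applies `Normalize()`. As a rough illustration of what those two modules compute, the NumPy sketch below uses random numbers as a stand-in for the BertModel hidden states (an assumption, not real activations), keeps the vector at token position 0 (assuming, as in BERT-style tokenization, that [CLS] is the first token), and scales it to unit L2 norm. The helper `cls_pool_and_normalize` is hypothetical; the actual modules operate on PyTorch tensors inside sentence-transformers.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(cls_token) followed by Normalize() on a (batch, seq_len, dim) array.

    Hypothetical helper for illustration only.
    """
    # CLS pooling: keep the hidden state of the first token in each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): divide each vector by its L2 norm (guarding against division by zero).
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Stand-in for BertModel output: 2 sequences, 8 tokens each, 1024-dim hidden states.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2, 8, 1024))
embeddings = cls_pool_and_normalize(hidden_states)
print(embeddings.shape)                     # (2, 1024)
print(np.linalg.norm(embeddings, axis=1))   # each norm is 1.0 up to float rounding
```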

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jeevanions/finetuned_arctic-embedd-l")
# Run inference
sentences = [
    'How should risks or trustworthiness characteristics that cannot be measured be documented?',
    'MEASURE 1.1: Approaches and metrics for measurement of AI risks enumerated during the MAP function are selected for \nimplementation starting with the most significant AI risks. The risks or trustworthiness characteristics that will not – or cannot – be \nmeasured are properly documented. \nAction ID \nSuggested Action \nGAI Risks \nMS-1.1-001 Employ methods to trace the origin and modifications of digital content. \nInformation Integrity \nMS-1.1-002 \nIntegrate tools designed to analyze content provenance and detect data \nanomalies, verify the authenticity of digital signatures, and identify patterns \nassociated with misinformation or manipulation. \nInformation Integrity \nMS-1.1-003 \nDisaggregate evaluation metrics by demographic factors to identify any \ndiscrepancies in how content provenance mechanisms work across diverse \npopulations. \nInformation Integrity; Harmful \nBias and Homogenization \nMS-1.1-004 Develop a suite of metrics to evaluate structured public feedback exercises',
    'existing human performance considered as a performance baseline for the algorithm to meet pre-deployment, \nand as a lifecycle minimum performance standard. Decision possibilities resulting from performance testing \nshould include the possibility of not deploying the system. \nRisk identification and mitigation. Before deployment, and in a proactive and ongoing manner, poten\xad\ntial risks of the automated system should be identified and mitigated. Identified risks should focus on the \npotential for meaningful impact on people’s rights, opportunities, or access and include those to impacted \ncommunities that may not be direct users of the automated system, risks resulting from purposeful misuse of \nthe system, and other concerns identified via the consultation process. Assessment and, where possible, mea\xad\nsurement of the impact of risks should be included and balanced such that high impact risks receive attention',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
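
Since the final `Normalize()` module returns unit-length vectors, the dot product of two embeddings equals their cosine similarity, which is why the cosine_* and dot_* metrics in the evaluation below are identical. The NumPy sketch below checks that equivalence and ranks candidate passages for a query by dot product. The 4-dimensional vectors are invented stand-ins for real 1024-dimensional `model.encode(...)` output, and `top_k_passages` is a hypothetical helper, not part of the sentence-transformers API.

```python
from typing import List

import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length, as the model's Normalize() module does.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def top_k_passages(query_emb: np.ndarray, passage_embs: np.ndarray, k: int) -> List[int]:
    """Return indices of the k passages with the highest dot-product score.

    Hypothetical helper; assumes all embeddings are already unit-normalized.
    """
    scores = passage_embs @ query_emb          # dot product == cosine for unit vectors
    return np.argsort(-scores)[:k].tolist()    # highest score first

# Toy embeddings standing in for model.encode(...) output.
query = l2_normalize(np.array([1.0, 0.2, 0.0, 0.1]))
passages = l2_normalize(np.array([
    [0.9, 0.1, 0.0, 0.2],   # points in nearly the same direction as the query
    [0.0, 1.0, 0.3, 0.0],
    [0.1, 0.0, 1.0, 0.9],
]))

dot_scores = passages @ query
cosine_scores = dot_scores / (np.linalg.norm(passages, axis=1) * np.linalg.norm(query))
print(np.allclose(dot_scores, cosine_scores))   # True
print(top_k_passages(query, passages, k=2))     # [0, 1]
```

For a real corpus, the arrays returned by `model.encode(...)`, which this model already normalizes, can be used in place of the toy vectors.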

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.2807
cosine_accuracy@3 0.4649
cosine_accuracy@5 0.5351
cosine_accuracy@10 0.7193
cosine_precision@1 0.2807
cosine_precision@3 0.155
cosine_precision@5 0.107
cosine_precision@10 0.0719
cosine_recall@1 0.2807
cosine_recall@3 0.4649
cosine_recall@5 0.5351
cosine_recall@10 0.7193
cosine_ndcg@10 0.4797
cosine_mrr@10 0.4064
cosine_map@100 0.4236
dot_accuracy@1 0.2807
dot_accuracy@3 0.4649
dot_accuracy@5 0.5351
dot_accuracy@10 0.7193
dot_precision@1 0.2807
dot_precision@3 0.155
dot_precision@5 0.107
dot_precision@10 0.0719
dot_recall@1 0.2807
dot_recall@3 0.4649
dot_recall@5 0.5351
dot_recall@10 0.7193
dot_ndcg@10 0.4797
dot_mrr@10 0.4064
dot_map@100 0.4236
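
Every accuracy@k value in the table equals the matching recall@k value, and every precision@k equals accuracy@k divided by k (for example 0.7193 / 10 ≈ 0.0719). That pattern is what one would expect if each evaluation query has exactly one relevant passage. The values are also consistent with an evaluation set of 114 queries (32/114 ≈ 0.2807, 82/114 ≈ 0.7193), but the card does not state the set size, so treat that as an inference. The sketch below computes these metrics under the single-relevant-passage assumption; `ir_metrics` and its rank-list input are hypothetical, and the formulas are the standard definitions rather than code taken from sentence-transformers.

```python
import math
from typing import Dict, List, Optional

def ir_metrics(ranks: List[Optional[int]], ks=(1, 3, 5, 10)) -> Dict[str, float]:
    """Retrieval metrics when every query has exactly one relevant passage.

    ranks[i] is the 1-based rank of query i's relevant passage in the retrieved
    list, or None if it was not retrieved at all (hypothetical input format).
    """
    n = len(ranks)
    found = [r for r in ranks if r is not None]
    metrics: Dict[str, float] = {}
    for k in ks:
        hits = sum(1 for r in found if r <= k)
        metrics[f"accuracy@{k}"] = hits / n         # relevant passage is in the top k
        metrics[f"recall@{k}"] = hits / n           # one relevant passage, so equals accuracy
        metrics[f"precision@{k}"] = hits / (n * k)  # at most one of the k results is relevant
    # MRR@10: mean reciprocal rank, counting only hits within the top 10.
    metrics["mrr@10"] = sum(1 / r for r in found if r <= 10) / n
    # NDCG@10: the ideal DCG is 1 (passage at rank 1), so each hit scores 1 / log2(rank + 1).
    metrics["ndcg@10"] = sum(1 / math.log2(r + 1) for r in found if r <= 10) / n
    # MAP@100: with a single relevant passage, average precision reduces to 1 / rank.
    metrics["map@100"] = sum(1 / r for r in found if r <= 100) / n
    return metrics

# Hypothetical ranks for four queries (None = relevant passage not retrieved).
example = ir_metrics([1, 3, 12, None])
for name in ("accuracy@1", "accuracy@10", "precision@10", "mrr@10", "ndcg@10", "map@100"):
    print(f"{name}: {example[name]:.4f}")
```

With these four hypothetical ranks, accuracy@10 is 0.5 and precision@10 is 0.05, the same accuracy@k / k relationship that appears in the table.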

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,430 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
                 sentence_0            sentence_1
    type         string                string
    details      min: 8 tokens         min: 7 tokens
                 mean: 17.71 tokens    mean: 172.72 tokens
                 max: 36 tokens        max: 356 tokens
  • Samples (each of the three sentence_0 questions below is paired with the same sentence_1 passage):
    sentence_0
      • What are the key steps to obtain input from stakeholder communities to identify unacceptable use in AI systems?
      • How can organizations maintain an updated hierarchy of identified and expected GAI risks?
      • What are some examples of unacceptable uses of AI as identified by stakeholder communities?
    sentence_1
      15
      GV-1.3-004 Obtain input from stakeholder communities to identify unacceptable use, in
      accordance with activities in the AI RMF Map function.
      CBRN Information or Capabilities;
      Obscene, Degrading, and/or
      Abusive Content; Harmful Bias
      and Homogenization; Dangerous,
      Violent, or Hateful Content
      GV-1.3-005
      Maintain an updated hierarchy of identified and expected GAI risks connected to
      contexts of GAI model advancement and use, potentially including specialized risk
      levels for GAI systems that address issues such as model collapse and algorithmic
      monoculture.
      Harmful Bias and Homogenization
      GV-1.3-006
      Reevaluate organizational risk tolerances to account for unacceptable negative risk
      (such as where significant negative impacts are imminent, severe harms are
      actually occurring, or large-scale risks could occur); and broad GAI negative risks,
      including: Immature safety or risk cultures related to AI and GAI design,
      development and deployment, public information integrity risks, including impacts
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
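
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and every other sentence_1 in the same batch as a negative, then applies cross-entropy over scaled cosine similarities; MatryoshkaLoss evaluates that loss on embeddings truncated to each of the listed `matryoshka_dims` and sums the results using `matryoshka_weights`. The NumPy sketch below re-implements the idea for illustration only. It assumes sentence-transformers' default MNRL scale of 20 with cosine similarity, and the helper names are hypothetical. One consequence is worth noting: with `per_device_train_batch_size: 1` (listed in the hyperparameters below), each batch holds a single pair and therefore no in-batch negatives, so the cross-entropy runs over one candidate and is exactly 0. This is consistent with the 0.0 training loss in the training logs.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Sketch of MultipleNegativesRankingLoss with in-batch negatives.

    Row i of `anchors` is paired with row i of `positives`; every other
    positive in the batch serves as a negative for anchor i.
    """
    # Scaled cosine-similarity matrix, shape (batch, batch).
    logits = scale * (l2_normalize(anchors) @ l2_normalize(positives).T)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct column for anchor i is column i.
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1)) -> float:
    """Weighted sum of MNRL evaluated on the first d dimensions, for each d in dims."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 1024))                     # stand-in question embeddings
positives = anchors + 0.1 * rng.normal(size=(4, 1024))   # noisy "matching passages"

# Four pairs: three in-batch negatives per anchor, so the loss is small but positive.
print(matryoshka_loss(anchors, positives))

# One pair, as with per_device_train_batch_size: 1: each row has a single
# candidate, which always gets probability 1, so the loss is exactly zero.
print(matryoshka_loss(anchors[:1], positives[:1]))
```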
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 1
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 1
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
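
With 3,430 training samples, `per_device_train_batch_size: 1`, and `gradient_accumulation_steps: 1`, one epoch is 3,430 optimizer steps, which matches the `1.0 3430` row in the training logs, and 5 epochs give 17,150 steps in total. The sketch below reproduces that step arithmetic and the learning rate implied by `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and zero warmup. It assumes single-device training (the card does not state the number of devices) and mirrors the usual linear-decay formula rather than calling the transformers scheduler; the function names are hypothetical.

```python
import math

NUM_SAMPLES = 3430   # training samples
BATCH_SIZE = 1       # per_device_train_batch_size (single device assumed)
GRAD_ACCUM = 1       # gradient_accumulation_steps
NUM_EPOCHS = 5       # num_train_epochs
PEAK_LR = 5e-05      # learning_rate
WARMUP_STEPS = 0     # warmup_steps and warmup_ratio are both 0

steps_per_epoch = math.ceil(NUM_SAMPLES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * NUM_EPOCHS

def epoch_at(step: int) -> float:
    """Fractional epoch as reported in the Epoch column of the training logs."""
    return step / steps_per_epoch

def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = (total_steps - step) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)

print(steps_per_epoch, total_steps)   # 3430 17150
print(round(epoch_at(50), 4))         # 0.0146
print(round(epoch_at(3450), 4))       # 1.0058
print(linear_lr(0), linear_lr(total_steps))
```

The two `epoch_at` values match the first log row and the `1.0058 3450` row, which supports the single-device assumption.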

Training Logs

Epoch Step Training Loss cosine_map@100
0.0146 50 - 0.4134
0.0292 100 - 0.4134
0.0437 150 - 0.4134
0.0583 200 - 0.4134
0.0729 250 - 0.4134
0.0875 300 - 0.4134
0.1020 350 - 0.4134
0.1166 400 - 0.4134
0.1312 450 - 0.4134
0.1458 500 0.0 0.4134
0.1603 550 - 0.4134
0.1749 600 - 0.4134
0.1895 650 - 0.4134
0.2041 700 - 0.4134
0.2187 750 - 0.4134
0.2332 800 - 0.4134
0.2478 850 - 0.4134
0.2624 900 - 0.4134
0.2770 950 - 0.4134
0.2915 1000 0.0 0.4134
0.3061 1050 - 0.4134
0.3207 1100 - 0.4134
0.3353 1150 - 0.4134
0.3499 1200 - 0.4134
0.3644 1250 - 0.4134
0.3790 1300 - 0.4134
0.3936 1350 - 0.4134
0.4082 1400 - 0.4134
0.4227 1450 - 0.4134
0.4373 1500 0.0 0.4134
0.4519 1550 - 0.4134
0.4665 1600 - 0.4134
0.4810 1650 - 0.4134
0.4956 1700 - 0.4134
0.5102 1750 - 0.4134
0.5248 1800 - 0.4134
0.5394 1850 - 0.4134
0.5539 1900 - 0.4134
0.5685 1950 - 0.4134
0.5831 2000 0.0 0.4135
0.5977 2050 - 0.4135
0.6122 2100 - 0.4135
0.6268 2150 - 0.4135
0.6414 2200 - 0.4135
0.6560 2250 - 0.4135
0.6706 2300 - 0.4135
0.6851 2350 - 0.4135
0.6997 2400 - 0.4135
0.7143 2450 - 0.4134
0.7289 2500 0.0 0.4134
0.7434 2550 - 0.4134
0.7580 2600 - 0.4134
0.7726 2650 - 0.4134
0.7872 2700 - 0.4134
0.8017 2750 - 0.4134
0.8163 2800 - 0.4134
0.8309 2850 - 0.4135
0.8455 2900 - 0.4135
0.8601 2950 - 0.4135
0.8746 3000 0.0 0.4135
0.8892 3050 - 0.4135
0.9038 3100 - 0.4135
0.9184 3150 - 0.4135
0.9329 3200 - 0.4135
0.9475 3250 - 0.4135
0.9621 3300 - 0.4135
0.9767 3350 - 0.4135
0.9913 3400 - 0.4135
1.0 3430 - 0.4135
1.0058 3450 - 0.4135
1.0204 3500 0.0 0.4135
1.0350 3550 - 0.4135
1.0496 3600 - 0.4135
1.0641 3650 - 0.4135
1.0787 3700 - 0.4135
1.0933 3750 - 0.4135
1.1079 3800 - 0.4135
1.1224 3850 - 0.4135
1.1370 3900 - 0.4179
1.1516 3950 - 0.4179
1.1662 4000 0.0 0.4179
1.1808 4050 - 0.4179
1.1953 4100 - 0.4179
1.2099 4150 - 0.4179
1.2245 4200 - 0.4179
1.2391 4250 - 0.4179
1.2536 4300 - 0.4179
1.2682 4350 - 0.4179
1.2828 4400 - 0.4179
1.2974 4450 - 0.4179
1.3120 4500 0.0 0.4179
1.3265 4550 - 0.4179
1.3411 4600 - 0.4179
1.3557 4650 - 0.4179
1.3703 4700 - 0.4179
1.3848 4750 - 0.4179
1.3994 4800 - 0.4179
1.4140 4850 - 0.4179
1.4286 4900 - 0.4179
1.4431 4950 - 0.4179
1.4577 5000 0.0 0.4179
1.4723 5050 - 0.4179
1.4869 5100 - 0.4179
1.5015 5150 - 0.4179
1.5160 5200 - 0.4179
1.5306 5250 - 0.4179
1.5452 5300 - 0.4179
1.5598 5350 - 0.4179
1.5743 5400 - 0.4179
1.5889 5450 - 0.4179
1.6035 5500 0.0 0.4179
1.6181 5550 - 0.4179
1.6327 5600 - 0.4179
1.6472 5650 - 0.4179
1.6618 5700 - 0.4179
1.6764 5750 - 0.4179
1.6910 5800 - 0.4179
1.7055 5850 - 0.4179
1.7201 5900 - 0.4179
1.7347 5950 - 0.4179
1.7493 6000 0.0 0.4179
1.7638 6050 - 0.4179
1.7784 6100 - 0.4179
1.7930 6150 - 0.4179
1.8076 6200 - 0.4179
1.8222 6250 - 0.4179
1.8367 6300 - 0.4179
1.8513 6350 - 0.4179
1.8659 6400 - 0.4179
1.8805 6450 - 0.4179
1.8950 6500 0.0 0.4179
1.9096 6550 - 0.4179
1.9242 6600 - 0.4179
1.9388 6650 - 0.4179
1.9534 6700 - 0.4179
1.9679 6750 - 0.4179
1.9825 6800 - 0.4179
1.9971 6850 - 0.4179
2.0 6860 - 0.4179
2.0117 6900 - 0.4179
2.0262 6950 - 0.4179
2.0408 7000 0.0 0.4179
2.0554 7050 - 0.4179
2.0700 7100 - 0.4179
2.0845 7150 - 0.4179
2.0991 7200 - 0.4179
2.1137 7250 - 0.4179
2.1283 7300 - 0.4179
2.1429 7350 - 0.4179
2.1574 7400 - 0.4179
2.1720 7450 - 0.4179
2.1866 7500 0.0 0.4179
2.2012 7550 - 0.4179
2.2157 7600 - 0.4179
2.2303 7650 - 0.4179
2.2449 7700 - 0.4179
2.2595 7750 - 0.4179
2.2741 7800 - 0.4179
2.2886 7850 - 0.4179
2.3032 7900 - 0.4179
2.3178 7950 - 0.4179
2.3324 8000 0.0 0.4179
2.3469 8050 - 0.4179
2.3615 8100 - 0.4179
2.3761 8150 - 0.4179
2.3907 8200 - 0.4179
2.4052 8250 - 0.4179
2.4198 8300 - 0.4179
2.4344 8350 - 0.4179
2.4490 8400 - 0.4179
2.4636 8450 - 0.4179
2.4781 8500 0.0 0.4179
2.4927 8550 - 0.4179
2.5073 8600 - 0.4179
2.5219 8650 - 0.4179
2.5364 8700 - 0.4179
2.5510 8750 - 0.4179
2.5656 8800 - 0.4179
2.5802 8850 - 0.4179
2.5948 8900 - 0.4179
2.6093 8950 - 0.4179
2.6239 9000 0.0 0.4179
2.6385 9050 - 0.4179
2.6531 9100 - 0.4179
2.6676 9150 - 0.4179
2.6822 9200 - 0.4179
2.6968 9250 - 0.4223
2.7114 9300 - 0.4223
2.7259 9350 - 0.4223
2.7405 9400 - 0.4223
2.7551 9450 - 0.4223
2.7697 9500 0.0 0.4223
2.7843 9550 - 0.4223
2.7988 9600 - 0.4223
2.8134 9650 - 0.4223
2.8280 9700 - 0.4223
2.8426 9750 - 0.4223
2.8571 9800 - 0.4223
2.8717 9850 - 0.4223
2.8863 9900 - 0.4223
2.9009 9950 - 0.4223
2.9155 10000 0.0 0.4223
2.9300 10050 - 0.4223
2.9446 10100 - 0.4223
2.9592 10150 - 0.4223
2.9738 10200 - 0.4223
2.9883 10250 - 0.4223
3.0 10290 - 0.4223
3.0029 10300 - 0.4223
3.0175 10350 - 0.4223
3.0321 10400 - 0.4223
3.0466 10450 - 0.4223
3.0612 10500 0.0 0.4223
3.0758 10550 - 0.4223
3.0904 10600 - 0.4223
3.1050 10650 - 0.4223
3.1195 10700 - 0.4223
3.1341 10750 - 0.4223
3.1487 10800 - 0.4223
3.1633 10850 - 0.4223
3.1778 10900 - 0.4223
3.1924 10950 - 0.4223
3.2070 11000 0.0 0.4223
3.2216 11050 - 0.4223
3.2362 11100 - 0.4223
3.2507 11150 - 0.4223
3.2653 11200 - 0.4223
3.2799 11250 - 0.4223
3.2945 11300 - 0.4223
3.3090 11350 - 0.4223
3.3236 11400 - 0.4223
3.3382 11450 - 0.4223
3.3528 11500 0.0 0.4223
3.3673 11550 - 0.4223
3.3819 11600 - 0.4223
3.3965 11650 - 0.4223
3.4111 11700 - 0.4223
3.4257 11750 - 0.4223
3.4402 11800 - 0.4223
3.4548 11850 - 0.4223
3.4694 11900 - 0.4223
3.4840 11950 - 0.4223
3.4985 12000 0.0 0.4223
3.5131 12050 - 0.4223
3.5277 12100 - 0.4223
3.5423 12150 - 0.4223
3.5569 12200 - 0.4223
3.5714 12250 - 0.4223
3.5860 12300 - 0.4223
3.6006 12350 - 0.4223
3.6152 12400 - 0.4223
3.6297 12450 - 0.4223
3.6443 12500 0.0 0.4223
3.6589 12550 - 0.4223
3.6735 12600 - 0.4223
3.6880 12650 - 0.4223
3.7026 12700 - 0.4223
3.7172 12750 - 0.4223
3.7318 12800 - 0.4223
3.7464 12850 - 0.4223
3.7609 12900 - 0.4223
3.7755 12950 - 0.4223
3.7901 13000 0.0 0.4223
3.8047 13050 - 0.4223
3.8192 13100 - 0.4226
3.8338 13150 - 0.4226
3.8484 13200 - 0.4226
3.8630 13250 - 0.4226
3.8776 13300 - 0.4226
3.8921 13350 - 0.4226
3.9067 13400 - 0.4226
3.9213 13450 - 0.4226
3.9359 13500 0.0 0.4226
3.9504 13550 - 0.4226
3.9650 13600 - 0.4226
3.9796 13650 - 0.4226
3.9942 13700 - 0.4226
4.0 13720 - 0.4226
4.0087 13750 - 0.4226
4.0233 13800 - 0.4226
4.0379 13850 - 0.4226
4.0525 13900 - 0.4226
4.0671 13950 - 0.4226
4.0816 14000 0.0 0.4226
4.0962 14050 - 0.4226
4.1108 14100 - 0.4226
4.1254 14150 - 0.4226
4.1399 14200 - 0.4226
4.1545 14250 - 0.4226
4.1691 14300 - 0.4226
4.1837 14350 - 0.4226
4.1983 14400 - 0.4226
4.2128 14450 - 0.4226
4.2274 14500 0.0 0.4226
4.2420 14550 - 0.4226
4.2566 14600 - 0.4226
4.2711 14650 - 0.4226
4.2857 14700 - 0.4226
4.3003 14750 - 0.4226
4.3149 14800 - 0.4226
4.3294 14850 - 0.4226
4.3440 14900 - 0.4226
4.3586 14950 - 0.4226
4.3732 15000 0.0 0.4226
4.3878 15050 - 0.4226
4.4023 15100 - 0.4226
4.4169 15150 - 0.4226
4.4315 15200 - 0.4226
4.4461 15250 - 0.4226
4.4606 15300 - 0.4226
4.4752 15350 - 0.4226
4.4898 15400 - 0.4226
4.5044 15450 - 0.4226
4.5190 15500 0.0 0.4226
4.5335 15550 - 0.4226
4.5481 15600 - 0.4226
4.5627 15650 - 0.4226
4.5773 15700 - 0.4226
4.5918 15750 - 0.4226
4.6064 15800 - 0.4226
4.6210 15850 - 0.4226
4.6356 15900 - 0.4226
4.6501 15950 - 0.4226
4.6647 16000 0.0 0.4226
4.6793 16050 - 0.4226
4.6939 16100 - 0.4226
4.7085 16150 - 0.4226
4.7230 16200 - 0.4226
4.7376 16250 - 0.4226
4.7522 16300 - 0.4226
4.7668 16350 - 0.4226
4.7813 16400 - 0.4226
4.7959 16450 - 0.4226
4.8105 16500 0.0 0.4226
4.8251 16550 - 0.4226
4.8397 16600 - 0.4226
4.8542 16650 - 0.4226
4.8688 16700 - 0.4226
4.8834 16750 - 0.4226
4.8980 16800 - 0.4226
4.9125 16850 - 0.4226
4.9271 16900 - 0.4226
4.9417 16950 - 0.4226
4.9563 17000 0.0 0.4226
4.9708 17050 - 0.4226
4.9854 17100 - 0.4226
5.0 17150 - 0.4226
0.0146 50 - 0.4226
0.0292 100 - 0.4226
0.0437 150 - 0.4226
0.0583 200 - 0.4226
0.0729 250 - 0.4226
0.0875 300 - 0.4226
0.1020 350 - 0.4226
0.1166 400 - 0.4226
0.1312 450 - 0.4226
0.1458 500 0.0 0.4226
0.1603 550 - 0.4226
0.1749 600 - 0.4226
0.1895 650 - 0.4226
0.2041 700 - 0.4226
0.2187 750 - 0.4226
0.2332 800 - 0.4226
0.2478 850 - 0.4226
0.2624 900 - 0.4226
0.2770 950 - 0.4226
0.2915 1000 0.0 0.4227
0.3061 1050 - 0.4227
0.3207 1100 - 0.4227
0.3353 1150 - 0.4227
0.3499 1200 - 0.4227
0.3644 1250 - 0.4227
0.3790 1300 - 0.4227
0.3936 1350 - 0.4227
0.4082 1400 - 0.4227
0.4227 1450 - 0.4227
0.4373 1500 0.0 0.4227
0.4519 1550 - 0.4227
0.4665 1600 - 0.4227
0.4810 1650 - 0.4227
0.4956 1700 - 0.4227
0.5102 1750 - 0.4227
0.5248 1800 - 0.4227
0.5394 1850 - 0.4227
0.5539 1900 - 0.4227
0.5685 1950 - 0.4227
0.5831 2000 0.0 0.4227
0.5977 2050 - 0.4227
0.6122 2100 - 0.4227
0.6268 2150 - 0.4227
0.6414 2200 - 0.4227
0.6560 2250 - 0.4227
0.6706 2300 - 0.4227
0.6851 2350 - 0.4227
0.6997 2400 - 0.4227
0.7143 2450 - 0.4227
0.7289 2500 0.0 0.4227
0.7434 2550 - 0.4227
0.7580 2600 - 0.4227
0.7726 2650 - 0.4227
0.7872 2700 - 0.4227
0.8017 2750 - 0.4227
0.8163 2800 - 0.4227
0.8309 2850 - 0.4227
0.8455 2900 - 0.4227
0.8601 2950 - 0.4227
0.8746 3000 0.0 0.4227
0.8892 3050 - 0.4227
0.9038 3100 - 0.4227
0.9184 3150 - 0.4227
0.9329 3200 - 0.4227
0.9475 3250 - 0.4227
0.9621 3300 - 0.4227
0.9767 3350 - 0.4227
0.9913 3400 - 0.4227
1.0 3430 - 0.4227
1.0058 3450 - 0.4227
1.0204 3500 0.0 0.4227
1.0350 3550 - 0.4227
1.0496 3600 - 0.4227
1.0641 3650 - 0.4227
1.0787 3700 - 0.4227
1.0933 3750 - 0.4227
1.1079 3800 - 0.4227
1.1224 3850 - 0.4227
1.1370 3900 - 0.4227
1.1516 3950 - 0.4227
1.1662 4000 0.0 0.4227
1.1808 4050 - 0.4227
1.1953 4100 - 0.4227
1.2099 4150 - 0.4231
1.2245 4200 - 0.4231
1.2391 4250 - 0.4231
1.2536 4300 - 0.4231
1.2682 4350 - 0.4231
1.2828 4400 - 0.4231
1.2974 4450 - 0.4231
1.3120 4500 0.0 0.4231
1.3265 4550 - 0.4231
1.3411 4600 - 0.4231
1.3557 4650 - 0.4232
1.3703 4700 - 0.4232
1.3848 4750 - 0.4232
1.3994 4800 - 0.4232
1.4140 4850 - 0.4232
1.4286 4900 - 0.4232
1.4431 4950 - 0.4232
1.4577 5000 0.0 0.4232
1.4723 5050 - 0.4232
1.4869 5100 - 0.4232
1.5015 5150 - 0.4232
1.5160 5200 - 0.4232
1.5306 5250 - 0.4232
1.5452 5300 - 0.4233
1.5598 5350 - 0.4233
1.5743 5400 - 0.4233
1.5889 5450 - 0.4233
1.6035 5500 0.0 0.4233
1.6181 5550 - 0.4233
1.6327 5600 - 0.4233
1.6472 5650 - 0.4233
1.6618 5700 - 0.4233
1.6764 5750 - 0.4233
1.6910 5800 - 0.4233
1.7055 5850 - 0.4233
1.7201 5900 - 0.4233
1.7347 5950 - 0.4233
1.7493 6000 0.0 0.4233
1.7638 6050 - 0.4234
1.7784 6100 - 0.4234
1.7930 6150 - 0.4234
1.8076 6200 - 0.4234
1.8222 6250 - 0.4234
1.8367 6300 - 0.4234
1.8513 6350 - 0.4234
1.8659 6400 - 0.4234
1.8805 6450 - 0.4234
1.8950 6500 0.0 0.4234
1.9096 6550 - 0.4234
1.9242 6600 - 0.4234
1.9388 6650 - 0.4234
1.9534 6700 - 0.4234
1.9679 6750 - 0.4234
1.9825 6800 - 0.4234
1.9971 6850 - 0.4234
2.0 6860 - 0.4234
2.0117 6900 - 0.4234
2.0262 6950 - 0.4234
2.0408 7000 0.0 0.4234
2.0554 7050 - 0.4234
2.0700 7100 - 0.4234
2.0845 7150 - 0.4234
2.0991 7200 - 0.4234
2.1137 7250 - 0.4234
2.1283 7300 - 0.4234
2.1429 7350 - 0.4234
2.1574 7400 - 0.4234
2.1720 7450 - 0.4234
2.1866 7500 0.0 0.4234
2.2012 7550 - 0.4234
2.2157 7600 - 0.4234
2.2303 7650 - 0.4234
2.2449 7700 - 0.4234
2.2595 7750 - 0.4234
2.2741 7800 - 0.4234
2.2886 7850 - 0.4234
2.3032 7900 - 0.4234
2.3178 7950 - 0.4234
2.3324 8000 0.0 0.4234
2.3469 8050 - 0.4234
2.3615 8100 - 0.4234
2.3761 8150 - 0.4234
2.3907 8200 - 0.4234
2.4052 8250 - 0.4234
2.4198 8300 - 0.4234
2.4344 8350 - 0.4234
2.4490 8400 - 0.4234
2.4636 8450 - 0.4234
2.4781 8500 0.0 0.4234
2.4927 8550 - 0.4234
2.5073 8600 - 0.4234
2.5219 8650 - 0.4234
2.5364 8700 - 0.4234
2.5510 8750 - 0.4234
2.5656 8800 - 0.4234
2.5802 8850 - 0.4234
2.5948 8900 - 0.4234
2.6093 8950 - 0.4234
2.6239 9000 0.0 0.4234
2.6385 9050 - 0.4234
2.6531 9100 - 0.4234
2.6676 9150 - 0.4234
2.6822 9200 - 0.4234
2.6968 9250 - 0.4234
2.7114 9300 - 0.4234
2.7259 9350 - 0.4234
2.7405 9400 - 0.4234
2.7551 9450 - 0.4234
2.7697 9500 0.0 0.4234
2.7843 9550 - 0.4234
2.7988 9600 - 0.4234
2.8134 9650 - 0.4234
2.8280 9700 - 0.4234
2.8426 9750 - 0.4234
2.8571 9800 - 0.4234
2.8717 9850 - 0.4234
2.8863 9900 - 0.4234
2.9009 9950 - 0.4234
2.9155 10000 0.0 0.4234
2.9300 10050 - 0.4234
2.9446 10100 - 0.4234
2.9592 10150 - 0.4234
2.9738 10200 - 0.4234
2.9883 10250 - 0.4234
3.0 10290 - 0.4234
3.0029 10300 - 0.4234
3.0175 10350 - 0.4234
3.0321 10400 - 0.4234
3.0466 10450 - 0.4234
3.0612 10500 0.0 0.4234
3.0758 10550 - 0.4234
3.0904 10600 - 0.4234
3.1050 10650 - 0.4234
3.1195 10700 - 0.4234
3.1341 10750 - 0.4234
3.1487 10800 - 0.4234
3.1633 10850 - 0.4234
3.1778 10900 - 0.4234
3.1924 10950 - 0.4234
3.2070 11000 0.0 0.4234
3.2216 11050 - 0.4234
3.2362 11100 - 0.4234
3.2507 11150 - 0.4234
3.2653 11200 - 0.4234
3.2799 11250 - 0.4234
3.2945 11300 - 0.4234
3.3090 11350 - 0.4234
3.3236 11400 - 0.4234
3.3382 11450 - 0.4234
3.3528 11500 0.0 0.4234
3.3673 11550 - 0.4234
3.3819 11600 - 0.4234
3.3965 11650 - 0.4234
3.4111 11700 - 0.4234
3.4257 11750 - 0.4234
3.4402 11800 - 0.4234
3.4548 11850 - 0.4235
3.4694 11900 - 0.4235
3.4840 11950 - 0.4235
3.4985 12000 0.0 0.4235
3.5131 12050 - 0.4235
3.5277 12100 - 0.4235
3.5423 12150 - 0.4235
3.5569 12200 - 0.4235
3.5714 12250 - 0.4235
3.5860 12300 - 0.4235
3.6006 12350 - 0.4235
3.6152 12400 - 0.4235
3.6297 12450 - 0.4235
3.6443 12500 0.0 0.4235
3.6589 12550 - 0.4235
3.6735 12600 - 0.4235
3.6880 12650 - 0.4235
3.7026 12700 - 0.4235
3.7172 12750 - 0.4235
3.7318 12800 - 0.4235
3.7464 12850 - 0.4235
3.7609 12900 - 0.4235
3.7755 12950 - 0.4235
3.7901 13000 0.0 0.4235
3.8047 13050 - 0.4235
3.8192 13100 - 0.4235
3.8338 13150 - 0.4235
3.8484 13200 - 0.4235
3.8630 13250 - 0.4235
3.8776 13300 - 0.4235
3.8921 13350 - 0.4235
3.9067 13400 - 0.4235
3.9213 13450 - 0.4235
3.9359 13500 0.0 0.4235
3.9504 13550 - 0.4235
3.9650 13600 - 0.4235
3.9796 13650 - 0.4235
3.9942 13700 - 0.4235
4.0 13720 - 0.4235
4.0087 13750 - 0.4235
4.0233 13800 - 0.4235
4.0379 13850 - 0.4235
4.0525 13900 - 0.4235
4.0671 13950 - 0.4235
4.0816 14000 0.0 0.4236
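The `cosine_map@100` column is a retrieval metric. For each evaluation query, corpus entries are ranked by cosine similarity to the query embedding, average precision is computed over the top 100 results, and the result is averaged over all queries. The sketch below assumes the rankings are already available as lists of document IDs, with a set of relevant IDs per query. `average_precision_at_k` and `map_at_k` are hypothetical names. The normalisation by `min(k, number of relevant documents)` follows a common IR convention and is assumed, not verified, to match the evaluator used for this run.

```python
from typing import Iterable, List, Set, Tuple


def average_precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 100) -> float:
    """AP@k for one query: sum of precision@rank at each relevant hit in the top k."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Normalise by the largest number of hits achievable within the cutoff.
    return precision_sum / min(k, len(relevant_ids))


def map_at_k(runs: Iterable[Tuple[List[str], Set[str]]], k: int = 100) -> float:
    """Mean of AP@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    scores = [average_precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Query 1 finds its only relevant document at rank 2 (AP 0.5); query 2 at rank 1 (AP 1.0).
example_runs = [(["d3", "d1", "d7"], {"d1"}), (["d2", "d5"], {"d2"})]
print(map_at_k(example_runs))  # 0.75
```

Because each query contributes equally, a single relevant document moving up or down the ranking for one query shifts the mean by only a small amount, which is the scale of change visible in the log.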

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.14.4
  • Tokenizers: 0.19.1
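To reproduce this environment, you can compare the installed packages against the versions listed above. The following is a minimal sketch using the standard library's `importlib.metadata`. The PyPI distribution names (`sentence-transformers`, `torch`, and so on) are assumptions about how the packages were installed. The comparison is an exact string match, so a `torch` build without the `+cu121` suffix is reported as a mismatch. The Python version (3.10.12) is not checked.

```python
from importlib import metadata
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Versions from the "Framework Versions" list, keyed by (assumed) PyPI distribution name.
PINNED = {
    "sentence-transformers": "3.1.1",
    "transformers": "4.44.2",
    "torch": "2.4.1+cu121",
    "accelerate": "0.34.2",
    "datasets": "2.14.4",
    "tokenizers": "0.19.1",
}


def find_mismatches(pinned: Mapping[str, str], installed: Mapping[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, found)} for each package that differs or is missing (found=None)."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed versions; packages that are not installed are simply left out."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for pkg, (want, have) in find_mismatches(PINNED, installed_versions(PINNED)).items():
        print(f"{pkg}: expected {want}, found {have}")
```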

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
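MultipleNegativesRankingLoss, cited above, treats the positives of the other pairs in the same batch as negatives. Each anchor must score its own positive above all of them under a softmax. The following is a plain-Python sketch of that idea. It assumes cosine similarity scaled by 20 and is not the library's implementation. `multiple_negatives_ranking_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i must pick positive i among all positives in the batch."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over every candidate in the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_norm - logits[i])  # -log softmax probability of the true pair
    return sum(losses) / len(losses)


# A batch with a single pair has no in-batch negatives, so the loss is exactly zero.
print(multiple_negatives_ranking_loss([[0.3, 0.8]], [[0.5, 0.1]]))  # 0.0
```

With one (anchor, positive) pair per batch, the softmax has a single candidate and the loss is exactly 0 regardless of the embeddings. That would be consistent with the Training Loss of 0.0 logged above if the effective batch size was 1, as 3,430 steps per epoch over the 3,430 training pairs (`dataset_size:3430`) suggests. The batch size itself is not shown in this section. A MatryoshkaLoss wrapper that combines this loss across truncated embedding sizes would then also be 0.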