metadata

license: apache-2.0
tags:
  - finetune
  - fine tune
  - dpo
  - sft
  - yi
model-index:
  - name: Yi-34B-200K-AEZAKMI-RAW-1701
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 66.81
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 85.79
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 75.44
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 57.91
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 80.35
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 59.97
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-RAW-1701
          name: Open LLM Leaderboard

THIS MODEL IS EXPERIMENTAL AND MIGHT BE BUGGY, I DIDN'T PERFECT THE STRENGTH OF DPO AND SFT YET.

Yi-34B-200K trained via DPO on RAWrr_v1 at ctx 200 (lora_r 4, lora_alpha 8) and then via SFT at ctx 1400 (lora_r 16, lora_alpha 32) on AEZAKMI_v2. It's less prone to refusals than Yi-34B-200K-AEZAKMI-v2 but that's work in progress still - I want to do DPO with higher lora rank and ctx and then repeat SFT training. I haven't tested it too much, but on what I've seen, it's a good model.

If you want to re-produce this model by merging loras, start by downloading Yi-34B-200K-Llamafied.
Then merge it with https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2
Then merge the resulting model with https://huggingface.co/adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA

License: apache-2.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	71.04
AI2 Reasoning Challenge (25-Shot)	66.81
HellaSwag (10-Shot)	85.79
MMLU (5-Shot)	75.44
TruthfulQA (0-shot)	57.91
Winogrande (5-shot)	80.35
GSM8k (5-shot)	59.97