language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - ru
  - zh
  - ja
license: apache-2.0
datasets:
  - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
  - anthracite-org/stheno-filtered-v1.1
  - PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
  - Gryphe/Sonnet3.5-Charcard-Roleplay
  - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - anthracite-org/nopm_claude_writing_fixed
  - anthracite-org/kalo_opus_misc_240827
pipeline_tag: text-generation
model-index:
  - name: Azure_Dusk-v0.2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 34.67
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 17.4
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 1.66
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1.45
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 6.37
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 22.6
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Epiculous/Azure_Dusk-v0.2
          name: Open LLM Leaderboard


Following up on Crimson_Dawn-v0.2, we have Azure_Dusk-v0.2! Trained on Mistral-Nemo-Base-2407, this release adds significantly more data and uses RSLoRA instead of regular LoRA. Another key change is training on ChatML rather than Mistral formatting.
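
For readers unfamiliar with the difference, here is a minimal sketch of how the adapter scaling factor differs between regular LoRA and RSLoRA (rank-stabilized LoRA). The `alpha` value and the ranks are hypothetical and are not the settings used to train this model.

```python
import math


def lora_scale(alpha, rank):
    """Regular LoRA scales the adapter update by alpha / rank."""
    return alpha / rank


def rslora_scale(alpha, rank):
    """RSLoRA scales the adapter update by alpha / sqrt(rank)."""
    return alpha / math.sqrt(rank)


alpha = 16  # hypothetical value, not taken from this model's config
for rank in (8, 64, 256):  # hypothetical ranks for comparison
    print(rank, lora_scale(alpha, rank), rslora_scale(alpha, rank))
```

With regular LoRA the factor shrinks quickly as the rank grows, while the RSLoRA factor shrinks more slowly, which is the motivation usually given for using it at higher ranks.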

Quants!

full / exl2 / gguf

Prompting

The v0.2 models are trained on ChatML. The prompting structure looks like this:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant

Context and Instruct

The v0.2 models are trained on ChatML, so please use the ChatML Context and Instruct templates.
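
If you are assembling prompts by hand rather than through a frontend template, the ChatML layout shown under Prompting can be produced with a small helper. This is a sketch only: `build_chatml_prompt` is a hypothetical name, not part of any library, and it assumes each message is a dict with `role` and `content` keys.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The printed prompt ends with an open assistant turn, matching the example in the Prompting section.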

Current Top Sampler Settings

Spicy_Temp
Violet_Twilight-Nitral-Special

Training

Training was done in two phases of 2 epochs each on two 2x NVIDIA A6000 GPUs using LoRA. In the first phase, the base model was trained for 2 epochs on RP data, and the resulting LoRA was applied to the base. In the second phase, this modified base was trained for 2 epochs on instruct data, and the new instruct LoRA was applied to the modified base, resulting in what you see here.
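
The two-phase merge described above can be illustrated on a toy weight matrix. This is a sketch of the general idea of folding a LoRA update into a weight (`W + scale * B @ A`), not the actual Axolotl pipeline used here; the matrices, the rank of 1, and the `scale` value are all made up for illustration, and `merge_lora` is a hypothetical helper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a) without mutating the input."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 "base" weight and two rank-1 adapters (all values hypothetical)
base = [[1.0, 0.0], [0.0, 1.0]]
rp_b, rp_a = [[1.0], [0.0]], [[0.0, 2.0]]
instruct_b, instruct_a = [[0.0], [1.0]], [[3.0, 0.0]]
scale = 0.5

# Phase 1: fold the RP adapter into the base weight
modified_base = merge_lora(base, rp_b, rp_a, scale)
# Phase 2: fold the instruct adapter into the already-modified base
final = merge_lora(modified_base, instruct_b, instruct_a, scale)
print(modified_base)
print(final)
```

The point of the sketch is the ordering: the second adapter is trained against, and merged into, a base that already contains the first adapter's update.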

Built with Axolotl

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---:|
| Avg. | 14.03 |
| IFEval (0-Shot) | 34.67 |
| BBH (3-Shot) | 17.40 |
| MATH Lvl 5 (4-Shot) | 1.66 |
| GPQA (0-shot) | 1.45 |
| MuSR (0-shot) | 6.37 |
| MMLU-PRO (5-shot) | 22.60 |
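
As a quick sanity check, the Avg. row is consistent with an unweighted mean of the six benchmark scores. That aggregation is an assumption based on the numbers above, not something stated in this card.

```python
# Scores copied from the results listed above
scores = {
    "IFEval (0-Shot)": 34.67,
    "BBH (3-Shot)": 17.40,
    "MATH Lvl 5 (4-Shot)": 1.66,
    "GPQA (0-shot)": 1.45,
    "MuSR (0-shot)": 6.37,
    "MMLU-PRO (5-shot)": 22.60,
}

# Unweighted mean across all six benchmarks (assumed aggregation)
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")
```

The mean works out to roughly 14.025, which agrees with the reported 14.03 to within rounding.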