crestf411/L3.1-8B-sunfall-v0.6.1-dpo

Sunfall (2024-07-31) v0.6.1 on top of a make-it-funner-DPO merge on Meta's Llama-3 8B Instruct.

Experimental. Please give feedback. Begone if you demand perfection.

This experiment is showing potential but it is still early to tell if it is a viable option. Invest your time accordingly.

New since v0.5:

Major expansion of dataset (more than double the size), in particular including a lot of unslopped SFW content to make the model overall more intelligent.
Includes randomized subset of AI-MO/NuminaMath-CoT
Diamond law training bits were expanded. The models didn't seem to remember the rules, so they were rephrased in various ways.

Note: the v0.6 release had some issues with the dataset, in particular the NuminaMath-CoT components were formatted incorrectly.

Mergers/fine-tuners: there is a LoRA of this model. Consider merging that instead of merging this model.

To use lore book tags (example), make sure you use Status: Blue (constant) and write e.g.

Follow the Diamond Law at all costs.

Tags: humor, dark, complex storytelling, intricate characters, immersive.

This model has been trained on context that mimics that of Silly Tavern's Llama3-instruct preset, with the following settings:

System Prompt:

You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.

The card has also been trained on content which includes a narrator card, which was used when the content did not mainly revolve around two characters. Future versions will expand on this idea, so forgive the vagueness at this time.

(The Diamond Law is this, although new rules were added: https://files.catbox.moe/d15m3g.txt -- So far results are unclear, but the training was done with this phrase included, and the training data adheres to the law.)

The model has also been trained to do storywriting. The system message ends up looking something like this:

You are an expert storyteller, who can roleplay or write compelling stories. Follow the Diamond Law at all costs. Below is a scenario with character descriptions and content tags. Write a story based on this scenario.

Scenario: The story is about James, blabla.

James is an overweight 63 year old blabla.

Lucy: James's 62 year old wife.

Tags: tag1, tag2, tag3, ...

MMLU-Pro Benchmark: model overall is higher than the instruct base, but it loses in specific categories.

Llama3.1 8B Instruct base:

| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
|   47.38 |   56.52 |    56.00 |     38.89 |            53.85 |     44.44 |       38.71 |  61.54 |   66.67 | 45.71 | 41.86 |      43.75 |   39.02 |      44.00 | 58.62 |
|     181 |      13 |       14 |        14 |                7 |        12 |          12 |     16 |       8 |    16 |    18 |          7 |      16 |         11 |    17 |
|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |

Llama3.1 8B Instruct base + DPO (unreleased):

| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
|   46.07 |   56.52 |    60.00 |     36.11 |            53.85 |     44.44 |       38.71 |  61.54 |   58.33 | 40.00 | 46.51 |      43.75 |   36.59 |      48.00 | 44.83 |
|     176 |      13 |       15 |        13 |                7 |        12 |          12 |     16 |       7 |    14 |    20 |          7 |      15 |         12 |    13 |
|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |

Sunfall v0.6.1:

| overall | biology | business | chemistry | computer science | economics | engineering | health | history |  law  | math  | philosophy | physics | psychology | other |
| ------- | ------- | -------- | --------- | ---------------- | --------- | ----------- | ------ | ------- | ----- | ----- | ---------- | ------- | ---------- | ----- |
|   46.60 |   69.57 |    52.00 |     41.67 |            69.23 |     40.74 |       35.48 |  57.69 |   41.67 | 45.71 | 46.51 |      50.00 |   39.02 |      40.00 | 44.83 |
|     178 |      16 |       13 |        15 |                9 |        11 |          11 |     15 |       5 |    16 |    20 |          8 |      16 |         10 |    13 |
|     382 |      23 |       25 |        36 |               13 |        27 |          31 |     26 |      12 |    35 |    43 |         16 |      41 |         25 |    29 |

crestf411
/

L3.1-8B-sunfall-v0.6.1-dpo

Model tree for crestf411/L3.1-8B-sunfall-v0.6.1-dpo

Datasets used to train crestf411/L3.1-8B-sunfall-v0.6.1-dpo

Spaces using crestf411/L3.1-8B-sunfall-v0.6.1-dpo 2

Collection including crestf411/L3.1-8B-sunfall-v0.6.1-dpo

Sunfall