---
license: cc-by-4.0
library_name: transformers
tags:
- supertrainer2000
- not-for-all-audiences
- writing
- roleplay
datasets:
- euclaise/TinyCoT
- euclaise/mathoverflow-accepted
- euclaise/reddit-instruct-curated
- euclaise/WritingPrompts_curated
- sablo/oasst2_curated
- euclaise/mathqa_programs
- BEE-spoke-data/coedit-reworded-deduped
- pszemraj/booksum-short
- euclaise/reddit-instruct
- euclaise/SciCoT
- euirim/goodwiki
- neulab/conala
- squad
- ropes
- euclaise/logician
- chargoddard/rpguild
- lemonilia/LimaRP
base_model:
- euclaise/Memphis-CoT-3B
language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64137e2150358a805203cbac/wEaKoLeJzidUdTWwQmA6k.png)

Memphis-scribe 3B is a finetune of [Memphis-CoT 3B](https://huggingface.co/euclaise/Memphis-CoT-3B) on more creative data; Memphis-CoT is itself a finetune of [StableLM 3B 4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t/).

It is trained further on TinyCoT, along with:

- 5000 comments from [reddit-instruct-curated](https://hf.co/euclaise/reddit-instruct-curated)
- 20000 comments from [writingprompts-curated](https://hf.co/euclaise/writingprompts-curated)
- 2000 examples of [converting MathQA problems to Python snippets](https://hf.co/euclaise/mathqa_programs)
- 2000 [shorter booksum examples (both chapter->summary and summary->chapter tasks)](https://huggingface.co/datasets/pszemraj/booksum-short)
- 2000 examples from [mathoverflow-accepted](https://hf.co/euclaise/mathoverflow-accepted), limited to comments with >10 upvotes
- 2000 examples from [coedit-reworded-deduped](https://huggingface.co/datasets/BEE-spoke-data/coedit-reworded-deduped)
- 500 examples from [SQuAD](https://huggingface.co/datasets/squad), for generating QA pairs given a context
- 500 examples from [ROPES](https://huggingface.co/datasets/ropes), for generating scenario+QA triplets given a context
- [conala](https://huggingface.co/datasets/neulab/conala)
- 500 examples from [logician](https://huggingface.co/datasets/euclaise/logician)
- 500 examples from [goodwiki](https://huggingface.co/datasets/euirim/goodwiki), for generating an article given its title and description
- 2000 examples from [rpguild](https://huggingface.co/datasets/chargoddard/rpguild)
- a [curated subset of oasst2](https://huggingface.co/datasets/sablo/oasst2_curated)
- [LimaRP](https://huggingface.co/datasets/lemonilia/LimaRP)

## Training procedure

I started from [Memphis-CoT 3B](https://huggingface.co/euclaise/Memphis-CoT-3B), which used a novel iterative contrastive finetuning procedure to improve reasoning ability.

I first generated completions just as in each of the Memphis-CoT cycles. Then, for each example in the dataset, I sampled one correct and one incorrect completion. I applied the same ranking loss over these completions (with a weight of 0.2), but applied the cross-entropy loss over the example tokens rather than the completion tokens.

Finally, I averaged the resulting model with the Memphis-CoT weights from before this additional training, again using spherical linear interpolation, this time with a weight of 0.8 (a sketch of this merge step appears below, after the prompt formats).

## Prompt formats

```
### User:
[insert instruction here]

### Assistant:
[insert response here]

### User:
...
```

Alternatively:

```
### System:
[Insert system message here, focused on roleplay]

### User:
[insert instruction here]

### Assistant:
[insert response here]

### User:
...
```
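For illustration, here is a minimal sketch of using the first format with 🤗 Transformers. The instruction text and the sampling parameters are assumptions for demonstration, not recommendations from this card:

```python
# Minimal generation sketch using the "### User: / ### Assistant:" format.
# The prompt content and sampling parameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "euclaise/Memphis-scribe-3B"
tokenizer = AutoTokenizer.from_pretrained(repo)
# Older transformers versions may need trust_remote_code=True for StableLM-based models
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

prompt = "### User:\nWrite a two-sentence story about a lighthouse keeper.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,  # assumption: a common default for creative tasks
)
# Strip the prompt tokens before decoding, so only the response is printed
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```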
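Returning to the training procedure above: the final merge step can be sketched as per-tensor spherical linear interpolation (slerp) between the pre-finetuning Memphis-CoT weights and the further-trained weights. This is a hypothetical reconstruction, not the actual training code; the state-dict names are invented, and reading the 0.8 as the interpolation factor toward the finetuned model is an assumption (the direction is not specified above):

```python
# Illustrative per-tensor slerp, treating each weight tensor as a flat vector.
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate from tensor a (t=0) to tensor b (t=1)."""
    a_f, b_f = a.flatten().float(), b.flatten().float()
    a_n = a_f / (a_f.norm() + eps)
    b_n = b_f / (b_f.norm() + eps)
    # Angle between the two weight vectors
    omega = torch.arccos(torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0))
    if omega.item() < eps:
        out = torch.lerp(a_f, b_f, t)  # nearly parallel: fall back to linear interpolation
    else:
        so = torch.sin(omega)
        out = (torch.sin((1.0 - t) * omega) / so) * a_f + (torch.sin(t * omega) / so) * b_f
    return out.reshape(a.shape).to(a.dtype)

# Hypothetical usage over two state dicts (names assumed):
# merged = {k: slerp(memphis_cot_state[k], finetuned_state[k], t=0.8)
#           for k in finetuned_state}
```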
## Benchmarks

This model performs significantly worse than Memphis-CoT on benchmarks, despite being better suited to chat and creative writing tasks. This is an expected tradeoff, especially for small models.

| Model | GSM8K (5-shot) | AGIEval (English/Nous subset, acc_norm) | BIG Bench Hard (CoT, few-shot*) |
|:------|:---------------|:----------------------------------------|:--------------------------------|
| [StableLM 3B Base](https://hf.co/stabilityai/stablelm-3b-4e1t) | 2.05% | 25.14% | 36.75% |
| [Memphis-CoT 3B](https://hf.co/euclaise/Memphis-CoT-3B) | 18.8% | 27.22% | 36.92% |
| [Memphis-scribe 3B](https://hf.co/euclaise/Memphis-scribe-3B) | 9.55% | 24.78% | |

\*5-shot, as performed automatically by the LM Evaluation Harness's `bbh_cot_fewshot` task, even with `num_fewshot=0`.
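These numbers can in principle be re-run with EleutherAI's LM Evaluation Harness. Below is a sketch using its Python API; the exact task names and API details vary across harness versions, so treat this as an assumption rather than the exact command used for this card:

```python
# Sketch of re-running the GSM8K evaluation with lm-evaluation-harness
# (pip install lm-eval). Task names may differ across harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=euclaise/Memphis-scribe-3B",
    tasks=["gsm8k"],  # BBH uses its own task group, e.g. bbh_cot_fewshot
    num_fewshot=5,    # GSM8K is reported 5-shot above
)
print(results["results"])
```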