---
license: apache-2.0
tags:
  - merge
  - mergekit
  - lazymergekit
  - 3rd-Degree-Burn/L-3.1-Science-Writer-8B-iter1
  - 3rd-Degree-Burn/L-3.1-Science-Writer-8B-iter2
  - djuna/L3.1-Purosani-2-8B
  - THUDM/LongWriter-llama3.1-8b
datasets:
  - neuralwork/arxiver
language:
  - en
base_model:
  - THUDM/LongWriter-llama3.1-8b
  - djuna/L3.1-Purosani-2-8B
pipeline_tag: text-generation
---

# L-3.1-Science-Writer-8B

*Work in progress*

I made this model by fine-tuning THUDM/LongWriter-llama3.1-8b on the neuralwork/arxiver dataset for 2 epochs, then merging it with djuna/L3.1-Purosani-2-8B for general smarts and all-round capability.


## Chat format

Use the same format as LongWriter:

```
<<SYS>>
You are a research assistant.
<</SYS>>

[INST]

[/INST]
```
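
For reference, here is a minimal inference sketch using the format above with the transformers library; the dtype, device placement, sampling settings, and example message are assumptions, not part of this card.

```python
# Minimal inference sketch, assuming the transformers and accelerate
# libraries; dtype, device placement, and sampling settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3rd-Degree-Burn/L-3.1-Science-Writer-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the LongWriter-style tags shown above.
prompt = (
    "<<SYS>>\nYou are a research assistant.\n<</SYS>>\n\n"
    "[INST]\nSummarize recent approaches to protein structure prediction.\n[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```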

To make it write a paper (the results aren't as good as I expected):

```
<<SYS>>
You are a research assistant.
<</SYS>>

[INST]
Write a paper with the given details provided by the user. Identify gaps or opportunities for original insights based on the provided abstract. Include clear proofs, calculations, or evidence where required. Maintain an academic tone and ensure consistency.
Topic: {}
Abstract (optional): {}
Include these authors' names: {}.
[/INST]
```
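
One way to fill that template from Python; the keyword names and example values below are purely illustrative, not from this card.

```python
# Fill the paper-writing template above; the field values here are
# purely illustrative and not from the model card.
PAPER_TEMPLATE = (
    "<<SYS>>\nYou are a research assistant.\n<</SYS>>\n\n"
    "[INST]\nWrite a paper with the given details provided by the user. "
    "Identify gaps or opportunities for original insights based on the provided "
    "abstract. Include clear proofs, calculations, or evidence where required. "
    "Maintain an academic tone and ensure consistency.\n"
    "Topic: {topic}\n"
    "Abstract (optional): {abstract}\n"
    "Include these authors' names: {authors}.\n"
    "[/INST]"
)

prompt = PAPER_TEMPLATE.format(
    topic="Graph neural networks for molecular property prediction",
    abstract="",  # leave empty if you don't have one
    authors="A. Author, B. Author",
)
```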

## Benchmarks

| Type | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MUSR | MMLU-PRO | CO₂ cost (kg) |
|------|-------|---------|--------|-----|------------|------|------|----------|---------------|
| 🔶 | 3rd-Degree-Burn/L-3.1-Science-Writer-8B | 21.08 | 42.63 | 29.2 | 10.27 | 3.24 | 11.69 | 29.44 | 0.71 |

## Personal thoughts

I used a pretty low LoRA rank (r=32). The final training loss after 2 epochs was around 0.9, which is okay but not great. I think the deeper layers of the model haven't been fully saturated yet, so it's still a bit of a work in progress.
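
For anyone trying to reproduce the fine-tune, here is a rough sketch of a comparable LoRA setup with peft and trl. Only the base model, the dataset, the rank (r=32), and the 2-epoch schedule come from this card; alpha, dropout, the text column, and everything else are assumptions.

```python
# Rough sketch of a comparable LoRA fine-tune with peft + trl. Only the
# base model, dataset, rank (r=32), and 2 epochs come from this card;
# alpha, dropout, the text column, and all other settings are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("neuralwork/arxiver", split="train")

peft_config = LoraConfig(
    r=32,               # the "pretty low rank" mentioned above
    lora_alpha=32,      # assumed, not stated in the card
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="science-writer-sft",  # hypothetical path
    num_train_epochs=2,               # from the card
    dataset_text_field="markdown",    # assumed arxiver column holding the paper text
)

trainer = SFTTrainer(
    model="THUDM/LongWriter-llama3.1-8b",
    train_dataset=dataset,
    args=training_args,
    peft_config=peft_config,
)
trainer.train()
```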

**Edit:** This model has a repetition problem. I wouldn't recommend using it.