---
license: apache-2.0
tags:
  - merge
  - mergekit
  - lazymergekit
  - 3rd-Degree-Burn/L-3.1-Science-Writer-8B-iter1
  - 3rd-Degree-Burn/L-3.1-Science-Writer-8B-iter2
  - djuna/L3.1-Purosani-2-8B
  - THUDM/LongWriter-llama3.1-8b
datasets:
  - neuralwork/arxiver
language:
  - en
base_model:
  - THUDM/LongWriter-llama3.1-8b
  - djuna/L3.1-Purosani-2-8B
pipeline_tag: text-generation
---

# L-3.1-Science-Writer-8B

*Work in progress*

I made this model by fine-tuning THUDM/LongWriter-llama3.1-8b on the neuralwork/arxiver dataset for 2 epochs, then merging it with djuna/L3.1-Purosani-2-8B for general smarts and all-round capability.


## Chat format

Use the same format as LongWriter:

```
<<SYS>>
You are a research assistant.
<</SYS>>

[INST]

[/INST]
```
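
For reference, here is a minimal inference sketch using the format above with the transformers library; the dtype, device placement, sampling settings, and example message are assumptions, not part of this card.

```python
# Minimal inference sketch, assuming the transformers and accelerate
# libraries; dtype, device placement, and sampling settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3rd-Degree-Burn/L-3.1-Science-Writer-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt with the LongWriter-style tags shown above.
prompt = (
    "<<SYS>>\nYou are a research assistant.\n<</SYS>>\n\n"
    "[INST]\nSummarize recent approaches to protein structure prediction.\n[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```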

To make it write a paper (the results aren't as good as I expected):

```
<<SYS>>
You are a research assistant.
<</SYS>>

[INST]
Write a paper with the given details provided by the user. Identify gaps or opportunities for original insights based on the provided abstract. Include clear proofs, calculations, or evidence where required. Maintain an academic tone and ensure consistency.
Topic: {}
Abstract (optional): {}
Include these authors' names: {}.
[/INST]
```
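
One way to fill that template from Python; the keyword names and example values below are purely illustrative, not from this card.

```python
# Fill the paper-writing template above; the field values here are
# purely illustrative and not from the model card.
PAPER_TEMPLATE = (
    "<<SYS>>\nYou are a research assistant.\n<</SYS>>\n\n"
    "[INST]\nWrite a paper with the given details provided by the user. "
    "Identify gaps or opportunities for original insights based on the provided "
    "abstract. Include clear proofs, calculations, or evidence where required. "
    "Maintain an academic tone and ensure consistency.\n"
    "Topic: {topic}\n"
    "Abstract (optional): {abstract}\n"
    "Include these authors' names: {authors}.\n"
    "[/INST]"
)

prompt = PAPER_TEMPLATE.format(
    topic="Graph neural networks for molecular property prediction",
    abstract="",  # leave empty if you don't have one
    authors="A. Author, B. Author",
)
```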

## Benchmarks

| Type | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MUSR | MMLU-PRO | CO₂ cost (kg) |
|------|-------|---------|--------|-----|------------|------|------|----------|---------------|
| 🔶 | 3rd-Degree-Burn/L-3.1-Science-Writer-8B | 21.08 | 42.63 | 29.2 | 10.27 | 3.24 | 11.69 | 29.44 | 0.71 |

## Personal thoughts

I used a pretty low LoRA rank (r=32). The final training loss after 2 epochs was around 0.9, which is okay but not great. I think the deeper layers of the model haven't been fully saturated yet, so it's still a bit of a work in progress.
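
For anyone trying to reproduce the fine-tune, here is a rough sketch of a comparable LoRA setup with peft and trl. Only the base model, the dataset, the rank (r=32), and the 2-epoch schedule come from this card; alpha, dropout, the text column, and everything else are assumptions.

```python
# Rough sketch of a comparable LoRA fine-tune with peft + trl. Only the
# base model, dataset, rank (r=32), and 2 epochs come from this card;
# alpha, dropout, the text column, and all other settings are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("neuralwork/arxiver", split="train")

peft_config = LoraConfig(
    r=32,               # the "pretty low rank" mentioned above
    lora_alpha=32,      # assumed, not stated in the card
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="science-writer-sft",  # hypothetical path
    num_train_epochs=2,               # from the card
    dataset_text_field="markdown",    # assumed arxiver column holding the paper text
)

trainer = SFTTrainer(
    model="THUDM/LongWriter-llama3.1-8b",
    train_dataset=dataset,
    args=training_args,
    peft_config=peft_config,
)
trainer.train()
```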

**Edit:** This model has a repetition problem. I wouldn't recommend using it.