File size: 2,770 Bytes
520f0ff
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---
language:
- en
tags:
- pytorch
- causal-lm
license: mit

---

# Lit-125M - A Small Fine-tuned Model For Fictional Storytelling

Lit-125M is a GPT-Neo 125M model fine-tuned on 2GB of a diverse range of light novels, erotica, and annotated literature for the purpose of generating novel-like fictional text. 

## Model Description

The model used for fine-tuning is [GPT-Neo 125M](https://huggingface.co/EleutherAI/gpt-neo-125M), which is a 125 million parameter auto-regressive language model trained on [The Pile](https://pile.eleuther.ai/)..

## Training Data & Annotative Prompting

The data used in fine-tuning has been gathered from various sources such as the [Gutenberg Project](https://www.gutenberg.org/). The annotated fiction dataset has prepended tags to assist in generating towards a particular style. Here is an example prompt that shows how to use the annotations.

```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror; Tags: 3rdperson, scary; Style: Dark ]
***
When a traveler in north central Massachusetts takes the wrong fork...
```

The annotations can be mixed and matched to help generate towards a specific style.

## Downstream Uses

This model can be used for entertainment purposes and as a creative writing assistant for fiction writers. The small size of the model can also help for easy debugging or further development of other models with a similar purpose.

## Example Code

```
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('hakurei/lit-125M')
tokenizer = AutoTokenizer.from_pretrained('hakurei/lit-125M')

prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

input_ids = tokenizer.encode(prompt, return_tensors='pt')
output = model.generate(input_ids, do_sample=True, temperature=1.0, top_p=0.9, repetition_penalty=1.2, max_length=len(input_ids[0])+100, pad_token_id=tokenizer.eos_token_id)

generated_text = tokenizer.decode(output[0])
print(generated_text)
```

An example output from this code produces a result that will look similar to:

```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler takes a trip through the streets of the world, the traveler feels like a youkai with a whole world inside her mind. It can be very scary for a youkai. When someone goes in the opposite direction and knocks on your door, it is actually the first time you have ever come to investigate something like that.
That's right: everyone has heard stories about youkai, right? If you have heard them, you know what I'm talking about. 
It's hard not to say you
```

## Team members and Acknowledgements

- [Anthony Mercurio](https://github.com/harubaru)
- Imperishable_NEET