---
datasets:
- karpathy/tiny_shakespeare
library_name: tf-keras
license: mit
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- lstm
---
|
|
|
## Model description |
|
|
|
LSTM trained on Andrej Karpathy's [`tiny_shakespeare`](https://huggingface.co/datasets/karpathy/tiny_shakespeare) dataset, featured in his blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/).
|
|
|
Made to experiment with Hugging Face and W&B. |
|
|
|
## Intended uses & limitations |
|
|
|
The model predicts the next character from a variable-length input sequence. After `18` epochs of training, it generates text that is somewhat coherent.
|
|
|
```py
import tensorflow as tf

def generate_text(model, encoder, text, n):
    """Greedily generate `n` characters, one at a time."""
    vocab = encoder.get_vocabulary()
    generated_text = text
    for _ in range(n):
        # Encode the running text and predict logits for the next character.
        encoded = encoder([generated_text])
        pred = model.predict(encoded, verbose=0)
        # Take the most likely character id and map it back to a character.
        pred = tf.squeeze(tf.argmax(pred, axis=-1)).numpy()
        generated_text += vocab[pred]
    return generated_text

sample = "M"
print(generate_text(model, encoder, sample, 100))
```
|
|
|
```
MQLUS:
I will be so that the street of the state,
And then the street of the street of the state,
And
```
|
|
|
## Training and evaluation data |
|
|
|
[![Weights & Biases](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://wandb.ai/adamelliotfields/shakespeare)
|
|
|
## Training procedure |
|
|
|
The dataset consists of various works of William Shakespeare concatenated into a single file, with individual speeches separated by `\n\n`.
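
A minimal sketch of reading the corpus and recovering individual speeches; the `input.txt` filename is an assumption, since the card doesn't name the file:

```py
# `input.txt` is a hypothetical local copy of the concatenated corpus.
with open("input.txt", encoding="utf-8") as f:
    text = f.read()

# Speeches are separated by blank lines, per the description above.
speeches = text.split("\n\n")
print(f"{len(text):,} characters across {len(speeches):,} speeches")
```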
|
|
|
The tokenizer is a Keras `TextVectorization` preprocessor that uses a simple character-based vocabulary. |
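
A character-level `TextVectorization` layer might look like the sketch below; the exact arguments used for this model aren't published, so treat them as assumptions:

```py
import tensorflow as tf

# Character-level vocabulary; `standardize=None` keeps case and
# punctuation intact (an assumption, since the corpus is case-sensitive).
encoder = tf.keras.layers.TextVectorization(split="character", standardize=None)
encoder.adapt(tf.constant([text]))  # `text` from the sketch above
print(encoder.get_vocabulary()[:10])
```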
|
|
|
To construct the training set, each window of `100` characters is paired with the character that follows it as the target. Sliding this window over the full text yields **1,115,294** shuffled training examples.
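
As a sketch of the windowing (hypothetical names; the original pipeline isn't published), a 1,115,394-character corpus yields exactly 1,115,294 windows of length `100`, matching the count above:

```py
# `text` comes from the earlier sketch; each 100-character slice is an
# input and the character immediately after it is the target.
seq_len = 100
pairs = [
    (text[i : i + seq_len], text[i + seq_len])
    for i in range(len(text) - seq_len)
]
print(f"{len(pairs):,} training examples")  # 1,115,294
```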
|
|
|
*TODO: upload encoder* |
|
|
|
### Training hyperparameters |
|
|
|
| Hyperparameter | Value |
|
| :---------------- | :-------- | |
|
| `epochs` | `18` | |
|
| `batch_size` | `1024` | |
|
| `optimizer` | `AdamW` | |
|
| `weight_decay` | `0.001` | |
|
| `learning_rate` | `0.00025` | |
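
A minimal sketch of wiring these up; the loss, the `model` architecture, and the `x_train`/`y_train` arrays are assumptions, as the card doesn't include the training script:

```py
import tensorflow as tf

# Optimizer settings from the table above (requires TF >= 2.11 for
# `tf.keras.optimizers.AdamW`).
optimizer = tf.keras.optimizers.AdamW(learning_rate=0.00025, weight_decay=0.001)

# Sparse categorical crossentropy over character ids is an assumption;
# `accuracy` is the metric reported in the card metadata.
model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# `x_train` / `y_train` are hypothetical encoded inputs and targets.
model.fit(x_train, y_train, epochs=18, batch_size=1024)
```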
|
|
|
## Model plot
|
|
|
<details> |
|
<summary>View Model Plot</summary> |
|
|
|
![Model Image](./model.png) |
|
|
|
</details> |