---
license: other
tags:
- generated_from_trainer
- opt
- custom-license
- non-commercial
- email
- auto-complete
- 125m
datasets:
- aeslc
widget:
- text: 'Hey <NAME>,
    Thank you for signing up for my weekly newsletter. Before we get started, you''ll
    have to confirm your email address.'
  example_title: newsletter
- text: 'Hi <NAME>,
    I hope this email finds you well. Let me start by saying that I am a big fan of
    your work.'
  example_title: fan
- text: 'Greetings <NAME>,
    I hope you had a splendid evening at the Company sausage eating festival. I am
    reaching out because'
  example_title: festival
- text: 'Good Morning <NAME>,
    I was just thinking to myself about how much I love creating value'
  example_title: value
- text: URGENT - I need
  example_title: URGENT
parameters:
  min_length: 4
  max_length: 64
  length_penalty: 0.7
  no_repeat_ngram_size: 3
  do_sample: false
  num_beams: 4
  early_stopping: true
  repetition_penalty: 3.5
  use_fast: false
base_model: facebook/opt-125m
---
> NOTE: there is currently a bug with the Hugging Face Inference API for OPT models. Please use the [colab notebook](https://colab.research.google.com/gist/pszemraj/033dc9a38da31ced7a0343091ba42e31/email-autocomplete-demo-125m.ipynb) to test :)
# OPT for email generation - 125m
Why write the rest of your email when you can generate it?
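```python
from transformers import pipeline

model_tag = "pszemraj/opt-125m-email-generation"
generator = pipeline(
    "text-generation",
    model=model_tag,
    use_fast=False,  # load the slow tokenizer
    do_sample=False,  # deterministic decoding (no sampling)
)

prompt = """
Hello,
Following up on the bubblegum shipment."""

result = generator(
    prompt,
    max_length=96,  # total length in tokens, including the prompt
)  # generate
print(result[0]["generated_text"])
```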
- [colab notebook](https://colab.research.google.com/gist/pszemraj/033dc9a38da31ced7a0343091ba42e31/email-autocomplete-demo-125m.ipynb) for testing/use
## About
This model is a fine-tuned version of [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) on the `aeslc` dataset.
- Emails, phone numbers, etc. were removed from the data on a best-effort basis during dataset preparation, using [clean-text](https://pypi.org/project/clean-text/) in Python.
- Note that the hosted Inference API widget is limited to `max_length: 64` tokens; you can generate longer emails by loading the model in a `text-generation` `pipeline` and passing a larger `max_length`.
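A minimal sketch of reproducing the widget's beam-search settings (the `parameters` block in the front matter) locally while raising `max_length` for longer emails. The helper names `widget_generation_kwargs`, `load_generator`, and `generate_email` are hypothetical, `max_length=192` is an arbitrary illustrative value, and `use_fast=False` is passed to the pipeline (it configures the tokenizer) rather than to generation.

```python
from typing import Any, Callable, Dict, List

# Generation settings copied from the `parameters` block of this card's front matter.
# `use_fast` is left out because it is a tokenizer option, not a generation option.
WIDGET_PARAMETERS: Dict[str, Any] = {
    "min_length": 4,
    "max_length": 64,
    "length_penalty": 0.7,
    "no_repeat_ngram_size": 3,
    "do_sample": False,
    "num_beams": 4,
    "early_stopping": True,
    "repetition_penalty": 3.5,
}


def widget_generation_kwargs(max_length: int = 64) -> Dict[str, Any]:
    """Return the widget settings with an overridable max_length (hypothetical helper)."""
    if max_length < WIDGET_PARAMETERS["min_length"]:
        raise ValueError("max_length must be >= min_length")
    kwargs = dict(WIDGET_PARAMETERS)  # copy so the defaults are never mutated
    kwargs["max_length"] = max_length
    return kwargs


def load_generator(model_tag: str = "pszemraj/opt-125m-email-generation"):
    """Build a text-generation pipeline; transformers is imported lazily."""
    from transformers import pipeline

    return pipeline("text-generation", model=model_tag, use_fast=False)


def generate_email(
    generator: Callable[..., List[Dict[str, str]]],
    prompt: str,
    max_length: int = 192,
) -> str:
    """Run the pipeline with the widget settings and return the first generated text."""
    outputs = generator(prompt, **widget_generation_kwargs(max_length))
    return outputs[0]["generated_text"]


# Usage (downloads the model on first call):
# generator = load_generator()
# print(generate_email(generator, "Hello,\nFollowing up on the bubblegum shipment."))
```

Because OPT is a decoder-only model, `max_length` counts the prompt tokens as well, so a long prompt leaves less room for the generated continuation.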
It achieves the following results on the evaluation set:
- Loss: 2.5552
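Assuming the reported loss is the Trainer's mean per-token cross-entropy in nats (the usual setup for causal language modeling), it can be read as a perplexity via `perplexity = exp(loss)`. A small sketch applying this to the validation losses from the training results table:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_cross_entropy)


# Validation loss per epoch, taken from the "Training results" table of this card.
validation_losses = {1: 2.8030, 2: 2.6343, 3: 2.5595, 4: 2.5552}

for epoch, loss in validation_losses.items():
    print(f"epoch {epoch}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, the final evaluation loss of 2.5552 corresponds to a perplexity of roughly 12.9.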
## Intended uses & limitations
- OPT models are released under a non-commercial license, so this model cannot be used commercially.
- [Here is a GitHub gist](https://gist.github.com/pszemraj/c1b0a76445418b6bbddd5f9633d1bb7f) with a script for generating emails in the console or to a text file.
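The gist's code is not reproduced here. As an independent minimal sketch, assuming outputs in the `text-generation` pipeline's `[{"generated_text": ...}]` format, several generated emails could be written to a text file like this; the function name `save_emails` and the `---` separator are hypothetical choices.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def save_emails(
    outputs: Iterable[List[Dict[str, str]]],
    path: str,
    separator: str = "\n---\n",
) -> int:
    """Write the first generated text of each pipeline output to `path`.

    Returns the number of emails written.
    """
    texts = [output[0]["generated_text"].strip() for output in outputs]
    Path(path).write_text(separator.join(texts) + "\n", encoding="utf-8")
    return len(texts)


# Usage with a `text-generation` pipeline named `generator`:
# outputs = [generator(p, max_length=96) for p in prompts]
# save_emails(outputs, "emails.txt")
```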
## Training and evaluation data
- The `email_body` field from the train and validation splits (combined to get more data) of the [aeslc](https://huggingface.co/datasets/aeslc) dataset.
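The data preparation described above can be sketched as follows. This is an illustration, not the author's preprocessing script: `scrub_contact_info` is a hypothetical, simplified regex stand-in for the clean-text library the card says was used, and the `<EMAIL>`/`<PHONE>` placeholders are assumptions. Loading aeslc with the `datasets` library is shown only in a comment.

```python
import re
from typing import Dict, Iterable, List

# Simplified stand-in patterns (NOT clean-text's own, which are more thorough).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def scrub_contact_info(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def build_corpus(*splits: Iterable[Dict[str, str]]) -> List[str]:
    """Concatenate the `email_body` field of several splits and scrub each email."""
    return [scrub_contact_info(row["email_body"]) for split in splits for row in split]


# Usage with the Hugging Face `datasets` library (downloads aeslc):
# from datasets import load_dataset
# ds = load_dataset("aeslc")
# corpus = build_corpus(ds["train"], ds["validation"])
```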
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.8245 | 1.0 | 129 | 2.8030 |
| 2.521 | 2.0 | 258 | 2.6343 |
| 2.2074 | 3.0 | 387 | 2.5595 |
| 2.0145 | 4.0 | 516 | 2.5552 |
### Framework versions
- Transformers 4.20.1
- Pytorch 1.11.0+cu113
- Tokenizers 0.12.1