---
license: cc-by-nc-4.0
inference:
  parameters:
    num_beams: 2
    num_beam_groups: 2
    num_return_sequences: 1
    repetition_penalty: 1
    diversity_penalty: 2.01
    no_repeat_ngram_size: 2
    temperature: 0.8
    max_length: 256
widget:
- text: >-
    description: Men Business Wrist Watch Quartz Casual Belt Men's Watch Brown Watch
  example_title: Example 1
- text: >-
    description: SAMSUNG 16" Galaxy Book4 Pro Laptop PC Computer, Intel Core 7
  example_title: Example 2
- text: >-
    description: Basics 18-Piece Kitchen Dinnerware Set, Plates, Dishes, Bowls - White
  example_title: Example 3
datasets:
- Ateeqq/Amazon-Product-Description
---
# Product Description Generator

## Overview

This repository contains a fine-tuned model for generating high-quality product descriptions from product titles. The model is based on `t5-base` (223 million parameters) and was fine-tuned on the Amazon Product Dataset, which contains 10 million examples; the cleaned version contains 0.5 million. This is a test version trained on 0.1 million (100,000) examples. It will soon be updated to use all 0.5 million cleaned examples.

Developed by the team at https://exnrt.com.

## Usage

T5 models expect a task-specific prefix. Since this is a description-generation task, prepend `description: ` to each input.
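The prefixing step can be isolated in a small helper. This is a sketch, not part of the repository: `build_input` is a hypothetical name, and the whitespace stripping and the guard against adding the prefix twice are our own choices. Only the prefix string itself comes from this card.

```python
# Hypothetical helper: builds the model input for a raw product title.
PREFIX = "description: "


def build_input(title: str) -> str:
    """Return the title with the task prefix the model was fine-tuned on."""
    title = title.strip()
    # Don't double the prefix if the caller already added it.
    if title.startswith(PREFIX):
        return title
    return PREFIX + title


print(build_input("Men Business Wrist Watch Quartz Casual Belt Men's Watch Brown Watch"))
```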

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Pass a Hugging Face access token if the repository requires authentication
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/product-description-generator", token="your_token")
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/product-description-generator", token="your_token")

input_text = "18-Piece Kitchen Dinnerware white Set, Plates, Dishes, Bowls"

# Prepend the task prefix the model was fine-tuned with
inputs = tokenizer.encode("description: " + input_text, return_tensors="pt", max_length=128, truncation=True)

# Diverse (group) beam search: 2 beams in 2 groups, 2 sequences returned.
# temperature is omitted: it only affects sampling (do_sample=True), not beam search.
outputs = model.generate(inputs, max_length=400, num_beams=2, num_beam_groups=2, num_return_sequences=2, repetition_penalty=1.0, diversity_penalty=3.0, no_repeat_ngram_size=2, early_stopping=True)

# Decode the second of the two returned sequences
description = tokenizer.decode(outputs[1], skip_special_tokens=True)

print(description)
```
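The snippet above relies on diverse (group) beam search. As a hedged sketch, the related settings can be collected in one hypothetical helper, `diverse_beam_kwargs`, that fails fast on combinations that, to our knowledge, `transformers` rejects or ignores for group beam search: `num_beams` must be divisible by `num_beam_groups`, `num_return_sequences` cannot exceed `num_beams`, and sampling must stay off, so `temperature` has no effect. Requiring at least two groups and a positive `diversity_penalty` is our own design choice, since the penalty only acts between groups.

```python
def diverse_beam_kwargs(num_groups, beams_per_group=1, num_return_sequences=1,
                        diversity_penalty=2.0, max_length=256):
    """Hypothetical helper: keyword arguments for diverse (group) beam search."""
    if num_groups < 2:
        # diversity_penalty only acts between groups; one group adds no diversity
        raise ValueError("diverse beam search needs at least 2 beam groups")
    if diversity_penalty <= 0:
        raise ValueError("diversity_penalty must be positive or groups can coincide")
    # Building num_beams from the group count keeps it divisible by num_beam_groups
    num_beams = num_groups * beams_per_group
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences cannot exceed num_beams")
    return {
        "num_beams": num_beams,
        "num_beam_groups": num_groups,
        "num_return_sequences": num_return_sequences,
        "diversity_penalty": diversity_penalty,
        "no_repeat_ngram_size": 2,
        # Group beam search is deterministic, so a temperature would be ignored
        "do_sample": False,
        "max_length": max_length,
    }


print(diverse_beam_kwargs(num_groups=2, num_return_sequences=2, diversity_penalty=3.0, max_length=400))
```

The returned dictionary can be unpacked into the call, e.g. `model.generate(inputs, **diverse_beam_kwargs(num_groups=2, num_return_sequences=2, diversity_penalty=3.0, max_length=400))`; extra arguments such as `repetition_penalty` can be added to the dictionary before the call.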

## Getting Multiple Outputs

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("Ateeqq/product-description-generator", token='your_token')
model = AutoModelForSeq2SeqLM.from_pretrained("Ateeqq/product-description-generator", token='your_token').to(device)

def generate_description(title):
    # Add the task prefix and move the token ids to the model's device
    input_ids = tokenizer(f'description: {title}', return_tensors="pt", padding="longest", truncation=True, max_length=128).input_ids.to(device)
    # 5 beams split into 5 groups yield 5 diverse candidates;
    # temperature is omitted because beam search does not sample
    outputs = model.generate(
        input_ids,
        num_beams=5,
        num_beam_groups=5,
        num_return_sequences=5,
        repetition_penalty=10.0,
        diversity_penalty=3.0,
        no_repeat_ngram_size=2,
        max_length=128
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

title = '18-Piece Kitchen Dinnerware white Set, Plates, Dishes, Bowls'
print(generate_description(title))
```
### Outputs:
```
['18-Piece Kitchen Dinnerware White Set, Plates, Dishes and Bowls. This set includes a large bowl with two plates for serving dinner or dessert dishes in the kitchen. The white plate is made of durable stainless steel that will not fade over time.',
 'The 18 Piece Kitchen Dinnerware Set features a white plate, dish and bowl. This set is made of durable stainless steel with an elegant design that will add elegance to your kitchen.',
 'This 18-piece dinnerware set is made of durable stainless steel. It features a white finish and comes with an easy to clean handle for ease of cleaning. The bowls are dishwasher safe, microwave safe or can be used as tablecloths in the kitchen.',
 'Kitchen Dinnerware is a great addition to your kitchen. This 18-piece set includes four plates, two dishes and three bowls for serving food or beverages in the dining room with an elegant design that will add sophistication to any tabletop setting.',
 "p>This 18-piece dinnerware set is made of high quality stainless steel. The white plate and dish are dishwasher safe, easy to clean with the included lids for ease of use. This dining table set features an elegant design that will add elegance style in your kitchen or living room. It's also perfect for serving food on any occasion like birthday parties, house warming ceremonies, Thanksgiving celebrations etc."]
```
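The last sample output above begins with a stray HTML fragment (`p>`), presumably left over from markup in the training descriptions (an assumption). A minimal post-processing sketch, assuming you want plain-text candidates: strip complete tags and a leading orphan fragment, collapse whitespace, and drop case-insensitive duplicates. `clean_candidates` is a hypothetical helper, and its regular expressions are heuristics that may need tuning.

```python
import re

# Heuristic patterns (assumptions, not part of the model):
_FULL_TAG = re.compile(r"</?[A-Za-z][^>]*>")           # e.g. "<p>", "</b>"
_LEADING_FRAGMENT = re.compile(r"^/?[A-Za-z]{1,10}>")  # e.g. a leading "p>"


def clean_candidates(candidates):
    """Return cleaned, de-duplicated candidates in their original order."""
    seen = set()
    cleaned = []
    for text in candidates:
        text = _FULL_TAG.sub("", text)
        text = _LEADING_FRAGMENT.sub("", text.strip())
        text = " ".join(text.split())  # collapse runs of whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```

Used with the function above, this becomes `clean_candidates(generate_description(title))`.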

## Features

- **Architecture**: t5-base (223M parameters)
- **Dataset**: Amazon Product Dataset
  - **Original**: 10 million examples
  - **Cleaned**: 0.5 million examples
- **Training**: 
  - **Current Version**: Trained on 0.1 million cleaned examples
  - **Upcoming Update**: Will be trained on 0.5 million cleaned examples
- **Training Setup**:
  - **Hardware**: Colab T4 GPU
  - **Speed**: 4.91 iterations/second
  - **Training Time**: 5:53:49 (h:mm:ss)
- **Metrics**:
  - **Loss**: 2.53
  - **Training Loss (step)**: 0.095
  - **Validation Loss (step)**: 1.670
  - **Validation Loss (epoch)**: 2.290
  - **Training Loss (epoch)**: 1.410
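The reported speed and training time can be cross-checked with a little arithmetic. This sketch assumes the 4.91 iterations/second rate held for the whole run and that the losses are mean per-token cross-entropy in nats (the usual convention for Hugging Face seq2seq training), in which case perplexity is `exp(loss)`. Neither assumption is stated in this card.

```python
import math


def hms_to_seconds(hms):
    """Convert an "h:mm:ss" string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


elapsed_s = hms_to_seconds("5:53:49")  # reported training time
rate = 4.91                            # reported iterations/second
approx_iterations = elapsed_s * rate   # assumes a constant rate

# Assumes losses are mean per-token cross-entropy in nats
train_perplexity = math.exp(1.410)     # training loss (epoch)
val_perplexity = math.exp(2.290)       # validation loss (epoch)

print(f"{elapsed_s} s, ~{approx_iterations:,.0f} iterations")
print(f"perplexity: train ~{train_perplexity:.2f}, validation ~{val_perplexity:.2f}")
```

This works out to roughly 104,000 iterations, on the order of one epoch over 100,000 examples at batch size 1 (see Data Preparation); the exact figure depends on how validation steps were counted. The implied perplexities are about 4.1 (training) and 9.9 (validation).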

## Data Preparation

- **Training Data**: First 100,000 examples from `train`
- **Evaluation Data**: First 10,000 examples from `test`
- **Source Max Token Length**: 50
- **Target Max Token Length**: 300
- **Batch Size**: 1
- **Max Epochs**: 1
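The preparation settings above can be sketched using Hugging Face `datasets`/`transformers` conventions. This is not the author's training script: the column names `title` and `description` are hypothetical placeholders (check the actual schema of `Ateeqq/Amazon-Product-Description`), and `first_n` and `make_preprocess_fn` are illustrative names. The block has no library imports, so it works with any tokenizer-like callable.

```python
# Sketch of the data preparation listed above; column names are hypothetical.
PREFIX = "description: "
SOURCE_MAX_LEN = 50   # source max token length
TARGET_MAX_LEN = 300  # target max token length
N_TRAIN, N_EVAL = 100_000, 10_000


def first_n(split, n):
    """Take the first n rows of a datasets.Dataset (or all rows if shorter)."""
    return split.select(range(min(n, len(split))))


def make_preprocess_fn(tokenizer, source_col="title", target_col="description"):
    """Return a batched map function that tokenizes prefixed inputs and labels."""
    def preprocess(batch):
        inputs = [PREFIX + text for text in batch[source_col]]
        model_inputs = tokenizer(inputs, max_length=SOURCE_MAX_LEN, truncation=True)
        labels = tokenizer(text_target=batch[target_col], max_length=TARGET_MAX_LEN, truncation=True)
        # Seq2seq models read target token ids from the "labels" key
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs
    return preprocess
```

With both libraries installed, one would load the data with `load_dataset("Ateeqq/Amazon-Product-Description")`, take `first_n(ds["train"], N_TRAIN)` and `first_n(ds["test"], N_EVAL)`, and call `.map(make_preprocess_fn(tokenizer), batched=True)` on each, with `tokenizer = AutoTokenizer.from_pretrained("t5-base")`.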

## Future Work

- **Update Training Data**: Retrain the model using the latest 0.5 million cleaned examples.
- **Optimize Training Parameters**: Experiment with different batch sizes, learning rates, and epochs to further improve model performance.
- **Expand Dataset**: Incorporate more diverse product datasets to enhance the model's versatility and robustness.

## License

Limited Use: the license grants a non-exclusive, non-transferable right to use this model. This means you can't freely share it with others or sell the model itself.