shibing624 committed 797f040 (parent: 347ebbe): Update README.md

README.md changed: `@@ -1,3 +1,108 @@`

---
language:
- zh
tags:
- SongNet
- pytorch
- zh
- Text2Text-Generation
license: "apache-2.0"
widget:
- text: "张抡<s1>春光好<s2>烟澹澹,雨。</s>水溶溶。</s>帖水落花飞不起,小桥东。</s>翩翩怨蝶愁蜂。</s>绕芳丛。</s>恋馀红。</s>不恨无情桥下水,恨东风。"

---

# SongNet for Chinese songci (songnet-base-chinese-songci) Model

A SongNet model for generating Chinese Song-dynasty ci lyrics (宋词).

Overall performance of `songnet-base-chinese-songci` on the couplet **test** data:

|input_text|target_text|pred|
|:--- |:--- |:--- |
|春回大地,对对黄莺鸣暖树|日照神州,群群紫燕衔新泥|福至人间,家家紫燕舞和风|
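
On the songci test set, the generated results meet the requirements of equal character count, part-of-speech alignment, word-level alignment, and similar form. Thanks to its purpose-built network structure, SongNet is clearly better than models such as T5 and GPT2 at well-matched semantic parallelism (对仗) and conformity to tonal patterns (平仄).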
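
The equal-character-count requirement above can be checked mechanically. The sketch below is a hypothetical helper (`same_format`, `split_lines` and `shape` are illustrative names, not textgen APIs): it drops the `author<s1>tune<s2>` header and the `<bos>` marker, splits the body on `</s>`, and compares the two texts line by line, treating every non-punctuation character as one slot. The punctuation set is an assumption covering only common marks, and the check says nothing about part of speech or tonal patterns.

```python
# Hypothetical format check: does a generated lyric keep the template's shape?
PUNCT = set(",，。.!！?？;；:：、")


def split_lines(text: str) -> list[str]:
    """Drop the author<s1>tune<s2> header and <bos>, then split on </s>."""
    if "<s2>" in text:
        text = text.split("<s2>", 1)[1]
    text = text.replace("<bos>", "")
    return [part for part in text.split("</s>") if part]


def shape(line: str) -> list[str]:
    """Map each ordinary character to 'c' and keep punctuation marks as-is."""
    return [ch if ch in PUNCT else "c" for ch in line]


def same_format(template: str, generated: str) -> bool:
    """True if both texts have the same number of lines, lengths and punctuation positions."""
    t_lines, g_lines = split_lines(template), split_lines(generated)
    if len(t_lines) != len(g_lines):
        return False
    return all(shape(t) == shape(g) for t, g in zip(t_lines, g_lines))


# Input and output of the 如梦令 example from the Usage section.
template = "严蕊<s1>如梦令<s2>道是梨花不是。</s>道是杏花不是。</s>白白与红红,别是东风情味。</s>曾记。</s>曾记。</s>人在武陵微醉。"
generated = "<bos>风撼梧桐影乱。</s>雨洒梧桐影乱。</s>又是一番红,人与暮霞俱远。</s>凄断。</s>凄断。</s>人与暮霞俱远。</s>"
print(same_format(template, generated))  # True
```

Comparing exact punctuation marks (rather than collapsing them into one class) means a comma in the template must also be a comma in the output, which is stricter than counting characters alone.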

SongNet network architecture:

![SongNet architecture](songnet-network.png)

## Usage

This model is open-sourced in the text generation project [textgen](https://github.com/shibing624/textgen), which supports SongNet models. It can be called as follows.

Install the package:
```shell
pip install -U textgen
```

```python
from textgen.language_modeling import SongNetModel

# Load the published model by its Hugging Face Hub id
model = SongNetModel(model_type='songnet', model_name='shibing624/songnet-base-chinese-songci')

# Input format: author<s1>tune name<s2>lines separated by </s>.
# generate() writes a new lyric that follows the line structure of the given one.
sentences = [
    "严蕊<s1>如梦令<s2>道是梨花不是。</s>道是杏花不是。</s>白白与红红,别是东风情味。</s>曾记。</s>曾记。</s>人在武陵微醉。",
    "张抡<s1>春光好<s2>烟澹澹,雨。</s>水溶溶。</s>帖水落花飞不起,小桥东。</s>翩翩怨蝶愁蜂。</s>绕芳丛。</s>恋馀红。</s>不恨无情桥下水,恨东风。"
]
print("inputs:", sentences)
print("outputs:", model.generate(sentences))

# fill_mask(): each "_" is a character for the model to write;
# characters that are given (e.g. 雨, 到, 冰, 春, 山花) stay in place.
sentences = [
    "秦湛<s1>卜算子<s2>_____,____到。_______,____俏。_____,____报。_______,____笑。",
    "秦湛<s1>卜算子<s2>_雨___,____到。______冰,____俏。____春,__春_报。__山花___,____笑。"
]
print("inputs:", sentences)
print("outputs:", model.fill_mask(sentences))
```

Output:
```shell
inputs: ['严蕊<s1>如梦令<s2>道是梨花不是。</s>道是杏花不是。</s>白白与红红,别是东风情味。</s>曾记。</s>曾记。</s>人在武陵微醉。', '张抡<s1>春光好<s2>烟澹澹,雨。</s>水溶溶。</s>帖水落花飞不起,小桥东。</s>翩翩怨蝶愁蜂。</s>绕芳丛。</s>恋馀红。</s>不恨无情桥下水,恨东风。']
outputs: ['<bos>风撼梧桐影乱。</s>雨洒梧桐影乱。</s>又是一番红,人与暮霞俱远。</s>凄断。</s>凄断。</s>人与暮霞俱远。</s>', '<bos>光阴速,还。</s>转飞残。</s>日向旧时檐下见,两三竿。</s>多少社寒垂涎。</s>玉人间。</s>恶循环。</s>不见旧时檐下见,两三竿。</s>']
inputs: ['秦湛<s1>卜算子<s2>_____,____到。_______,____俏。_____,____报。_______,____笑。', '秦湛<s1>卜算子<s2>_雨___,____到。______冰,____俏。____春,__春_报。__山花___,____笑。']
outputs: ['<bos>新月破寒影,正柳暗清到。千缕万绪浓於雨,多少匆匆俏。梦魂又不得,那堪断得报。听著窗前柳弄歌,寂寞梨花笑。</s>', '<bos>风雨送春归,草软莺簧到。门对宝篆淡淡冰,翠点吴绫俏。小立东风春,不怕春归报。多少山花妒落红,背面一饷笑。</s>']
```
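
The `fill_mask` inputs above mark every character the model should write with `_`, and keep punctuation plus any hint characters. A hypothetical helper such as `make_mask_template` below (an illustrative name, not part of textgen) can turn an existing lyric into that kind of prompt. Which characters to keep is the caller's choice, passed here as indices into the body; the punctuation set is an assumption.

```python
from collections.abc import Iterable

# Hypothetical helper: build a fill_mask prompt from an existing lyric body.
PUNCT = set(",，。.!！?？;；:：、")


def make_mask_template(author: str, tune: str, body: str, keep: Iterable[int] = ()) -> str:
    """Replace each non-punctuation character of `body` with '_' unless its index is in `keep`."""
    kept = set(keep)
    masked = "".join(
        ch if ch in PUNCT or i in kept else "_"
        for i, ch in enumerate(body)
    )
    return f"{author}<s1>{tune}<s2>{masked}"


# Keep 雨 (index 1) and the rhyme character 到 (index 10) as hints.
print(make_mask_template("秦湛", "卜算子", "风雨送春归,草软莺簧到。", keep={1, 10}))
# 秦湛<s1>卜算子<s2>_雨___,____到。
```

The returned string can be placed in the list passed to `model.fill_mask(...)` in the same way as the examples above.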

Model files:
```
songnet-base-chinese-songci
├── pytorch_model.bin
└── vocab.txt
```

### Training dataset
#### Chinese songci dataset

- Data: [songci](https://github.com/lipiji/SongNet/blob/master/data/ci.txt)
- Related resources
  - [Huggingface](https://huggingface.co/)
  - [SongNet paper](https://aclanthology.org/2020.acl-main.68/)
  - [textgen](https://github.com/shibing624/textgen)

Data format:

```text
head -n 2 ci.txt
赵必<s1>水调歌头<s2>百岁人能几,七十世间稀。</s>何况先生八十,蔗境美如饴。</s>好与七松处士,更与梅花君子,永结岁寒知。</s>菊节先五日,满酌紫霞卮。</s>美成词,山谷字,老坡诗。</s>三径田园如昨,久矣赋归辞。</s>不是商山四皓,便是香山九老,红颊白须眉。</s>九十尚入相,绿竹颂猗猗。
李曾伯<s1>水调歌头<s2>千一载英杰,百二国山河。</s>提封几半宇宙,万里仗天戈。</s>十乘晋军旗鼓,三岁秦关扃锁,地利属人和。</s>位次功第一,未数侯何。</s>建青油,持柴荷,听黄麻。</s>乾坤整顿都了,玉殿侍羲娥。</s>且醉东湖花柳,却泛西湖舟楫,留不住岷峨。</s>谁为语儒馆,浓墨被诗歌。
```
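
Each line of `ci.txt` holds one lyric: the author, `<s1>`, the tune name (词牌), `<s2>`, then the lines of the lyric separated by `</s>`. A minimal parsing sketch, assuming exactly that layout and UTF-8 encoding; `CiRecord`, `parse_ci_line` and `read_ci_file` are hypothetical names, not textgen APIs.

```python
from dataclasses import dataclass


@dataclass
class CiRecord:
    author: str
    tune: str         # tune name (词牌), e.g. 水调歌头
    lines: list[str]  # lines of the lyric, split on </s>


def parse_ci_line(raw: str) -> CiRecord:
    """Parse one line of ci.txt: author<s1>tune<s2>line</s>line</s>..."""
    author, rest = raw.strip().split("<s1>", 1)
    tune, body = rest.split("<s2>", 1)
    return CiRecord(author, tune, [part for part in body.split("</s>") if part])


def read_ci_file(path: str) -> list[CiRecord]:
    """Parse every non-empty line of a ci.txt-style file."""
    with open(path, encoding="utf-8") as f:
        return [parse_ci_line(line) for line in f if line.strip()]


record = parse_ci_line("赵必<s1>水调歌头<s2>百岁人能几,七十世间稀。</s>何况先生八十,蔗境美如饴。")
print(record.author, record.tune, len(record.lines))  # 赵必 水调歌头 2
```

A line without `<s1>` or `<s2>` makes the tuple unpacking raise `ValueError`, which surfaces malformed records early instead of silently mis-splitting them.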

To train a SongNet model yourself, see [examples/language_generation/training_zh_songnet_demo.py](https://github.com/shibing624/textgen/blob/main/examples/language_generation/training_zh_songnet_demo.py).

## Citation

```bibtex
@software{textgen,
  author = {Xu Ming},
  title = {textgen: Implementation of Text Generation models},
  year = {2022},
  url = {https://github.com/shibing624/textgen},
}
```