Justcode committed on
Commit 4357fd0
1 Parent(s): 6bdb8a8

Update README.md

Files changed (1): README.md +10 -8
README.md CHANGED
@@ -17,16 +17,18 @@ metrics:
 
 license: apache-2.0
 ---
-# T5 for Chinese Question Answering
-Randeng-T5-784M-QA-Chinese
+# Randeng-T5-784M-QA-Chinese
+T5 for Chinese Question Answering
+- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)
 
 
-## Brief Introduction
+## 简介 Brief Introduction
 This T5-Large model is the first pretrained generative question-answering model for Chinese on Hugging Face. It was pretrained on the Wudao 180G corpus and fine-tuned on the Chinese SQuAD and CMRC 2018 datasets. Given a passage and a question, it produces a fluent and accurate answer.
 
-This is the first generative question-answering model for Chinese on Hugging Face. Built on the T5-Large architecture, it was pretrained on the Wudao 180G corpus and fine-tuned on two reading-comprehension datasets: SQuAD translated into Chinese, and CMRC 2018. Given a passage and a question, it generates an accurate and fluent answer.
+This is the first generative question-answering model for Chinese on Hugging Face. Built on the T5-Large architecture, it was pretrained on the Wudao 180G corpus with the [Fengshen framework](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen) and fine-tuned on two reading-comprehension datasets: SQuAD translated into Chinese, and CMRC 2018. Given a passage and a question, it generates an accurate and fluent answer.
 
-## Performance
+## 模型表现 Performance
 
 CMRC 2018 dev (originally a span-prediction task, which we cast as generative QA)
 
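The hunk above notes that CMRC 2018 is originally a span-prediction task recast here as generative QA. A minimal sketch of one plausible way to do that conversion; the field names follow the common SQuAD/CMRC JSON layout, and the `question:.../context:...` prompt template is an assumption, not the repo's actual preprocessing:

```python
# Hypothetical preprocessing: turn a SQuAD/CMRC 2018-style record into a
# (source, target) pair for seq2seq fine-tuning.
def to_generative_example(record):
    context = record["context"]
    question = record["question"]
    answer = record["answers"]["text"][0]  # the gold span becomes the target string
    source = f"question:{question} context:{context}"  # assumed prompt template
    return {"source": source, "target": answer}

example = {
    "context": "袁隆平是中国杂交水稻育种专家。",
    "question": "袁隆平是哪个领域的专家?",
    "answers": {"text": ["杂交水稻育种"], "answer_start": [7]},
}
print(to_generative_example(example))
```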
@@ -43,7 +45,7 @@ This T5-Large model is the first pretrained generative question-answering model
 Our model attains very high generation quality and accuracy: 76% of its answers contain the gold answer (Contain Answer Rate), on par with the current best model, MacBERT-Large, whose predicted start positions exactly match the answer (EM) 70% of the time. Our model's EM is lower because it mostly generates complete sentences, whereas the gold answers are usually sentence fragments.
 P.S. The SOTA model only has to predict start and end positions; that extractive reading-comprehension task is much simpler than the generative one.
 
-## Cases
+## 样例 Cases
 
 Here are randomly picked samples:
 <img src="https://huggingface.co/IDEA-CCNL/Randeng-T5-784M-QA-Chinese/resolve/main/cases_t5_cmrc.png" align="middle" />
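For the EM and Contain Answer Rate figures discussed in the hunk above, a minimal sketch of how such metrics could be computed; this is not the repo's evaluation code, and the normalization (simple whitespace stripping) is an assumption:

```python
# Hypothetical metrics over predicted vs. gold answer strings.
def exact_match(preds, golds):
    # EM: prediction and gold answer are identical after stripping whitespace.
    return sum(p.strip() == g.strip() for p, g in zip(preds, golds)) / len(golds)

def contain_answer_rate(preds, golds):
    # Counts a prediction as correct if the gold span appears anywhere inside it,
    # which tolerates the model wrapping a fragment answer in a full sentence.
    return sum(g.strip() in p for p, g in zip(preds, golds)) / len(golds)

preds = ["袁隆平是杂交水稻育种专家。"]
golds = ["杂交水稻育种"]
print(exact_match(preds, golds))          # 0.0 (full sentence != fragment)
print(contain_answer_rate(preds, golds))  # 1.0 (fragment appears in the sentence)
```

This illustrates why a generative model can score low on EM while still answering correctly by the Contain Answer Rate criterion.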
@@ -52,7 +54,7 @@ Here are randomly picked samples:
 
 If the picture fails to display, you can find it under Files and versions.
 
-## Usage
+## 使用 Usage
 ```python
 import numpy as np
 from transformers import T5Tokenizer, MT5ForConditionalGeneration
@@ -77,7 +79,7 @@ tokenizer.batch_decode(pred_ids, skip_special_tokens=True, clean_up_tokenization
 ```
 
 
-# Citation
+# 引用 Citation
 
 You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
 
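The diff shows only the first and last lines of the README's usage snippet; the middle lies outside the hunks. For orientation, a self-contained sketch of how a T5-style QA model is typically queried with transformers: the model id matches this repo, but the prompt format and generation settings are assumptions rather than the README's exact code:

```python
from transformers import T5Tokenizer, MT5ForConditionalGeneration

pretrain_path = "IDEA-CCNL/Randeng-T5-784M-QA-Chinese"
tokenizer = T5Tokenizer.from_pretrained(pretrain_path)
model = MT5ForConditionalGeneration.from_pretrained(pretrain_path)

context = "袁隆平是中国杂交水稻育种专家,被誉为杂交水稻之父。"
question = "袁隆平被誉为什么?"
text = f"question:{question} context:{context}"  # assumed input format

encodings = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")
pred_ids = model.generate(
    input_ids=encodings.input_ids,
    attention_mask=encodings.attention_mask,
    max_length=64,   # answers are short; cap generation length
    num_beams=4,     # beam search for a more fluent single answer
)
print(tokenizer.batch_decode(pred_ids, skip_special_tokens=True,
                             clean_up_tokenization_spaces=True))
```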