shibing624/chatglm-6b-csc-zh-lora

Tags: PyTorch, Chinese, chatglm, Text2Text-Generation


shibing624 committed on Apr 6, 2023

Commit 3f167e2 • 1 parent: 456d588

Update README.md


Files changed (1)

README.md +109 -1


@@ -1,3 +1,111 @@
 ---
-license: apache-2.0
 ---

 ---
+language:
+- zh
+tags:
+- chatglm
+- pytorch
+- zh
+- Text2Text-Generation
+license: "apache-2.0"
+widget:
+- text: "对下面中文拼写纠错：\n少先队员因该为老人让坐。\n答："
 ---
+# Chinese Spelling Correction LoRA Model
+A LoRA adapter that fine-tunes ChatGLM-6B (`THUDM/chatglm-6b`) for Chinese spelling correction (CSC).
+Example prediction of `chatglm-6b-csc-zh-lora` on the CSC **test** set:
+|prefix|input_text|target_text|pred|
+|:-- |:--- |:--- |:-- |
+|对下面中文拼写纠错：|少先队员因该为老人让坐。|少先队员应该为老人让座。|少先队员应该为老人让座。\n错误字：因，坐|
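+On the CSC test set, the generated corrections are highly accurate. Because the model is built on a large language model, its output is often better than expected: besides fixing misspelled characters, it can also polish and rewrite sentences.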
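Because the model can rewrite as well as correct, it may help to separate its one-for-one character substitutions (the classic CSC setting) from insertions, deletions, and longer rewrites. The following is a minimal sketch using Python's standard `difflib`; `char_edits` is a hypothetical helper, not part of lmft or this model.

```python
import difflib


def char_edits(source: str, corrected: str):
    """Compare a sentence with its correction character by character.

    Returns (substitutions, other_edits):
      substitutions: (index, wrong_char, right_char) for same-length replacements
      other_edits:   (tag, source_span, corrected_span) for insertions, deletions
                     and unequal-length replacements, i.e. rewriting rather than
                     pure spelling correction.
    """
    subs, others = [], []
    matcher = difflib.SequenceMatcher(a=source, b=corrected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length replacement: pair the characters position by position
            for k in range(i2 - i1):
                subs.append((i1 + k, source[i1 + k], corrected[j1 + k]))
        else:
            others.append((tag, source[i1:i2], corrected[j1:j2]))
    return subs, others


subs, others = char_edits("少先队员因该为老人让坐。", "少先队员应该为老人让座。")
print(subs)    # [(4, '因', '应'), (10, '坐', '座')]
print(others)  # []
```

A non-empty `other_edits` list indicates that the prediction changed the sentence length somewhere, which is a quick way to flag outputs where the model rewrote the sentence instead of only correcting it.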
+## Usage
+This model is open-sourced as part of the [lmft](https://github.com/shibing624/lmft) project, which supports ChatGLM models. Call it as follows:
+Install package:
+```shell
+pip install -U lmft
+```
+```python
+from lmft import ChatGlmModel
+model = ChatGlmModel("chatglm", "THUDM/chatglm-6b", lora_name="shibing624/chatglm-6b-csc-zh-lora")
+r = model.predict(["对下面中文拼写纠错：\n少先队员因该为老人让坐。\n答："])
+print(r) # ['少先队员应该为老人让座。\n错误字：因，坐']
+```
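Each prediction is returned as a single string: the corrected sentence, a newline, and then `错误字：` ("wrong characters:") followed by the characters the model flagged. Below is a minimal parsing sketch. It assumes exactly the format shown in this card's examples, and `parse_csc_output` is a hypothetical helper.

```python
def parse_csc_output(text: str):
    """Split a prediction into (corrected_sentence, flagged_characters).

    Assumes the format shown in this card: the corrected sentence, a newline,
    then '错误字：' followed by characters separated by the full-width comma '，'.
    """
    marker = "错误字："  # "wrong characters:" line emitted after the sentence
    sentence, _, rest = text.strip().partition("\n")
    rest = rest.strip()
    if rest.startswith(marker):
        chars = [c.strip() for c in rest[len(marker):].split("，") if c.strip()]
    else:
        chars = []  # no error line, or an unexpected format
    return sentence.strip(), chars


print(parse_csc_output("少先队员应该为老人让座。\n错误字：因，坐"))
# ('少先队员应该为老人让座。', ['因', '坐'])
```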
+## Usage (Hugging Face Transformers)
+Without [lmft](https://github.com/shibing624/lmft), you can load the base model with Hugging Face Transformers, attach the LoRA adapter with PEFT, and generate the corrected sentence directly.
+Install packages:
+```shell
+pip install transformers peft accelerate sentencepiece
+```
+```python
+from peft import PeftModel
+from transformers import AutoModel, AutoTokenizer
+
+# Load the ChatGLM-6B base model (its modeling code lives in the model repo, hence trust_remote_code)
+model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, device_map='auto')
+# Attach the spelling-correction LoRA adapter
+model = PeftModel.from_pretrained(model, "shibing624/chatglm-6b-csc-zh-lora")
+model = model.half().cuda()  # fp16 on the GPU
+tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
+
+# Prompt template: instruction, sentence to correct, then "答：" ("Answer:")
+sents = ['对下面中文拼写纠错：\n少先队员因该为老人让坐。\n答：',
+         '对下面中文拼写纠错：\n下个星期，我跟我朋唷打算去法国玩儿。\n答：']
+for s in sents:
+    # chat() is ChatGLM's own helper, reached through the PEFT wrapper; it returns (response, history)
+    response = model.chat(tokenizer, s, max_length=128, eos_token_id=tokenizer.eos_token_id)
+    print(response)
+```
+output:
+```
+('少先队员应该为老人让座。\n错误字：因，坐', [('对下面中文拼写纠错：\n少先队员因该为老人让坐。\n答：', '少先队员应该为老人让座。\n错误字：因，坐')])
+('下个星期，我跟我朋友打算去法国玩儿。\n错误字：唷', [('对下面中文拼写纠错：\n下个星期，我跟我朋唷打算去法国玩儿。\n答：', '下个星期，我跟我朋友打算去法国玩儿。\n错误字：唷')])
+```
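The Transformers example passes fully formatted prompts and caps generation at `max_length=128`, so longer passages may need to be split into sentences before prompting. The sketch below builds prompts using the same template as the examples. Splitting on `。！？` is a simple assumption rather than something this card prescribes, and `build_prompt` and `split_sentences` are hypothetical helpers.

```python
import re

PREFIX = "对下面中文拼写纠错："  # instruction prefix used in this card's examples


def build_prompt(sentence: str) -> str:
    """Wrap one sentence in the prompt template shown in the examples."""
    return f"{PREFIX}\n{sentence}\n答："


def split_sentences(text: str):
    """Split text after Chinese sentence-final punctuation (。！？), keeping the punctuation."""
    parts = re.split(r"(?<=[。！？])", text)  # zero-width split needs Python 3.7+
    return [p.strip() for p in parts if p.strip()]


text = "少先队员因该为老人让坐。下个星期，我跟我朋唷打算去法国玩儿。"
prompts = [build_prompt(s) for s in split_sentences(text)]
for p in prompts:
    print(repr(p))
```

Each element of `prompts` can then be passed to `model.chat` in place of the hard-coded strings in `sents`.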
+Model files:
+```
+chatglm-6b-csc-zh-lora
+    ├── adapter_config.json
+    └── adapter_model.bin
+```
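`adapter_config.json` holds the adapter's settings, and `adapter_model.bin` holds the LoRA weights. The following is a sketch for inspecting the config of a local copy of the repository. The key names are the ones PEFT writes for LoRA adapters. This card does not list the actual values for this adapter, and `load_adapter_config` and `lora_summary` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_adapter_config(adapter_dir: str) -> dict:
    """Read adapter_config.json from a local copy of the adapter repository."""
    path = Path(adapter_dir) / "adapter_config.json"
    return json.loads(path.read_text(encoding="utf-8"))


def lora_summary(config: dict) -> dict:
    """Pick out the main LoRA settings; keys missing from the config become None."""
    keys = ["base_model_name_or_path", "peft_type", "task_type",
            "r", "lora_alpha", "lora_dropout", "target_modules"]
    summary = {k: config.get(k) for k in keys}
    # Standard LoRA scales the low-rank update B @ A by lora_alpha / r
    if summary["r"] and summary["lora_alpha"] is not None:
        summary["scaling"] = summary["lora_alpha"] / summary["r"]
    return summary


# Usage, once the repository has been downloaded to ./chatglm-6b-csc-zh-lora:
# print(lora_summary(load_adapter_config("chatglm-6b-csc-zh-lora")))
```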
+### Training data
+#### Chinese spelling correction dataset
+- Data: [shibing624/CSC](https://huggingface.co/datasets/shibing624/CSC)
+To train a ChatGLM model yourself, see [https://github.com/shibing624/lmft](https://github.com/shibing624/lmft).
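To prepare similar training data, each dataset record has to be turned into a prompt and a target. The sketch below assumes that records have `original_text`, `correct_text`, and `wrong_ids` fields (check the dataset card for the actual schema). The target format is inferred from the example outputs in this card rather than taken from lmft's training code. `to_instruction_pair` is a hypothetical helper.

```python
def to_instruction_pair(record: dict, prefix: str = "对下面中文拼写纠错：") -> dict:
    """Turn one CSC record into a prompt/target pair in the format shown in this card.

    Assumed record fields (verify against the dataset):
      original_text: sentence with spelling errors
      correct_text:  corrected sentence
      wrong_ids:     0-based positions of the wrong characters in original_text
    """
    src = record["original_text"]
    tgt = record["correct_text"]
    wrong_chars = [src[i] for i in record.get("wrong_ids", [])]
    prompt = f"{prefix}\n{src}\n答："
    # Append the "错误字：" line only when there are errors (assumed convention)
    target = tgt + ("\n错误字：" + "，".join(wrong_chars) if wrong_chars else "")
    return {"prompt": prompt, "target": target}


pair = to_instruction_pair({
    "original_text": "少先队员因该为老人让坐。",
    "correct_text": "少先队员应该为老人让座。",
    "wrong_ids": [4, 10],
})
print(pair["target"])
```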
+## Citation
+```bibtex
+@software{lmft,
+  author = {Xu Ming},
+  title = {lmft: Implementation of language model finetune},
+  year = {2023},
+  url = {https://github.com/shibing624/lmft},
+}
+```