GPTQ-for-Bloom

Welcome

If you find this model helpful, please like it and give our repository a star: https://github.com/LianjiaTech/BELLE

Model description

8-bit and 4-bit quantizations of BELLE-7B-2M and BELLE-7B-0.2M, produced with GPTQ.

GPTQ is a state-of-the-art (SOTA) one-shot weight quantization method.
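To illustrate what "one-shot weight quantization" means in practice, here is a minimal NumPy sketch of the GPTQ column-by-column procedure for a single linear layer. It assumes a small batch of calibration inputs `X` is available for the layer; each column is rounded onto a min-max grid and its rounding error is pushed onto the not-yet-quantized columns through the inverse Hessian. This is not the BELLE or GPTQ-for-LLaMa code: it omits lazy batch updates, activation ordering, and dead-column handling, and the function names and the `percdamp=0.01` damping default are illustrative assumptions.

```python
import numpy as np


def quant_params(w, bits):
    """Per-row asymmetric min-max grid for a (rows, k) block; returns (scale, zero)."""
    maxq = 2 ** bits - 1
    wmin = np.minimum(w.min(axis=1), 0.0)  # include 0 so it is exactly representable
    wmax = np.maximum(w.max(axis=1), 0.0)
    scale = (wmax - wmin) / maxq
    scale = np.where(scale == 0, 1.0, scale)  # all-zero rows: any scale works
    zero = np.round(-wmin / scale)
    return scale, zero


def quantize(w, scale, zero, bits):
    """Round values onto the grid and map them back to floating point."""
    maxq = 2 ** bits - 1
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def gptq_quantize(W, X, bits=4, groupsize=None, percdamp=0.01):
    """Quantize W (rows, cols) column by column, compensating each column's
    rounding error in the remaining columns to keep ||W X - Q X||^2 small.
    X holds calibration inputs with shape (cols, n_samples)."""
    W = np.array(W, dtype=np.float64)  # working copy, updated in place
    cols = W.shape[1]
    H = 2.0 * X @ X.T  # Hessian of the layer-wise squared error
    H += percdamp * np.mean(np.diag(H)) * np.eye(cols)  # damping for stability
    U = np.linalg.cholesky(np.linalg.inv(H)).T  # upper Cholesky factor of H^-1
    Q = np.zeros_like(W)
    scale, zero = quant_params(W, bits)  # one grid per row when groupsize is None
    for i in range(cols):
        if groupsize is not None and i % groupsize == 0:
            # New group: fit the grid to the current, already error-corrected weights.
            scale, zero = quant_params(W[:, i:i + groupsize], bits)
        w = W[:, i].copy()
        Q[:, i] = quantize(w, scale, zero, bits)
        err = (w - Q[:, i]) / U[i, i]
        W[:, i:] -= np.outer(err, U[i, i:])  # spread the error over later columns
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 256))
    X = rng.normal(size=(256, 512))
    Q = gptq_quantize(W, X, bits=4, groupsize=128)
    # Plain round-to-nearest with the same grids, for comparison.
    rtn = np.zeros_like(W)
    for g in range(0, W.shape[1], 128):
        s, z = quant_params(W[:, g:g + 128], 4)
        rtn[:, g:g + 128] = quantize(W[:, g:g + 128], s[:, None], z[:, None], 4)
    print("GPTQ output error:", np.linalg.norm(W @ X - Q @ X))
    print("RTN  output error:", np.linalg.norm(W @ X - rtn @ X))
```

The sequential error compensation is the part that distinguishes GPTQ from plain rounding; the comparison at the bottom simply prints both layer-output errors so you can see the effect on your own data.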

The inference code is in our GitHub repository, at https://github.com/LianjiaTech/BELLE/tree/main/gptq or https://github.com/LianjiaTech/BELLE/tree/main/models/gptq.

In general, we recommend 8-bit quantization with a group size of 128.
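A group size of 128 means that, within each weight row, every block of 128 input columns gets its own scale and zero point instead of one pair for the whole row. The following minimal sketch shows that layout with plain round-to-nearest min-max quantization (not GPTQ's error-compensated rounding); the function name and the asymmetric min-max scheme are assumptions for illustration.

```python
import numpy as np


def quantize_groupwise(W, bits=8, groupsize=128):
    """Round-to-nearest asymmetric quantization of W (rows, cols) with one
    (scale, zero) pair per row for every block of `groupsize` input columns.
    Returns integer codes, scales, zero points, and dequantized weights."""
    rows, cols = W.shape
    if cols % groupsize != 0:
        raise ValueError("this sketch assumes cols is a multiple of groupsize")
    maxq = 2 ** bits - 1
    G = W.reshape(rows, cols // groupsize, groupsize)  # (rows, groups, groupsize)
    gmin = np.minimum(G.min(axis=2, keepdims=True), 0.0)
    gmax = np.maximum(G.max(axis=2, keepdims=True), 0.0)
    scale = (gmax - gmin) / maxq
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    zero = np.round(-gmin / scale)
    q = np.clip(np.round(G / scale) + zero, 0, maxq)  # integer codes in [0, maxq]
    deq = (scale * (q - zero)).reshape(rows, cols)  # dequantized weights
    return q.astype(np.int64).reshape(rows, cols), scale[..., 0], zero[..., 0], deq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 4096))
    for gs in (4096, 128):  # one group per row vs. groupsize 128
        _, scale, _, deq = quantize_groupwise(W, bits=4, groupsize=gs)
        print(f"groupsize={gs}: groups/row={scale.shape[1]}, "
              f"mean |W - deq|={np.abs(W - deq).mean():.4f}")
```

Smaller groups fit each grid to a narrower range of values, which tends to reduce rounding error, at the cost of storing more scales and zero points per row.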

The GPTQ inference code for Bloom is based on GPTQ-for-LLaMa.

Model list

Model name                  File size   GPU memory usage
base                        27G         ~28.2G
bloom7b-2m-8bit-128g.pt     9.7G        ~11.4G
bloom7b-2m-4bit-128g.pt     6.9G        ~8.4G
bloom7b-0.2m-8bit-128g.pt   9.7G        ~11.4G
bloom7b-0.2m-4bit-128g.pt   6.9G        ~8.4G
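The 4-bit files are not half the size of the 8-bit files. One rough way to read the table is a back-of-the-envelope check, assuming the 8-bit and 4-bit checkpoints differ only in the number of bits per quantized weight (same tensors left unquantized, roughly the same scale/zero overhead) and that "G" means 10^9 bytes (pass `unit=2**30` if it means GiB). Under those assumptions the size gap gives the number of quantized weights, and the remainder of the 8-bit file is everything else. This is an estimate from the table values only, not a statement of how the checkpoints are actually laid out; `implied_sizes` is a hypothetical helper.

```python
def implied_sizes(size_hi, size_lo, bits_hi=8, bits_lo=4, unit=1e9):
    """Assume two checkpoints differ only in bits per quantized weight, so the
    size gap is N * (bits_hi - bits_lo) / 8 bytes. Solve for N, then attribute
    the rest of the larger file to everything else (unquantized tensors,
    scales/zeros, metadata)."""
    gap_bytes = (size_hi - size_lo) * unit
    n_quantized = gap_bytes * 8 / (bits_hi - bits_lo)
    other_bytes = size_hi * unit - n_quantized * bits_hi / 8
    return n_quantized, other_bytes


# bloom7b-*-8bit-128g.pt (9.7G) vs. bloom7b-*-4bit-128g.pt (6.9G)
n, other = implied_sizes(9.7, 6.9)
print(f"~{n / 1e9:.1f}B quantized weights, ~{other / 1e9:.1f}G of other data")
```

If the estimate of "other data" is large, it suggests a sizable part of each file is not stored at the quantized bit width, which would explain why halving the bit width does not halve the file.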

Limitations

The SFT model trained on the current base model and data still has the following issues:

  1. It may give factually incorrect answers to instructions that involve facts.

  2. It occasionally generates harmful responses, because it still struggles to identify potentially harmful instructions.

  3. Its reasoning and coding abilities need improvement.

Because of these limitations, we require developers to use the open-sourced code, data, models, and any other artifacts generated via this project for research purposes only. Commercial use and other potentially harmful uses are not allowed.

Citation

Please cite us if you use our code, data, or model.

@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}

Cite the original BLOOM, Stanford Alpaca and Self-Instruct papers as well!


