yanghaojin (Haojin Yang)

posted an update 4 months ago

Dear community,

Please check out our recent blog post, "GPU Poor Savior: Revolutionizing Low-Bit Open Source LLMs and Cost-Effective Edge Computing". It presents a cheaper and more efficient supervised fine-tuning (SFT) scheme for quantized LLMs.

https://huggingface.co/blog/NicoNico/green-bit-llm

replied to their post 5 months ago

Command to reproduce this run 😉:
CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=true python -m sft.finetune --model GreenBitAI/Llama-3-8B-layer-mix-bpw-2.2 --tune-qweight-only --galore --galore-rank 64 --optimizer adamw8bit --batch-size 1 --seqlen 96

posted an update 5 months ago

Full-parameter fine-tuning of the Llama-3 8B model on a single RTX 3090 GPU with 24 GB of graphics memory?

Please check out our tool for fine-tuning, running inference with, and evaluating GreenBitAI's low-bit LLMs:
https://github.com/GreenBitAI/green-bit-llm
Model Zoo:
https://huggingface.co/GreenBitAI
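
To give a rough sense of why a low-bit model leaves room for fine-tuning on a 24 GB GPU, here is a back-of-the-envelope sketch in Python. It assumes a round 8 billion parameters and reads the "bpw-2.2" in the model name as an average of 2.2 bits per weight; both are assumptions, not figures from the tool. It counts weight storage only and ignores gradients, optimizer state, activations, and quantization metadata.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: a round 8e9 parameters for an "8B" model.
NUM_PARAMS = 8e9

# Assumption: "bpw-2.2" means an average of 2.2 bits per weight.
fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
low_bit_gib = weight_memory_gib(NUM_PARAMS, 2.2)

print(f"FP16 weights:    {fp16_gib:.1f} GiB")
print(f"2.2-bit weights: {low_bit_gib:.1f} GiB")
```

Under these assumptions the FP16 weights alone take roughly 15 GiB, while the 2.2-bit weights take about 2 GiB, so most of the card stays available for training state.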

posted an update 5 months ago

Dear all,

We are happy to share that we have just open-sourced over 200 low-bit LLMs. For the MLX community, we have prepared 2- to 4-bit versions of mainstream LLMs. You can find them in the following collection: GreenBitAI/greenbitai-mlx-llm-6614eb6ceb8da657c2b4ed58.

These low-bit models can be conveniently used through our open-source tool at https://github.com/GreenBitAI/gbx-lm.

Compared with models produced by other open-source quantization algorithms, these models retain accuracy better. We have provided some model evaluation results here:
https://github.com/GreenBitAI/green-bit-llm/blob/main/green_bit_llm/evaluation/README.md

You can also evaluate the models yourself using the evaluation script we provided.
