royallab/MN-12B-Celeste-V1.9-exl2

Information

This is an Exl2 quantized version of MN-12B-Celeste-V1.9

Please refer to the original creator for more information.

Calibration dataset: Exl2 default

6bpw is recommended for the best quality-to-VRAM ratio (assuming you have enough VRAM).
Quants greater than 6bpw will not be created because there is no improvement in using them. If you really want them, ask someone else or make them yourself.
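
To check whether a quant is likely to fit, a rough size for the weights alone can be worked out from the bits per weight. The sketch below assumes about 12 billion parameters, which is inferred only from the "12B" in the model name, and it ignores the context cache and other runtime overhead, so real VRAM usage will be higher. The function name is hypothetical.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    # bits -> bytes (divide by 8) -> gigabytes (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~12e9 parameters, taken from the "12B" in the model name.
N_PARAMS = 12e9

print(f"6bpw:   ~{estimate_weight_gb(N_PARAMS, 6.0):.1f} GB of weights")
print(f"16-bit: ~{estimate_weight_gb(N_PARAMS, 16.0):.1f} GB of weights (unquantized, for comparison)")
```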

With async-hf-downloader, a lightweight, asynchronous HuggingFace downloader created by me

./async-hf-downloader royallab/MN-12B-Celeste-V1.9-exl2 -r 6bpw -p MN-12B-Celeste-V1.9-exl2-6bpw

With HuggingFace hub (pip install huggingface_hub)

huggingface-cli download royallab/MN-12B-Celeste-V1.9-exl2 --revision 6bpw --local-dir MN-12B-Celeste-V1.9-exl2-6bpw
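
If you would rather download from a Python script, the same branch can be fetched with `snapshot_download` from the huggingface_hub package. This is a minimal sketch: the helper names `local_dir_for` and `download_quant` are hypothetical, and the folder naming simply mirrors the commands above.

```python
REPO_ID = "royallab/MN-12B-Celeste-V1.9-exl2"


def local_dir_for(revision: str) -> str:
    """Build the target folder name used in the download commands above."""
    return f"{REPO_ID.split('/')[-1]}-{revision}"


def download_quant(revision: str = "6bpw") -> str:
    """Download one quant branch and return the path to the local folder."""
    # Imported here so local_dir_for() works even without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=revision,  # each quant lives on its own branch
        local_dir=local_dir_for(revision),
    )
```

Calling `download_quant("6bpw")` should leave the files in MN-12B-Celeste-V1.9-exl2-6bpw, the same folder name the CLI examples use.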

TabbyAPI is a pure exllamav2 FastAPI server developed by us. You can find TabbyAPI's source code here: https://github.com/theroyallab/TabbyAPI

1. Inside TabbyAPI's config.yml, set model_name to MN-12B-Celeste-V1.9-exl2-6bpw
   - You can also pass the argument --model_name MN-12B-Celeste-V1.9-exl2-6bpw on startup, or use the /v1/model/load endpoint
2. Launch TabbyAPI inside your Python env by running ./start.bat or ./start.sh
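
For the /v1/model/load route, a request could look roughly like the sketch below. Only the endpoint path comes from this card; the base URL and port, the `name` field in the JSON body, and the `x-admin-key` header are assumptions, so check them against the TabbyAPI documentation and your own config before relying on them. The function name is hypothetical.

```python
import json
import urllib.request


def build_load_request(
    model_name: str,
    base_url: str = "http://127.0.0.1:5000",  # assumption: local server address and port
    admin_key: str = "YOUR_ADMIN_KEY",        # placeholder, replace with your own key
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to load a model."""
    # Assumption: the endpoint expects the model folder name under "name".
    body = json.dumps({"name": model_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/model/load",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-admin-key": admin_key,  # assumption: header used for admin auth
        },
        method="POST",
    )


request = build_load_request("MN-12B-Celeste-V1.9-exl2-6bpw")
# To actually send it once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.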

All my infrastructure and cloud expenses are paid out of pocket. If you'd like to donate, you can do so here: https://ko-fi.com/kingbri