akashicmarga/Mistral-7B-Instruct-v0.1-q4f16_1-metal

The model in this repository utilizes Mistral-7B-Instruct-v0.1 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), the mlc-llm (https://llm.mlc.ai/docs/) Metal version with 4-bit quantization and an embedding layer for MLC embedding. You have the option to use the FastAPI server instead of OpenAI to run the model locally. For using in langchain, please refer to the sample_langchain.py file in the following GitHub link: https://github.com/mlc-ai/mlc-llm/blob/main/examples/rest/python/sample_langchain.py.

Environment setup

conda create -n mlc-chat-venv -c mlc-ai -c conda-forge mlc-chat-cli-nightly

conda activate mlc-chat-venv

Fast API Server

python -m mlc_chat.rest --model Mistral-7B-Instruct-v0.1-q4f16_1/ --lib-path Mistral-7B-Instruct-v0.1-q4f16_1/Mistral-7B-Instruct-v0.1-q4f16_1-metal.so