# Installing

If running on Google Colab, you will need a Colab Pro+ subscription.

Change the runtime type to a high-memory one, or connect Colab to your local machine or to a VM that you launch.
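To confirm that the runtime really has more memory after switching, a quick check along these lines may help. It is a minimal sketch that assumes a Linux runtime (which Colab provides); the helper name `total_ram_gib` is made up for this example.

```python
import os


def total_ram_gib() -> float:
    """Return the total physical memory in GiB (assumes a POSIX system such as Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 2**30


print(f"Total RAM: {total_ram_gib():.1f} GiB")
```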

First, you will need to obtain the LLaMA weights. 

You can sign up for the official weights here: https://huggingface.co/docs/transformers/main/model_doc/llama

Alternative models are available on Hugging Face, however. This guide assumes you do not have access to the official weights.

If you do have access to the official weights, skip to "Applying the Vicuna delta weights".

## Clone the LLaMA weights

In [None]:
# Setup git lfs
!git lfs install --skip-smudge --force
!git lfs env
!git config filter.lfs.process "git-lfs filter-process --skip"
!git config filter.lfs.smudge "git-lfs smudge --skip -- %f"

In [None]:
# Cloning the 7b parameter model repo
!git lfs clone https://huggingface.co/decapoda-research/llama-7b-hf

In [None]:
# Cloning the 13b parameter model repo
!git lfs clone https://huggingface.co/decapoda-research/llama-13b-hf
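Because the setup above tells Git LFS to skip smudging, it can be worth checking that the weight files were actually downloaded and are not still small LFS pointer stubs. The sketch below relies on the fact that a pointer file begins with the line `version https://git-lfs.github.com/spec/v1`; the helper name `find_lfs_pointers` is hypothetical.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir: str) -> list[str]:
    """List files under repo_dir that are still Git LFS pointer stubs."""
    pointers = []
    for path in sorted(Path(repo_dir).rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(str(path))
    return pointers


# Only runs if the 7b repo has been cloned into the working directory.
if Path("llama-7b-hf").is_dir():
    print(find_lfs_pointers("llama-7b-hf") or "No pointer stubs found")
```

If any files are listed, running `git lfs pull` inside the cloned repo should fetch the real contents.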

## Applying the Vicuna delta weights

### Install PyTorch with CUDA support

If you already have this installed in your environment, you can skip this step.
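One way to find out whether the step can be skipped is to ask Python directly. This is a small sketch, not part of the original notebook; the function name `torch_cuda_status` is made up here, and it only uses `torch.cuda.is_available()`, `torch.__version__` and `torch.version.cuda`.

```python
import importlib.util


def torch_cuda_status() -> str:
    """Describe whether PyTorch is installed and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        return f"torch {torch.__version__} with CUDA {torch.version.cuda}"
    return f"torch {torch.__version__} without a usable CUDA device"


print(torch_cuda_status())
```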

In [None]:
# First we need to upgrade setuptools, pip and wheel
!pip install --upgrade setuptools pip wheel

In [None]:
# For CUDA 11.X:
!pip install nvidia-cuda-runtime-cu11 --index-url https://pypi.ngc.nvidia.com

In [None]:
# For CUDA 12.x
!pip install nvidia-cuda-runtime-cu12 --index-url https://pypi.ngc.nvidia.com

In [None]:
# For PyTorch cu117
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

In [None]:
# For PyTorch cu118
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

### Running the FastChat apply delta script

In [None]:
# Install FastChat
!pip install fschat

# Install the latest main branch of huggingface/transformers
!pip install git+https://github.com/huggingface/transformers

In [None]:
import json


def convert_llama_names(model_name: str) -> None:
    """Convert LLaMAForCausalLM to LlamaForCausalLM and LLaMATokenizer to LlamaTokenizer"""
    # Rewrite the architecture name in the model config
    with open(f"{model_name}/config.json", "r", encoding='utf-8') as f:
        data = f.read()

    config = json.loads(data)
    config["architectures"] = ["LlamaForCausalLM"]
    with open(f"{model_name}/config.json", "w", encoding='utf-8') as f:
        json.dump(config, f)

    # Rewrite the tokenizer class name in the tokenizer config
    with open(f"{model_name}/tokenizer_config.json", "r", encoding='utf-8') as f:
        data = f.read()

    config = json.loads(data)
    config["tokenizer_class"] = "LlamaTokenizer"

    with open(f"{model_name}/tokenizer_config.json", "w", encoding='utf-8') as f:
        json.dump(config, f)

In [None]:
!git lfs clone https://huggingface.co/lmsys/vicuna-7b-delta-v1.1

In [None]:
# 7b Model
convert_llama_names("llama-7b-hf")
!python -m fastchat.model.apply_delta --base llama-7b-hf --target vicuna-7b --delta ./vicuna-7b-delta-v1.1

In [None]:
!git lfs clone https://huggingface.co/lmsys/vicuna-13b-delta-v1.1

In [None]:
# 13b
convert_llama_names("llama-13b-hf")
!python -m fastchat.model.apply_delta --base llama-13b-hf --target vicuna-13b --delta ./vicuna-13b-delta-v1.1

# Installing Auto-Vicuna

Note that running this does not work in Colab or in the notebook; it is shown for demonstration purposes only.

In [None]:
!pip install auto-vicuna

# Running Auto-Vicuna

In [None]:
!auto_vicuna --vicuna_weights vicuna-7b

You can also create a `.env` file containing:

```
VICUNA_WEIGHTS=vicuna-7b
```

This avoids passing the weights as an argument.
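For readers unfamiliar with the `.env` format, the sketch below shows how such a file maps to key/value pairs. It only illustrates the `KEY=VALUE` layout; how Auto-Vicuna itself loads the file is not covered here, and `parse_env` is a hypothetical helper, not part of the package.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


print(parse_env("VICUNA_WEIGHTS=vicuna-7b\n"))
```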

## Known Issues

If your model keeps talking about random news articles and the like, the stop tokens in `special_tokens_map.json` and `tokenizer_config.json` most likely need to be populated. You can find both files in the repo's root dir.
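A quick way to spot this problem is to look for absent or empty token entries in the two files. The sketch below assumes the key names `bos_token`, `eos_token` and `unk_token`, which are common in Hugging Face tokenizer files but are not confirmed by this guide; `empty_special_tokens` is a hypothetical helper and it only detects missing or empty values.

```python
import json
from pathlib import Path

# Assumed key names; adjust if your tokenizer files use different ones.
TOKEN_KEYS = ("bos_token", "eos_token", "unk_token")


def empty_special_tokens(model_dir: str) -> dict[str, list[str]]:
    """Map each tokenizer file to the token keys that are missing or empty."""
    report = {}
    for name in ("special_tokens_map.json", "tokenizer_config.json"):
        path = Path(model_dir) / name
        if not path.is_file():
            report[name] = ["<file missing>"]
            continue
        config = json.loads(path.read_text(encoding="utf-8"))
        missing = [key for key in TOKEN_KEYS if not config.get(key)]
        if missing:
            report[name] = missing
    return report


# Only runs if the converted model directory exists.
if Path("vicuna-7b").is_dir():
    print(empty_special_tokens("vicuna-7b") or "All token entries are populated")
```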