Fast Mamba

#34
by Praneethkeerthi - opened

ValueError: Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device. How do I fix this?

I have the same issue. Despite installing mamba-ssm (both with pip and from git) as well as causal-conv1d, I get this warning:

04/06/2024 10:45:22 - WARNING - transformers_modules.ai21labs.Jamba-v0.1.8ee14c3ece13be2d26f81fd42f5c29b89a84d846.modeling_jamba - The fast path is not available because on of (selective_state_update, selective_scan_fn, causal_conv1d_fn, causal_conv1d_update, mamba_inner_fn) is None. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d. If you want to use the naive implementation, set use_mamba_kernels=False in the model config

which then gives the ValueError above once the training loop starts.
Is it an issue with torch 2.2?
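The warning's suggested fallback can be sketched as follows. Note the assumptions: JambaConfig is only available in transformers versions that ship Jamba natively (4.40+); with the original remote-code checkpoint you would instead pass use_mamba_kernels=False to from_pretrained.

```python
# Sketch of the warning's fallback: disable the fast kernels so the naive
# (pure PyTorch) Mamba path is used instead. Slower, but runs anywhere.
try:
    from transformers import JambaConfig  # assumes transformers >= 4.40
    config = JambaConfig(use_mamba_kernels=False)
    msg = f"use_mamba_kernels={config.use_mamba_kernels}"
except ImportError:
    # Older/absent transformers: the same flag can be passed to from_pretrained.
    msg = "JambaConfig unavailable; pass use_mamba_kernels=False to from_pretrained"
print(msg)
```

This trades speed for portability, so it is a workaround rather than a fix for the kernel installation itself.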

If you use device_map="auto", some Mamba layers may be placed on the CPU to fit within your VRAM. The fast kernels only work on CUDA devices, so they can't be used for layers that end up on the CPU.
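You can check whether that happened by inspecting model.hf_device_map after loading. A stdlib-only sketch with a hypothetical device map (the layer names below are illustrative, not from a real run):

```python
# Hypothetical example of what device_map="auto" can produce when VRAM
# runs out: some layers get offloaded to the CPU.
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",  # offloaded; fast Mamba kernels cannot run here
}

# Any Mamba layer mapped to "cpu" (or "disk") will trigger the ValueError.
offloaded = [name for name, dev in device_map.items() if dev in ("cpu", "disk")]
print(offloaded)
```

If the offloaded list is non-empty, either free up VRAM or pin the model to the GPU explicitly.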

It took me a week to fix this error. It turns out the causal-conv1d library only ships its kernel for Linux; installing it under Ubuntu on WSL2 worked without trouble.

I also fixed the problem in my environment. I installed the accelerate package (pip install accelerate) and set the device_map parameter of AutoModelForCausalLM.from_pretrained to "cuda" (device_map="cuda"). In general I use torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
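That fix can be sketched as below. It is guarded so the (large) checkpoint download is only attempted when torch, transformers, and a CUDA device are all actually available; the model id matches the thread, and native Jamba support in transformers is assumed.

```python
import importlib.util

# Sketch: with `accelerate` installed, load the model with device_map="cuda"
# so every layer sits on the GPU rather than being partially offloaded.
loaded = False
if importlib.util.find_spec("torch") and importlib.util.find_spec("transformers"):
    import torch
    if torch.cuda.is_available():
        from transformers import AutoModelForCausalLM
        model = AutoModelForCausalLM.from_pretrained(
            "ai21labs/Jamba-v0.1",
            torch_dtype=torch.bfloat16,
            device_map="cuda",  # pin to the GPU instead of "auto"
        )
        loaded = next(model.parameters()).device.type == "cuda"
print("loaded on CUDA" if loaded else "skipped (no CUDA-ready environment)")
```

Unlike device_map="auto", this will raise an out-of-memory error instead of silently offloading layers, which is usually what you want when the fast kernels are required.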

This is the code block directly responsible for the error, and this is the sanity check triggered when modeling_jamba.py is loaded.

To address the issue,

  1. Make sure mamba-ssm and causal-conv1d are installed
    • Installation
      $ pip install mamba-ssm causal-conv1d
      
    • Smoke test
      # Should not raise ImportError
      from mamba_ssm.ops.selective_scan_interface import mamba_inner_fn, selective_scan_fn
      from mamba_ssm.ops.triton.selective_state_update import selective_state_update
      
    • Restart the Python kernel if you have loaded the model already
  2. Make sure the model is attached to a CUDA device
    # Assumes single-device training
    # https://stackoverflow.com/a/58926343/13301046
    next(model.parameters()).device
    # -> device(type='cuda', index=0)
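Before running the smoke-test imports in step 1, a quick stdlib-only presence check can tell you whether the two kernel packages are installed at all:

```python
import importlib.util

# Both packages must be importable before the smoke-test imports can succeed.
status = {
    mod: importlib.util.find_spec(mod) is not None
    for mod in ("mamba_ssm", "causal_conv1d")
}
for mod, ok in status.items():
    print(f"{mod}: {'installed' if ok else 'missing'}")
```

A package can be present yet still fail the smoke test (e.g. a build without CUDA), so this check complements rather than replaces the imports in step 1.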
    

Also see the documentation.
