Using TEI locally with GPU
You can install text-embeddings-inference
locally to run it on your own machine with a GPU.
To make sure that your hardware is supported, check out the Supported models and hardware page.
Step 1: CUDA and NVIDIA drivers
Make sure you have CUDA and the NVIDIA drivers installed; the drivers on your device must be compatible with CUDA version 12.2 or higher.
Add the NVIDIA binaries to your path:
export PATH=$PATH:/usr/local/cuda/bin
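You can sanity-check the setup at this point: nvidia-smi reports the driver and the maximum CUDA version it supports, and nvcc (available once the export above is in effect) reports the installed toolkit version:
nvidia-smi
nvcc --version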
Step 2: Install Rust
Install Rust on your machine by running the following in your terminal, then follow the on-screen instructions:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
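After the installer finishes, open a new shell or source the environment file rustup creates, then confirm the toolchain is on your PATH:
source "$HOME/.cargo/env"
rustc --version
cargo --version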
Step 3: Install necessary packages
This step can take a while, as a large number of CUDA kernels need to be compiled. Note that cargo install --path router builds the router crate from a local checkout, so the commands below must be run from the root of the text-embeddings-inference repository.
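If you have not cloned the repository yet, a checkout of the current main branch should work (pinning a release tag is also a reasonable choice):
git clone https://github.com/huggingface/text-embeddings-inference.git
cd text-embeddings-inference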
For Turing GPUs (T4, RTX 2000 series, …)
cargo install --path router -F candle-cuda-turing -F http --no-default-features
For Ampere and Hopper
cargo install --path router -F candle-cuda -F http --no-default-features
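If you are unsure which feature flag matches your card: Turing corresponds to CUDA compute capability 7.5, while Ampere and Hopper are 8.x and 9.0. On reasonably recent drivers, nvidia-smi can report this directly:
nvidia-smi --query-gpu=name,compute_cap --format=csv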
Step 4: Launch Text Embeddings Inference
You can now launch Text Embeddings Inference on GPU with:
model=BAAI/bge-large-en-v1.5
revision=refs/pr/5

text-embeddings-router --model-id $model --revision $revision --port 8080
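Once the server is up, you can send a test request to the /embed endpoint, for example with curl:
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'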