Inference freezes using the recommended vLLM approach

#5
by dhaneshsabane - opened

I used the recommended vLLM approach to run the model on a server with 8× 80 GB NVIDIA A100 PCIe GPUs. I'm running the recommended script with two modifications (see the sketch below):

  1. tp_size = 8
  2. model_name = deepseek-ai/DeepSeek-Coder-V2-Instruct
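
For reference, here is roughly what the script looks like with those two changes applied. It is essentially the model card's vLLM example; the prompts and sampling parameters are paraphrased here, so treat it as a sketch rather than the exact file I ran:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-Coder-V2-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tensor-parallel across all 8 A100s; enforce_eager/trust_remote_code match the engine config in the log below.
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len,
          trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256,
                                 stop_token_ids=[tokenizer.eos_token_id])

messages_list = [[{"role": "user", "content": "write a quick sort algorithm in python."}]]
prompt_token_ids = [tokenizer.apply_chat_template(m, add_generation_prompt=True)
                    for m in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
print([o.outputs[0].text for o in outputs])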

When I run the script, I see the output below, but nothing happens after the last line of the log. I waited 15-20 minutes for a response before giving up.

INFO 07-02 17:41:31 config.py:656] Defaulting to use mp for distributed inference
INFO 07-02 17:41:31 llm_engine.py:169] Initializing an LLM engine (v0.5.0.post1) with config: model='deepseek-ai/DeepSeek-Coder-V2-Instruct', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-Coder-V2-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=deepseek-ai/DeepSeek-Coder-V2-Instruct)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:34 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:35 utils.py:719] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:35 pynccl.py:63] vLLM is using nccl==2.20.5
fd8c67c1741e:9431:9431 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9431:9431 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9431:9431 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
fd8c67c1741e:9503:9503 [4] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9503:9503 [4] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9503:9503 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9505:9505 [6] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9505:9505 [6] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9505:9505 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9504:9504 [5] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9504:9504 [5] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9504:9504 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9500:9500 [1] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9500:9500 [1] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9500:9500 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9501:9501 [2] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9501:9501 [2] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9501:9501 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9506:9506 [7] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9506:9506 [7] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO cudaDriverVersion 12040
fd8c67c1741e:9506:9506 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9502:9502 [3] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
fd8c67c1741e:9503:9503 [4] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9503:9503 [4] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9503:9503 [4] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9503:9503 [4] NCCL INFO Using network Socket
fd8c67c1741e:9506:9506 [7] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9506:9506 [7] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9506:9506 [7] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9506:9506 [7] NCCL INFO Using network Socket
fd8c67c1741e:9431:9431 [0] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9431:9431 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9431:9431 [0] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9431:9431 [0] NCCL INFO Using network Socket
fd8c67c1741e:9505:9505 [6] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9505:9505 [6] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9505:9505 [6] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9505:9505 [6] NCCL INFO Using network Socket
fd8c67c1741e:9504:9504 [5] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9502:9502 [3] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9502:9502 [3] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9504:9504 [5] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9502:9502 [3] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9504:9504 [5] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9502:9502 [3] NCCL INFO Using network Socket
fd8c67c1741e:9504:9504 [5] NCCL INFO Using network Socket
fd8c67c1741e:9501:9501 [2] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9501:9501 [2] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9501:9501 [2] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9501:9501 [2] NCCL INFO Using network Socket
fd8c67c1741e:9500:9500 [1] NCCL INFO Failed to open libibverbs.so[.1]
fd8c67c1741e:9500:9500 [1] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
fd8c67c1741e:9500:9500 [1] NCCL INFO Using non-device net plugin version 0
fd8c67c1741e:9500:9500 [1] NCCL INFO Using network Socket
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 80 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 70 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId a0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 60 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId b0 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 90 commId 0x7ffd0231764371a - Init START
fd8c67c1741e:9500:9500 [1] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9500:9500 [1] NCCL INFO NVLS multicast support is not available on dev 1
fd8c67c1741e:9506:9506 [7] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9506:9506 [7] NCCL INFO NVLS multicast support is not available on dev 7
fd8c67c1741e:9501:9501 [2] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9501:9501 [2] NCCL INFO NVLS multicast support is not available on dev 2
fd8c67c1741e:9503:9503 [4] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9503:9503 [4] NCCL INFO NVLS multicast support is not available on dev 4
fd8c67c1741e:9431:9431 [0] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9431:9431 [0] NCCL INFO NVLS multicast support is not available on dev 0
fd8c67c1741e:9505:9505 [6] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9505:9505 [6] NCCL INFO NVLS multicast support is not available on dev 6
fd8c67c1741e:9502:9502 [3] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9502:9502 [3] NCCL INFO NVLS multicast support is not available on dev 3
fd8c67c1741e:9504:9504 [5] NCCL INFO NCCL_IGNORE_DISABLED_P2P set by environment to 1.
fd8c67c1741e:9504:9504 [5] NCCL INFO NVLS multicast support is not available on dev 5
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
fd8c67c1741e:9505:9505 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5
fd8c67c1741e:9504:9504 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4
fd8c67c1741e:9500:9500 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0
fd8c67c1741e:9505:9505 [6] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9501:9501 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1
fd8c67c1741e:9504:9504 [5] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9503:9503 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3
fd8c67c1741e:9500:9500 [1] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9502:9502 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 00/04 :    0   1   2   3   4   5   6   7
fd8c67c1741e:9506:9506 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6
fd8c67c1741e:9501:9501 [2] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9506:9506 [7] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9503:9503 [4] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9502:9502 [3] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 01/04 :    0   1   2   3   4   5   6   7
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 02/04 :    0   1   2   3   4   5   6   7
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 03/04 :    0   1   2   3   4   5   6   7
fd8c67c1741e:9431:9431 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
fd8c67c1741e:9431:9431 [0] NCCL INFO P2P Chunksize set to 131072
fd8c67c1741e:9505:9505 [6] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9431:9431 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9501:9501 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9502:9502 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9506:9506 [7] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9504:9504 [5] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9500:9500 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9503:9503 [4] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 00 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 01 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 02 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 03 : 6[6] -> 7[7] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 00 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 00 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 00 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 00 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 00 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 01 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 01 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 00 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 01 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 01 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 01 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 02 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 02 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 01 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 02 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 02 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 02 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 02 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 03 : 2[2] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 03 : 1[1] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 02 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 03 : 4[4] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Channel 03 : 0[0] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 03 : 7[7] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 03 : 3[3] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 03 : 5[5] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Connected all rings
fd8c67c1741e:9500:9500 [1] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Connected all rings
fd8c67c1741e:9505:9505 [6] NCCL INFO Connected all rings
fd8c67c1741e:9504:9504 [5] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 00 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Connected all rings
fd8c67c1741e:9501:9501 [2] NCCL INFO Connected all rings
fd8c67c1741e:9502:9502 [3] NCCL INFO Connected all rings
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 01 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 02 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9506:9506 [7] NCCL INFO Channel 03 : 7[7] -> 6[6] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 00 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 00 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 01 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 01 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 02 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 02 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9500:9500 [1] NCCL INFO Channel 03 : 1[1] -> 0[0] via SHM/direct/direct
fd8c67c1741e:9503:9503 [4] NCCL INFO Channel 03 : 4[4] -> 3[3] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 00 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 01 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 02 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9505:9505 [6] NCCL INFO Channel 03 : 6[6] -> 5[5] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 00 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 01 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 02 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 00 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 00 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 01 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9501:9501 [2] NCCL INFO Channel 03 : 2[2] -> 1[1] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 01 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 02 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 02 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9502:9502 [3] NCCL INFO Channel 03 : 3[3] -> 2[2] via SHM/direct/direct
fd8c67c1741e:9504:9504 [5] NCCL INFO Channel 03 : 5[5] -> 4[4] via SHM/direct/direct
fd8c67c1741e:9431:9431 [0] NCCL INFO Connected all trees
fd8c67c1741e:9431:9431 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9431:9431 [0] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9506:9506 [7] NCCL INFO Connected all trees
fd8c67c1741e:9506:9506 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9506:9506 [7] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9500:9500 [1] NCCL INFO Connected all trees
fd8c67c1741e:9500:9500 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9500:9500 [1] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9501:9501 [2] NCCL INFO Connected all trees
fd8c67c1741e:9501:9501 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9501:9501 [2] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9505:9505 [6] NCCL INFO Connected all trees
fd8c67c1741e:9505:9505 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9505:9505 [6] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9503:9503 [4] NCCL INFO Connected all trees
fd8c67c1741e:9503:9503 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9503:9503 [4] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9504:9504 [5] NCCL INFO Connected all trees
fd8c67c1741e:9502:9502 [3] NCCL INFO Connected all trees
fd8c67c1741e:9504:9504 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9504:9504 [5] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9502:9502 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
fd8c67c1741e:9502:9502 [3] NCCL INFO 4 coll channels, 0 collnet channels, 0 nvls channels, 4 p2p channels, 2 p2p channels per peer
fd8c67c1741e:9431:9431 [0] NCCL INFO comm 0x575c1e121890 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 60 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9503:9503 [4] NCCL INFO comm 0x575c1e11ccb0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId a0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9505:9505 [6] NCCL INFO comm 0x575c1e11d9f0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9501:9501 [2] NCCL INFO comm 0x575c1e119e50 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 80 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9506:9506 [7] NCCL INFO comm 0x575c1e11c9d0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9502:9502 [3] NCCL INFO comm 0x575c1e11a750 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 90 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9504:9504 [5] NCCL INFO comm 0x575c1e11b560 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId b0 commId 0x7ffd0231764371a - Init COMPLETE
fd8c67c1741e:9500:9500 [1] NCCL INFO comm 0x575c1e118840 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 70 commId 0x7ffd0231764371a - Init COMPLETE
WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9503) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9506) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9501) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9500) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9504) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9505) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9502) WARNING 07-02 17:41:37 custom_all_reduce.py:118] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=9502) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9506) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9504) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9500) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9505) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9501) Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9503) Cache shape torch.Size([163840, 64])
Cache shape torch.Size([163840, 64])
(VllmWorkerProcess pid=9502) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9500) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9501) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9505) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9503) INFO 07-02 17:41:39 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9506) INFO 07-02 17:41:40 weight_utils.py:218] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=9504) INFO 07-02 17:41:40 weight_utils.py:218] Using model weights format ['*.safetensors']

The GPU VRAM is clearly being used while the process is running:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:00:06.0 Off |                    0 |
| N/A   47C    P0             71W /  300W |   59109MiB /  81920MiB |      2%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:00:07.0 Off |                    0 |
| N/A   48C    P0             69W /  300W |   59109MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:00:08.0 Off |                    0 |
| N/A   48C    P0             67W /  300W |   59109MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          Off |   00000000:00:09.0 Off |                    0 |
| N/A   48C    P0             73W /  300W |   59109MiB /  81920MiB |      1%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100 80GB PCIe          Off |   00000000:00:0A.0 Off |                    0 |
| N/A   49C    P0             70W /  300W |   59109MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100 80GB PCIe          Off |   00000000:00:0B.0 Off |                    0 |
| N/A   49C    P0             67W /  300W |   59109MiB /  81920MiB |      1%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100 80GB PCIe          Off |   00000000:00:0C.0 Off |                    0 |
| N/A   47C    P0             68W /  300W |   59109MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100 80GB PCIe          Off |   00000000:00:0D.0 Off |                    0 |
| N/A   47C    P0             71W /  300W |   59109MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     21044      C   python3                                     59100MiB |
|    1   N/A  N/A     21177      C   python3                                     59100MiB |
|    2   N/A  N/A     21178      C   python3                                     59100MiB |
|    3   N/A  N/A     21179      C   python3                                     59100MiB |
|    4   N/A  N/A     21180      C   python3                                     59100MiB |
|    5   N/A  N/A     21181      C   python3                                     59100MiB |
|    6   N/A  N/A     21182      C   python3                                     59100MiB |
|    7   N/A  N/A     21183      C   python3                                     59100MiB |
+-----------------------------------------------------------------------------------------+

A couple of things I would try:

In your vLLM params when running the command, set the --gpu-memory-utilization flag (or gpu_memory_utilization in the Python API) to something like 0.95; this is the fraction of GPU RAM vLLM will use. It looks like you are only using around two-thirds of your available VRAM.

I don't know what your max-model-len is, but maybe try 8192 to start. This also depends on which version of vLLM you are using.
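
For example, this is roughly how those two knobs map onto the Python constructor the script uses (just a sketch with the values suggested above, not your exact command):

from vllm import LLM

# Sketch only: the two suggested settings added to the existing constructor call.
llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",
    tensor_parallel_size=8,
    max_model_len=8192,            # cap on sequence length; 8192 as suggested above
    gpu_memory_utilization=0.95,   # fraction of each GPU's VRAM vLLM may use (default is 0.9)
    trust_remote_code=True,
    enforce_eager=True,
)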

Thanks @cybrtooth!

It was probably an issue with the merge request build I was running. Upgrading to vLLM v0.5.3.post1 worked.

https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/29#issuecomment-2260838036
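
For anyone hitting the same hang, upgrading vLLM (e.g. pip install -U "vllm>=0.5.3.post1") and confirming the installed version before re-running should be enough; a minimal, purely illustrative check:

# Hypothetical sanity check after upgrading; not part of the original thread.
import vllm
print(vllm.__version__)  # expect 0.5.3.post1 or newer before re-running the script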

dhaneshsabane changed discussion status to closed
