Working sample for Mac
Here is a working example for Apple Silicon devices, if it's of use to anyone.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99
How much RAM is required?
Nice. Well, just for fun, I still managed to run this with 16 GB. With modifications to the model loading process (removing `.to(device)`) and the addition of `offload_buffers=True`, you can run this even on systems with 16 GB of RAM. Note that while this allows for broader compatibility, you should expect significantly longer response times; it is basically unusable for everyday use.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    offload_buffers=True,  # added: offload buffers to CPU on low-RAM systems
)
Encountered the following error
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I am using two Tesla V100-SXM2-32GB GPUs. Does anyone know how many GPUs it needs? Thank you.
This sample is not for CUDA; it is for Apple Silicon.
Encountered the following error:
/AppleInternal/Library/BuildRoots/4ff29661-3588-11ef-9513-e2437461156c/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:788: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
[1] 67229 abort python qwen2vl.py
I am using an M2 Max with 96 GB of RAM.
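The assertion message points at the cause: MPS refuses to allocate a single NDArray larger than 2**32 bytes (4 GiB), and a very large input image can push an intermediate tensor past that limit regardless of how much system RAM you have. A rough sketch of the arithmetic (the example shape is illustrative, not taken from the crash):

```python
MPS_LIMIT = 2 ** 32  # bytes, per the assertion message: total bytes of NDArray > 2**32

def tensor_bytes(shape, dtype_bytes=4):
    """Size in bytes of a dense tensor with the given shape (default float32)."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

# A hypothetical very large image tensor already exceeds the 4 GiB limit:
print(tensor_bytes((1, 3, 20000, 20000)))              # 4_800_000_000 bytes
print(tensor_bytes((1, 3, 20000, 20000)) > MPS_LIMIT)  # True
```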
Your image is too large, so the resulting tensor exceeds the MPS size limit. Use a smaller one.
I've updated the gist to downscale an image if it's too large so this doesn't happen.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99
The new version is OK, thx!
To use the new 2B model, replace 7B with 2B in the code.
Working quite well…
You can run this on much smaller Apple Silicon hardware by using the 2B model with FP16 (I also shortened the prompt); it runs fast and smooth.
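Rough arithmetic on why the 2B model in FP16 fits small machines: weight memory is roughly parameter count times bytes per parameter, and FP16 halves the footprint versus FP32. A sketch (nominal "2B"/"7B" parameter counts assumed; real checkpoints differ slightly, and activations add overhead on top):

```python
def weight_gib(params_billion: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

print(round(weight_gib(2, 2), 1))  # ~3.7 GiB: 2B model in FP16
print(round(weight_gib(7, 2), 1))  # ~13.0 GiB: 7B model in FP16
print(round(weight_gib(7, 4), 1))  # ~26.1 GiB: 7B model in FP32
```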
Here's the modified code, based on the Apple Silicon example above (https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99), in this repo: https://github.com/satvikahuja/Easy-qwen2vlm2b-4macbook