Working sample for mac

#11
by spawn99 - opened

Here is a working example for Apple silicon devices, if it's of use to anyone.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99

How much RAM is required?

This is at full precision.

(screenshot of memory usage attached)

Nice. Well, just for fun - I still managed to run this with 16GB.

With modifications to the model loading process (removing .to(device)) and the addition of offload_buffers=True, you can run this even on systems with 16 GB of RAM. Note that while this allows broader compatibility, you should expect significantly longer response times; basically unusable for standard use.

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    offload_buffers=True,  # added: offload buffers to CPU when GPU/unified memory is tight
)
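A rough back-of-the-envelope for why 16 GB is tight without offloading (a sketch assuming 16-bit weights; activations and KV-cache overhead come on top):

```python
# Memory estimate for the 7B model's weights alone, assuming fp16/bf16.
params = 7e9          # ~7 billion parameters
bytes_per_param = 2   # 16-bit weights
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GB")  # ~13 GB for weights alone
```

So on a 16 GB machine nearly all unified memory goes to weights, which is why offloading works but is slow.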

Encountered the following error:

RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I am using two Tesla V100-SXM2-32GB GPUs. Does anyone know how many GPUs it needs? Thank you.

Does not work on M3 Pro when using MPS. The error info:
(error screenshots attached)


I solved it by upgrading torch to 2.4.0: pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
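If you want to verify the fix programmatically, a small helper sketch (the function name is hypothetical; the 2.4.0 minimum is the version that resolved the MPS error above):

```python
def meets_min_torch(version: str, minimum=(2, 4, 0)) -> bool:
    """Check whether a torch version string meets the assumed 2.4.0 minimum.

    Handles local build suffixes like '2.4.0+cpu'.
    """
    parts = tuple(int(p) for p in version.split("+")[0].split(".")[:3])
    return parts >= minimum

print(meets_min_torch("2.4.0"))  # True
print(meets_min_torch("2.1.2"))  # False
```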

Re: the CUDA "too many resources requested for launch" error on two Tesla V100-SXM2-32GB GPUs above:

This sample is not for CUDA; it's for Apple silicon.

Encountered the following error:
/AppleInternal/Library/BuildRoots/4ff29661-3588-11ef-9513-e2437461156c/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:788: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
[1] 67229 abort python qwen2vl.py
I am using an M2 Max with 96 GB of RAM.

Your image is too large and the tensor size becomes too large. Use a smaller one.

I've updated the gist to downscale an image if it's too large so this doesn't happen.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99
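The idea behind the fix can be sketched as capping total pixels so the image tensor stays well under the MPS 2**32-byte NDArray limit. The max_pixels value below is an illustrative assumption, not the gist's exact threshold:

```python
def downscale_to_fit(width: int, height: int, max_pixels: int = 1_000_000):
    """Return (width, height) scaled so width*height <= max_pixels,
    preserving the aspect ratio. No-op for images already small enough."""
    if width * height <= max_pixels:
        return width, height
    scale = (max_pixels / (width * height)) ** 0.5
    return max(1, int(width * scale)), max(1, int(height * scale))

print(downscale_to_fit(800, 600))    # unchanged: (800, 600)
print(downscale_to_fit(4000, 3000))  # scaled down, aspect ratio preserved
```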


The new version is OK, thx!


To use the new 2B model, replace 7B with 2B in the code.
Working quite well…
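If the gist uses the Hugging Face model ids, the swap above amounts to a one-line change (the ids follow the Qwen2-VL naming scheme; check the gist for the exact string):

```python
# was: model_name = "Qwen/Qwen2-VL-7B-Instruct"
model_name = "Qwen/Qwen2-VL-2B-Instruct"
print(model_name)
```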

You can run this on much smaller Apple silicon hardware using the 2B model with FP16 (I also shortened the prompt); it runs fast and smooth.
Here's the modified code:


In this repo: https://github.com/satvikahuja/Easy-qwen2vlm2b-4macbook?tab=readme-ov-file

