Working sample for Mac
Here is a working example for Apple Silicon devices, if it's of use to anyone.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99
How much RAM is required?
Nice. Well, just for fun, I still managed to run this with 16 GB. With modifications to the model loading process (removing `.to(device)`) and the addition of `offload_buffers=True`, you can run this even on systems with 16 GB of RAM. Note that while this allows for broader compatibility, you should expect significantly longer response times; it is basically unusable for everyday use.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    offload_buffers=True,  # added: offload buffers to CPU on low-RAM systems
)
Encountered the following error
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I am using two Tesla V100-SXM2-32GB GPUs. Does anyone know how many GPUs it needs? Thank you.
This sample is not for CUDA; it is for Apple Silicon.
Encountered the following error:
/AppleInternal/Library/BuildRoots/4ff29661-3588-11ef-9513-e2437461156c/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:788: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
[1] 67229 abort python qwen2vl.py
I am using an M2 Max with 96 GB of RAM.
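The assertion message points at the cause: MPS refuses to allocate a single NDArray larger than 2**32 bytes (4 GiB), and a very large input image can push an intermediate tensor past that limit regardless of how much system RAM you have. A rough sketch of the arithmetic (the example shape is illustrative, not taken from the crash):

```python
MPS_LIMIT = 2 ** 32  # bytes, per the assertion message: total bytes of NDArray > 2**32

def tensor_bytes(shape, dtype_bytes=4):
    """Size in bytes of a dense tensor with the given shape (default float32)."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

# A hypothetical very large image tensor already exceeds the 4 GiB limit:
print(tensor_bytes((1, 3, 20000, 20000)))              # 4_800_000_000 bytes
print(tensor_bytes((1, 3, 20000, 20000)) > MPS_LIMIT)  # True
```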
Your image is too large, so the resulting tensor exceeds the MPS size limit. Use a smaller one.
I've updated the gist to downscale an image if it's too large so this doesn't happen.
https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99
The new version is OK, thx!
To use the new 2B model, replace 7B with 2B in the code.
Working quite well…
You can run this on much smaller Apple Silicon hardware by using the 2B model with FP16 (I also shortened the prompt); it runs fast and smooth.
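Rough arithmetic on why the 2B model in FP16 fits small machines: weight memory is roughly parameter count times bytes per parameter, and FP16 halves the footprint versus FP32. A sketch (nominal "2B"/"7B" parameter counts assumed; real checkpoints differ slightly, and activations add overhead on top):

```python
def weight_gib(params_billion: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

print(round(weight_gib(2, 2), 1))  # ~3.7 GiB: 2B model in FP16
print(round(weight_gib(7, 2), 1))  # ~13.0 GiB: 7B model in FP16
print(round(weight_gib(7, 4), 1))  # ~26.1 GiB: 7B model in FP32
```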
Here's the modified code, based on the Apple Silicon example above (https://gist.github.com/cavit99/811919b3e7753c925ab603b1929dbd99), in this repo: https://github.com/satvikahuja/Easy-qwen2vlm2b-4macbook