Run on MBP M1
Wondering if there's any way to run this model on my MacBook Pro M1. The default "cuda:0" device causes all sorts of issues, even after setting torch_dtype=torch.float16.
The device you should use is either "cpu" or "mps" for Mac.
Anyone managed to run it?
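For reference, a minimal device-selection sketch (plain PyTorch, nothing Fuyu-specific) that picks "mps" when the Metal backend is available and falls back to "cpu":

import torch

# Prefer Apple's Metal backend when it's available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")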
Using device: mps
Downloading (…)lve/main/config.json: 100%|██████████| 768/768 [00:00<00:00, 1.59MB/s]
Downloading (…)fetensors.index.json: 100%|██████████| 58.2k/58.2k [00:00<00:00, 800kB/s]
Downloading (…)of-00002.safetensors: 100%|██████████| 9.93G/9.93G [06:55<00:00, 23.9MB/s]
Downloading (…)of-00002.safetensors: 100%|██████████| 8.88G/8.88G [08:00<00:00, 18.5MB/s]
Downloading shards: 100%|██████████| 2/2 [14:56<00:00, 448.48s/it]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]Killed: 9
/opt/homebrew/Cellar/python@3.10/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
MacBook Pro M2 with 32 GB RAM - not enough?
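The two safetensors shards are ~19 GB in float16, so the weights themselves should fit in 32 GB, but the peak during shard loading can push past that. A sketch of what could be tried first (an assumption on my part, not verified on this exact setup; both arguments are standard from_pretrained options):

import torch
from transformers import FuyuForCausalLM

# Half-precision weights plus low_cpu_mem_usage keep the load-time footprint down.
model = FuyuForCausalLM.from_pretrained(
    "adept/fuyu-8b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="mps",
)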
The following works for me. The Python process seems to take ~21 GB when using float16 on my M1.
import requests
import torch
from PIL import Image
from transformers import AutoTokenizer, FuyuForCausalLM, FuyuImageProcessor, FuyuProcessor

device = "mps"
# Metal supports `bfloat16` in Sonoma, but it still doesn't work
dtype = torch.bfloat16 if device != "mps" else torch.float16

model_id = "adept/fuyu-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id, device_map=device, torch_dtype=dtype)
processor = FuyuProcessor(image_processor=FuyuImageProcessor(), tokenizer=tokenizer)

prompt = "Generate a coco-style caption.\n"
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image = Image.open(requests.get(url, stream=True).raw)

# Move all inputs to the target device; cast only floating-point tensors to `dtype`
model_inputs = processor(text=prompt, images=[image])
model_inputs = {k: v.to(dtype=dtype if torch.is_floating_point(v) else v.dtype, device=device) for k, v in model_inputs.items()}

prompt_len = model_inputs["input_ids"].shape[-1]
generation_output = model.generate(**model_inputs, max_new_tokens=10)
print(tokenizer.decode(generation_output[0][prompt_len:], skip_special_tokens=True))
@pcuenq Which transformers version is this running with? I tried it, and when you do v.to(dtype=...), v is a list of tensors. For that call to work, v would have to be a tensor.
model_inputs = processor(text=prompt, images=[image])
for k, v in model_inputs.items():
    if isinstance(v, list):
        model_inputs[k] = [item.to(dtype=dtype if torch.is_floating_point(item) else item.dtype, device=device) for item in v]
    else:
        model_inputs[k] = v.to(dtype=dtype if torch.is_floating_point(v) else v.dtype, device=device)
It works like this. The else branch isn't needed in my case, but I wanted to keep the original behaviour as well.
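For what it's worth, the same list/tensor handling can be folded into a small helper so the rest of the snippet stays untouched (just a sketch; to_device is a name I made up, not a transformers API):

def to_device(inputs, device, dtype):
    # Move processor outputs (tensors or lists of tensors) to the target device,
    # casting only floating-point tensors to the requested dtype.
    def move(t):
        return t.to(dtype=dtype if torch.is_floating_point(t) else t.dtype, device=device)
    return {k: [move(t) for t in v] if isinstance(v, list) else move(v) for k, v in inputs.items()}

model_inputs = to_device(processor(text=prompt, images=[image]), device, dtype)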