ROCm support? #3
opened by hartmark
I'm on AMD, so I need a ROCm-enabled torch build.
I ran these commands:
pip3 uninstall torch safetensors
pip3 install --pre \
torch safetensors \
--index-url https://download.pytorch.org/whl/nightly/rocm6.2 \
--root-user-action=ignore
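As a quick sanity check that the ROCm wheel is actually the one installed (on ROCm builds torch.version.cuda is None and the GPU is exposed through the regular torch.cuda API):

import torch

print(torch.__version__)          # should end in +rocmX.Y for the nightly wheel
print(torch.version.cuda)         # None on ROCm builds
print(torch.version.hip)          # HIP/ROCm version string on ROCm builds
print(torch.cuda.is_available())  # True if the AMD GPU is visible to torch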
Then I get this output:
% python app.py ~/Downloads/00100sPORTRAIT_00100_BURST20200715194128703_COVER.jpg
Loading CLIP
preprocessor_config.json: 100%|██████████| 368/368 [00:00<00:00, 2.66MB/s]
tokenizer_config.json: 100%|██████████| 711/711 [00:00<00:00, 5.47MB/s]
spiece.model: 100%|██████████| 798k/798k [00:00<00:00, 34.0MB/s]
special_tokens_map.json: 100%|██████████| 409/409 [00:00<00:00, 3.60MB/s]
config.json: 100%|██████████| 576/576 [00:00<00:00, 5.78MB/s]
model.safetensors: 100%|██████████| 3.51G/3.51G [01:01<00:00, 56.9MB/s]
Loading tokenizer
tokenizer_config.json: 100%|██████████| 50.6k/50.6k [00:00<00:00, 22.4MB/s]
tokenizer.json: 100%|██████████| 9.09M/9.09M [00:00<00:00, 16.2MB/s]
special_tokens_map.json: 100%|██████████| 345/345 [00:00<00:00, 4.18MB/s]
Loading LLM
config.json: 100%|██████████| 1.41k/1.41k [00:00<00:00, 18.3MB/s]
model.safetensors: 100%|██████████| 5.70G/5.70G [01:41<00:00, 56.1MB/s]
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 109, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 88, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
generation_config.json: 100%|██████████| 230/230 [00:00<00:00, 2.47MB/s]
Loading image adapter
Traceback (most recent call last):
File "/home/markus/code/joy-caption-pre-alpha/app.py", line 179, in <module>
main()
File "/home/markus/code/joy-caption-pre-alpha/app.py", line 149, in main
models = load_models()
^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/app.py", line 49, in load_models
image_adapter.load_state_dict(torch.load(CHECKPOINT_PATH / "image_adapter.pt", map_location="cpu", weights_only=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/torch/serialization.py", line 1286, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 118
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
python app.py 11.14s user 65.16s system 33% cpu 3:50.06 total
You can try replacing app.py with the following code and see if it works. The main issue is that the bitsandbytes (BnB) library has no ROCm support, so you'll need to find an appropriate non-BnB Llama 3.1 model.
import torch
import torch.amp.autocast_mode
import os
import sys
import logging
import warnings
import argparse
from PIL import Image
from pathlib import Path
from tqdm import tqdm
from torch import nn
from transformers import AutoModel, AutoProcessor, AutoTokenizer, PreTrainedTokenizer, PreTrainedTokenizerFast, AutoModelForCausalLM
from typing import List, Union
# Constants
CLIP_PATH = "google/siglip-so400m-patch14-384"
VLM_PROMPT = "A descriptive caption for this image:\n"
MODEL_PATH = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
CHECKPOINT_PATH = Path("wpkklhc6")
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.webp')
warnings.filterwarnings("ignore", category=UserWarning)
logging.getLogger("transformers").setLevel(logging.ERROR)
# Determine the device. ROCm builds of PyTorch expose the GPU through the
# regular torch.cuda API, so this check also covers AMD.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
class ImageAdapter(nn.Module):
    def __init__(self, input_features: int, output_features: int):
        super().__init__()
        self.linear1 = nn.Linear(input_features, output_features)
        self.activation = nn.GELU()
        self.linear2 = nn.Linear(output_features, output_features)

    def forward(self, vision_outputs: torch.Tensor):
        return self.linear2(self.activation(self.linear1(vision_outputs)))
def load_models():
    print("Loading CLIP")
    clip_processor = AutoProcessor.from_pretrained(CLIP_PATH)
    clip_model = AutoModel.from_pretrained(CLIP_PATH).vision_model.eval().requires_grad_(False).to(device)

    print("Loading tokenizer")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=False)
    assert isinstance(tokenizer, (PreTrainedTokenizer, PreTrainedTokenizerFast)), f"Tokenizer is of type {type(tokenizer)}"

    print("Loading LLM")
    text_model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype=torch.bfloat16).eval()

    print("Loading image adapter")
    image_adapter = ImageAdapter(clip_model.config.hidden_size, text_model.config.hidden_size)
    image_adapter.load_state_dict(torch.load(CHECKPOINT_PATH / "image_adapter.pt", map_location="cpu", weights_only=True))
    image_adapter.eval().to(device)
    return clip_processor, clip_model, tokenizer, text_model, image_adapter
@torch.no_grad()
def stream_chat(input_images: List[Image.Image], batch_size: int, pbar: tqdm, models: tuple) -> List[str]:
    clip_processor, clip_model, tokenizer, text_model, image_adapter = models

    # device.type is "cuda" on ROCm builds as well, so this frees GPU memory on AMD too
    if device.type == "cuda":
        torch.cuda.empty_cache()
    all_captions = []

    for i in range(0, len(input_images), batch_size):
        batch = input_images[i:i+batch_size]

        try:
            images = clip_processor(images=batch, return_tensors='pt', padding=True).pixel_values.to(device)
        except ValueError as e:
            print(f"Error processing image batch: {e}")
            print("Skipping this batch and continuing...")
            continue

        # Embed the images with CLIP and project them into the LLM's embedding space
        with torch.amp.autocast_mode.autocast(device.type, enabled=True):
            vision_outputs = clip_model(pixel_values=images, output_hidden_states=True)
            image_features = vision_outputs.hidden_states[-2]
            embedded_images = image_adapter(image_features).to(dtype=torch.bfloat16)

        prompt = tokenizer.encode(VLM_PROMPT, return_tensors='pt')
        prompt_embeds = text_model.model.embed_tokens(prompt.to(device)).to(dtype=torch.bfloat16)
        embedded_bos = text_model.model.embed_tokens(torch.tensor([[tokenizer.bos_token_id]], device=text_model.device, dtype=torch.int64)).to(dtype=torch.bfloat16)

        inputs_embeds = torch.cat([
            embedded_bos.expand(embedded_images.shape[0], -1, -1),
            embedded_images,
            prompt_embeds.expand(embedded_images.shape[0], -1, -1),
        ], dim=1).to(dtype=torch.bfloat16)

        input_ids = torch.cat([
            torch.tensor([[tokenizer.bos_token_id]], dtype=torch.long).expand(embedded_images.shape[0], -1),
            torch.zeros((embedded_images.shape[0], embedded_images.shape[1]), dtype=torch.long),
            prompt.expand(embedded_images.shape[0], -1),
        ], dim=1).to(device)
        attention_mask = torch.ones_like(input_ids)

        generate_ids = text_model.generate(
            input_ids=input_ids,
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            max_new_tokens=300,
            do_sample=True,
            top_k=10,
            temperature=0.5,
        )

        generate_ids = generate_ids[:, input_ids.shape[1]:]

        for ids in generate_ids:
            caption = tokenizer.decode(ids[:-1] if ids[-1] == tokenizer.eos_token_id else ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            caption = caption.replace('<|end_of_text|>', '').replace('<|finetune_right_pad_id|>', '').strip()
            all_captions.append(caption)

        if pbar:
            pbar.update(len(batch))

    return all_captions
def process_directory(input_dir: Path, output_dir: Path, batch_size: int, models: tuple):
    output_dir.mkdir(parents=True, exist_ok=True)
    image_files = [f for f in input_dir.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS]
    images_to_process = [f for f in image_files if not (output_dir / f"{f.stem}.txt").exists()]

    if not images_to_process:
        print("No new images to process.")
        return

    with tqdm(total=len(images_to_process), desc="Processing images", unit="image") as pbar:
        for i in range(0, len(images_to_process), batch_size):
            batch_files = images_to_process[i:i+batch_size]
            batch_images = [Image.open(f).convert('RGB') for f in batch_files]

            captions = stream_chat(batch_images, batch_size, pbar, models)

            for file, caption in zip(batch_files, captions):
                with open(output_dir / f"{file.stem}.txt", 'w', encoding='utf-8') as f:
                    f.write(caption)

            for img in batch_images:
                img.close()
def parse_arguments():
    parser = argparse.ArgumentParser(description="Process images and generate captions.")
    parser.add_argument("input", nargs='+', help="Input image file or directory (or multiple directories)")
    parser.add_argument("--output", help="Output directory (optional)")
    parser.add_argument("--bs", type=int, default=4, help="Batch size (default: 4)")
    return parser.parse_args()
def main():
    args = parse_arguments()
    input_paths = [Path(input_path) for input_path in args.input]
    batch_size = args.bs
    models = load_models()

    for input_path in input_paths:
        if input_path.is_file() and input_path.suffix.lower() in IMAGE_EXTENSIONS:
            output_path = input_path.with_suffix('.txt')
            print(f"Processing single image: {input_path.name}")
            with tqdm(total=1, desc="Processing image", unit="image") as pbar:
                captions = stream_chat([Image.open(input_path).convert('RGB')], 1, pbar, models)
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(captions[0])
            print(f"Output saved to {output_path}")
        elif input_path.is_dir():
            output_path = Path(args.output) if args.output else input_path
            print(f"Processing directory: {input_path}")
            print(f"Output directory: {output_path}")
            print(f"Batch size: {batch_size}")
            process_directory(input_path, output_path, batch_size, models)
        else:
            print(f"Invalid input: {input_path}")
            print("Skipping...")

    if not input_paths:
        print("Usage:")
        print("For single image: python app.py [image_file] [--bs batch_size]")
        print("For directory (same input/output): python app.py [directory] [--bs batch_size]")
        print("For directory (separate input/output): python app.py [directory] --output [output_directory] [--bs batch_size]")
        print("For multiple directories: python app.py [directory1] [directory2] ... [--output output_directory] [--bs batch_size]")
        sys.exit(1)


if __name__ == "__main__":
    main()
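Note that the adapter load above still uses weights_only=True, which is what the UnpicklingError in your log comes from. If, and only if, you trust the wpkklhc6 checkpoint, a fallback sketch is to swap that one line in load_models() for:

    # Only for a checkpoint you trust: weights_only=False lets pickled objects
    # execute arbitrary code during loading.
    image_adapter.load_state_dict(
        torch.load(CHECKPOINT_PATH / "image_adapter.pt", map_location="cpu", weights_only=False)
    )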
I created a file named test.py with that code, and this is the output I get now:
% python test.py ~/Downloads/00100sPORTRAIT_00100_BURST20200715194128703_COVER.jpg
Loading CLIP
Loading tokenizer
Loading LLM
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 109, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 88, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Loading image adapter
Traceback (most recent call last):
File "/home/markus/code/joy-caption-pre-alpha/test.py", line 182, in <module>
main()
File "/home/markus/code/joy-caption-pre-alpha/test.py", line 152, in main
models = load_models()
^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/test.py", line 52, in load_models
image_adapter.load_state_dict(torch.load(CHECKPOINT_PATH / "image_adapter.pt", map_location="cpu", weights_only=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/markus/code/joy-caption-pre-alpha/venv/lib/python3.12/site-packages/torch/serialization.py", line 1286, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 118
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
python test.py 3.89s user 12.15s system 42% cpu 37.428 total
That's because the model you're trying to load is a BnB-quantized model. You need to use a Llama 3.1 model that isn't quantized with bitsandbytes. You can try something like akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic.
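Swapping the checkpoint only means changing the MODEL_PATH constant at the top of the script; whether a given non-BnB checkpoint (FP8, GPTQ, plain bf16) loads cleanly through AutoModelForCausalLM on ROCm is something you'd need to verify, so treat this as a sketch:

# Hypothetical swap: any Llama 3.1 8B checkpoint that is not bitsandbytes-quantized.
# A plain bf16 repo such as meta-llama/Meta-Llama-3.1-8B (gated) would also work,
# at the cost of more VRAM.
MODEL_PATH = "akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic"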