error

#1 opened by win10

Traceback (most recent call last):
  File "c:\Users\jmes1\Downloads\molmo-7B-D-bnb-4bit.py", line 17, in <module>
    model = AutoModelForCausalLM.from_pretrained(repo_name, **arguments)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 557, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "c:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 584, in register
    raise ValueError(
ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.allenai.Molmo-7B-D-0924.b72f6745657cddaf97041d88eb02b23756338219.config_molmo.MolmoConfig'> and you passed <class 'transformers_modules.cyan2k.molmo-7B-D-bnb-4bit.8b3b140bc14c05c77c30a112a939d8d2c4e0ee42.config_molmo.MolmoConfig'>. Fix one of those so they match!

Can you help me?

Excellent job quantizing with BNB; thanks, I was looking for this. Here's a sample script for @win10 . If it still doesn't work, it might have something to do with the versions of the libraries you have installed.
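Before anything else, it helps to confirm which library versions you actually have. Here's a minimal sketch for that (the package list is just my guess at the relevant ones; adjust it to whatever you have installed):

VERSION CHECK SNIPPET
import importlib.metadata

# Print the versions of the packages most likely involved in the error.
# The list below is an assumption; add or remove entries as needed.
for pkg in ["transformers", "torch", "torchvision", "accelerate", "bitsandbytes"]:
    try:
        print(f"{pkg}: {importlib.metadata.version(pkg)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")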

SAMPLE SCRIPT
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Prepend the pip-installed NVIDIA DLL folders to CUDA_PATH/PATH so that
    # torch and bitsandbytes can find the CUDA runtime on Windows.
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    paths_to_add = [str(cuda_path), str(cublas_path), str(cudnn_path)]
    env_vars = ['CUDA_PATH', 'CUDA_PATH_V12_1', 'PATH']
    
    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join(paths_to_add + [current_value] if current_value else paths_to_add)
        os.environ[env_var] = new_value

set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_path = r"D:\Scripts\bench_vision\cyan2k--molmo-7B-D-bnb-4bit"

class VisionModel:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def initialize_model_and_processor(self):
        self.processor = AutoProcessor.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

    def process_single_image(self, image_path):
        image = Image.open(image_path)

        # Ensure the image is in RGB format
        if image.mode != "RGB":
            image = image.convert("RGB")

        text = "Describe this image in detail as possible but be succinct and don't repeat yourself."
        inputs = self.processor.process(images=[image], text=text)
        inputs = {k: v.to(self.device).unsqueeze(0) for k, v in inputs.items()}

        # generate_from_batch is Molmo's generation entry point (defined in the model's remote code)
        output = self.model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=500, stop_strings=["<|endoftext|>"]),
            tokenizer=self.processor.tokenizer
        )

        generated_tokens = output[0, inputs['input_ids'].size(1):]
        generated_text = self.processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)

        print(f"\nGenerated Text:\n{generated_text}\n")

if __name__ == "__main__":
    image_path = r"D:\Scripts\bench_vision\IMG_140531.JPG"

    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    vision_model.process_single_image(image_path)

Moreover, I modified some of the source code, specifically image_preprocessing_molmo.py. After you download all the files, replace the resize_and_pad function with my custom version below. It no longer uses TensorFlow and relies only on more commonly used libraries. I did this because I ran into massive problems trying to install and run TensorFlow and its massive dependencies:

CUSTOM resize_and_pad FUNCTION
# Note: the function below needs these imports at the top of
# image_preprocessing_molmo.py if they are not already there
# (torch/torchvision replace the original TensorFlow calls):
#   from typing import List, Optional, Tuple
#   import numpy as np
#   import torch
#   import torchvision
#   from torchvision.transforms import InterpolationMode

def resize_and_pad(
    image: np.ndarray,
    desired_output_size: List[int],
    resize_method: str = "bilinear",
    pad_value: float = 0,
    normalize: bool = True,
    image_mean: Optional[List[float]] = OPENAI_CLIP_MEAN,
    image_std: Optional[List[float]] = OPENAI_CLIP_STD,
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Resize and pad the image to the desired output size.

    Args:
        image (np.ndarray): Input image as a NumPy array.
        desired_output_size (List[int]): Desired output size as [height, width].
        resize_method (str, optional): Resize interpolation method. Defaults to "bilinear".
        pad_value (float, optional): Padding value. Defaults to 0.
        normalize (bool, optional): Whether to normalize the image. Defaults to True.
        image_mean (Optional[List[float]], optional): Mean for normalization. Defaults to OPENAI_CLIP_MEAN.
        image_std (Optional[List[float]], optional): Standard deviation for normalization. Defaults to OPENAI_CLIP_STD.

    Returns:
        Tuple[np.ndarray, np.ndarray]: Resized and padded image, and image mask.
    """
    desired_height, desired_width = desired_output_size
    height, width = image.shape[:2]

    # Calculate scaling factors and determine the scaling factor to maintain aspect ratio
    scale_y = desired_height / height
    scale_x = desired_width / width
    scale = min(scale_x, scale_y)
    scaled_height = int(height * scale)
    scaled_width = int(width * scale)

    # Convert the image to a PyTorch tensor and normalize to [0, 1]
    image_tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0

    # Define the interpolation mode
    if resize_method.lower() == "bilinear":
        interpolation = InterpolationMode.BILINEAR
    elif resize_method.lower() == "nearest":
        interpolation = InterpolationMode.NEAREST
    elif resize_method.lower() == "bicubic":
        interpolation = InterpolationMode.BICUBIC
    elif resize_method.lower() == "lanczos":
        interpolation = InterpolationMode.LANCZOS
    else:
        raise ValueError(f"Unsupported resize method: {resize_method}")

    # Resize the image
    resized_image = torchvision.transforms.Resize(
        [scaled_height, scaled_width],
        interpolation=interpolation,
        antialias=True
    )(image_tensor)

    # Clip the image to ensure values are within [0, 1]
    resized_image = torch.clamp(resized_image, 0.0, 1.0)

    # Convert back to NumPy
    resized_image_np = resized_image.permute(1, 2, 0).numpy()

    # Calculate padding
    top_pad = (desired_height - scaled_height) // 2
    bottom_pad = desired_height - scaled_height - top_pad
    left_pad = (desired_width - scaled_width) // 2
    right_pad = desired_width - scaled_width - left_pad

    # Pad the image using NumPy
    padded_image = np.pad(
        resized_image_np,
        pad_width=((top_pad, bottom_pad), (left_pad, right_pad), (0, 0)),
        mode='constant',
        constant_values=pad_value
    )

    # Create the image mask
    image_mask = np.pad(
        np.ones((scaled_height, scaled_width), dtype=bool),
        pad_width=((top_pad, bottom_pad), (left_pad, right_pad)),
        mode='constant',
        constant_values=False
    )

    if normalize:
        padded_image = normalize_image(padded_image, offset=image_mean, scale=image_std)

    return padded_image, image_mask

I have verified that the above script works as long as you modify the source code as I outlined.
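If you want to sanity-check the replacement function on its own, a quick smoke test like the one below should work (just a sketch: it assumes you run it from inside image_preprocessing_molmo.py, or import resize_and_pad and its constants from there, and the 336x336 target size is only an example):

QUICK SANITY CHECK
import numpy as np

# Feed a dummy 300x400 RGB image through the custom resize_and_pad and
# confirm the padded image and mask come back at the requested size.
dummy_image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
padded, mask = resize_and_pad(dummy_image, [336, 336])
print(padded.shape, mask.shape)  # expected: (336, 336, 3) (336, 336)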

I am getting this error:

PS C:\Users\15023\Documents\Models\FD> & c:/Users/15023/Documents/Models/GPD/.venv/Scripts/python.exe c:/Users/15023/Documents/Models/D/molmo_test.py
2024-09-29 18:03:15.863138: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-29 18:03:16.651278: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:03<00:00, 1.66s/it]
2024-09-29 18:03:24.165424: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "c:\Users\15023\Documents\Models\FPD\molmo_test.py", line 33, in <module>
    attention_mask=inputs["attention_mask"],
                   ~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'attention_mask'
