Good results
Just wanted to say that I'm very happy with the results for this model! I'm running it on an NVIDIA A100 80GB, and it generates captions like this within 5 seconds:
The image depicts a crowd of people dressed in colorful costumes and masks, some of which are wearing deer antlers on their heads. There are several people walking down the street, some of whom are holding up their hands and waving to the crowd. Some of the people are standing, while others are sitting or kneeling on the ground. There is also a group of people standing at the side of the street, possibly waiting for the parade to begin.
Thanks a lot! Also, make sure to take advantage of int4 quantization via bitsandbytes, which greatly reduces memory usage :) i.e. it's best to use:
from_pretrained("Salesforce/instructblip-flan-t5-xxl", device_map={"":0}, load_in_4bit=True, torch_dtype=torch.bfloat16)
and make sure to cast the inputs to torch.bfloat16 when providing them to the model.
Why does it appear that there's an API call to Replicate for this?
I don't understand what you mean, I don't see it anywhere in this thread or on the model card?
I have misspoken. I am getting a 403 error, which I don't understand, since I have used the Hugging Face API key before. This is my Python code:
import requests

# config.hugging_face_api_key and image_url are defined elsewhere in my script
api_url = "https://api-inference.huggingface.co/models/Salesforce/instructblip-flan-t5-xxl"  # https://huggingface.co/Salesforce/instructblip-flan-t5-xxl FLAN-T5 11B
prompt = "Provide a detailed and vivid description of the image as if you are narrating it to someone who cannot see. Include the main elements, colors, positions, ambiance, tone and any emotion or story the image might convey."
headers = {"Authorization": f"Bearer {config.hugging_face_api_key}"}

image_res = requests.get(image_url)
image_res.raise_for_status()

data = {
    "prompt": prompt,
    "img": image_res.content,
}
hf_response = requests.post(api_url, headers=headers, data=data)
hf_response.raise_for_status()
Can you show us the actual error?
Traceback (most recent call last):
File "C:\Users\james\test_instructblip_4.py", line 17, in <module>
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-flan-t5-xxl")
^^^^^^^^^^^^^^^^
File "C:\Users\james\blip_env\Lib\site-packages\transformers\modeling_utils.py", line 1936, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 1145, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 5 more times]
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
^^^^^^^^^
File "C:\Users\james\blip_env\Lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.99 GiB total capacity; 22.86 GiB already allocated; 0 bytes free; 22.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
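That OOM is expected: moving the full-precision weights onto a 24 GiB card cannot work, which is why the 4-bit loading suggested earlier in the thread matters. A back-of-the-envelope estimate (assuming roughly 12B parameters for the combined vision encoder, Q-Former, and Flan-T5-XXL language model; the exact count varies) shows the footprint at different precisions:

```python
# Rough weight-memory estimate for an ~12B-parameter model.
# The 12B figure is an assumption (Flan-T5-XXL is 11B, plus vision encoder
# and Q-Former); it ignores activations and CUDA overhead.
PARAMS = 12e9

def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30

fp32 = gib(PARAMS * 4)    # 4 bytes per parameter
bf16 = gib(PARAMS * 2)    # 2 bytes per parameter
int4 = gib(PARAMS * 0.5)  # 0.5 bytes per parameter

print(f"fp32: ~{fp32:.0f} GiB, bf16: ~{bf16:.0f} GiB, int4: ~{int4:.0f} GiB")
# → fp32: ~45 GiB, bf16: ~22 GiB, int4: ~6 GiB
```

So in fp32 the weights alone exceed a 24 GiB card, while with load_in_4bit=True they fit with plenty of headroom.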