---
language:
- en
library_name: transformers
base_model: google/paligemma-3b-pt-224
pipeline_tag: visual-question-answering
inference: false
tags:
- paligemma
- coffee
- caption
license: mit
---
# Model Card for paligemma_coffee_machine_caption

Google's PaliGemma VLM (Vision-Language Model) fine-tuned to generate captions for coffee machine images.
### Model Description
- **Developed by:** Komorebi AI
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** google/paligemma-3b-pt-224
- **Demo:** https://huggingface.co/spaces/Fer14/coffe_machine_caption
## Usage
```python
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor
from PIL import Image

model_id = "Fer14/paligemma_coffee_machine_caption"

# Load the fine-tuned model and its processor
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = PaliGemmaProcessor.from_pretrained(model_id)

# Load the coffee machine image to caption
image = Image.open("path to your image").convert("RGB")

# Prompt describing the expected caption structure
prompt = (
    "Generate a caption for the following coffee maker image. The caption has to be of the following structure:\n"
    "\"A <color> <type>, <accessories>, <shape> shaped, with <screen> and <number> <b_color> buttons\"\n\n"
    "in which:\n"
    "- color: red, black, blue...\n"
    "- type: coffee machine, coffee maker, espresso coffee machine...\n"
    "- accessories: a list of accessories like the ones described above\n"
    "- shape: cubed, round...\n"
    "- screen: screen, no screen.\n"
    "- number: amount of buttons to add\n"
    "- b_color: color of the buttons"
)

inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt",
    padding="longest",
)

# Generate the caption and strip the echoed prompt from the decoded output
output = model.generate(**inputs, max_length=1000)
decoded_output = processor.decode(output[0], skip_special_tokens=True)[len(prompt):]
print(decoded_output)
```
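The snippet above runs in full precision on CPU by default. The variant below is a minimal sketch of faster GPU inference, assuming PyTorch is installed and the GPU supports bfloat16; the prompt placeholder stands in for the full template shown above.

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "Fer14/paligemma_coffee_machine_caption"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model in bfloat16 to cut memory use and speed up generation on GPU
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)
processor = PaliGemmaProcessor.from_pretrained(model_id)

image = Image.open("path to your image").convert("RGB")
prompt = "..."  # use the same prompt template as in the Usage section above

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
# Cast pixel values to the model dtype to avoid a dtype mismatch in the vision tower
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)

# The decoded sequence echoes the prompt, so slice it off before printing
caption = processor.decode(output[0], skip_special_tokens=True)[len(prompt):]
print(caption.strip())
```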
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
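The PEFT version listed above indicates the model was adapter-tuned. The exact training recipe is not documented in this card; the sketch below only illustrates what a LoRA fine-tune of the base checkpoint with PEFT can look like. The rank, target modules, and training data description are assumptions for illustration, not the actual configuration.

```python
# Illustrative only: a minimal LoRA setup with PEFT for the base PaliGemma checkpoint.
# The rank, target modules, and dataset mentioned in comments are assumptions,
# not the configuration actually used to train this model.
import torch
from peft import LoraConfig, get_peft_model
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

base_model_id = "google/paligemma-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16
)
processor = PaliGemmaProcessor.from_pretrained(base_model_id)

# Attach low-rank adapters to the attention projections of the language model
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training would then proceed with transformers' Trainer or a custom loop,
# feeding (coffee machine image, prompt, caption) examples through the processor.
```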