	---
	language:
	- en
	library_name: transformers
	base_model: google/paligemma-3b-pt-224
	pipeline_tag: visual-question-answering
	inference: false
	tags:
	- paligemma
	- coffe
	- caption
	license: mit
	---

	# Model Card for paligemma_coffee_machine_caption

	Google's PaliGemma VLM (Vision Language Model), fine-tuned to generate captions for coffee machine images.


	### Model Description


	- Developed by: Komorebi AI
	- Language(s) (NLP): English
	- License: MIT
	- Finetuned from model: google/paligemma-3b-pt-224
	- Demo: https://huggingface.co/spaces/Fer14/coffe_machine_caption

	## Usage

	```python

	from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor
	from PIL import Image


	model_id = "Fer14/paligemma_coffee_machine_caption"

	model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
	processor = PaliGemmaProcessor.from_pretrained(model_id)


	image = Image.open("path to your image").convert("RGB")

	# The prompt text is kept verbatim (spelling included), on the assumption
	# that it matches the prompt the model was fine-tuned with.
	prompt = (
	    "Generate a caption for the following coffee maker image. The caption has to be of the following structure:\n"
	    "\"A <color> <type>, <accessories>, <shape> shaped, with <screen> and <number> <b_color> butons\"\n\n"
	    "in which:\n"
	    "- color: red, black, blue...\n"
	    "- type: coffee machine, coffee maker, espresso coffee machine...\n"
	    "- accessories: a list of accessories like the ones described above\n"
	    "- shape: cubed, round...\n"
	    "- screen: screen, no screen.\n"
	    "- number: amount of buttons to add\n"
	    "- b_color: color of the buttons"
	)

	inputs = processor(
	    text=prompt,
	    images=image,
	    return_tensors="pt",
	    padding="longest",
	)

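	output = model.generate(**inputs, max_length=1000)

	# The output contains the input tokens followed by the generated ones,
	# so slice by input token count rather than by prompt character length.
	input_len = inputs["input_ids"].shape[-1]
	decoded_output = processor.decode(output[0][input_len:], skip_special_tokens=True)
	print(decoded_output)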

	```
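If the generated caption follows the structure requested in the prompt, it can be split back into its fields. The following is a minimal sketch, not part of the model or its repository: `parse_caption` and `CAPTION_RE` are hypothetical names, the sample caption is invented for illustration, and the pattern assumes a single-word color and shape. It accepts both "buttons" and the "butons" spelling used in the prompt, and returns `None` when the caption does not fit the template.

```python
import re
from typing import Optional

# Pattern for: "A <color> <type>, <accessories>, <shape> shaped,
# with <screen> and <number> <b_color> butons" (assumed output format).
CAPTION_RE = re.compile(
    r"^A\s+(?P<color>\S+)\s+(?P<type>[^,]+),\s*"
    r"(?P<accessories>.*),\s*"
    r"(?P<shape>\S+)\s+shaped,\s*"
    r"with\s+(?P<screen>no screen|screen)\s+and\s+"
    r"(?P<number>\S+)\s+(?P<b_color>.+?)\s+but+ons?\.?$",
    re.IGNORECASE,
)


def parse_caption(caption: str) -> Optional[dict]:
    """Split a templated caption into fields, or return None if it does not match."""
    match = CAPTION_RE.match(caption.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Accessories are a comma-separated list inside the caption.
    fields["accessories"] = [
        item.strip() for item in fields["accessories"].split(",") if item.strip()
    ]
    # "screen" -> True, "no screen" -> False.
    fields["has_screen"] = fields.pop("screen").lower() == "screen"
    # Convert the button count to an int when it is written with digits.
    if fields["number"].isdigit():
        fields["number"] = int(fields["number"])
    return fields


# Hypothetical caption, written by hand to illustrate the template.
sample = "A red espresso coffee machine, milk frother, cup tray, round shaped, with no screen and 3 black butons"
print(parse_caption(sample))
```

Real model outputs may deviate from the template (for example, multi-word colors), so treat a `None` result as "could not parse" rather than as an error in the caption.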


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.2
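
To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` can help. This is only a sketch: `check_versions` and `EXPECTED` are hypothetical names, and whether other versions also work with this model is not stated in this card.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
EXPECTED = {"peft": "0.11.1", "transformers": "4.41.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, wanted, installed == wanted)
    return report


for package, (installed, wanted, matches) in check_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: installed={installed}, listed={wanted} ({status})")
```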