ydshieh/kosmos-2-patch14-224

ydshieh HF staff committed on Aug 19, 2023

Commit 62c089e • 1 Parent(s): 8dcd01c

Create README.md

Files changed (1)

README.md +42 -0

README.md ADDED

@@ -0,0 +1,42 @@

+---
+# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+# Doc / guide: https://huggingface.co/docs/hub/model-cards
+{}
+---
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```python
+from PIL import Image
+from transformers import AutoProcessor, AutoModelForVision2Seq
+model = AutoModelForVision2Seq.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
+processor = AutoProcessor.from_pretrained("ydshieh/kosmos-2-patch14-224", trust_remote_code=True)
+prompt = "<grounding>An image of"
+image = Image.open("snowman.jpg")
+inputs = processor(text=prompt, images=image, return_tensors="pt")
+generated_ids = model.generate(
+    pixel_values=inputs["pixel_values"],
+    input_ids=inputs["input_ids"][:, :-1],
+    attention_mask=inputs["attention_mask"][:, :-1],
+    img_features=None,
+    img_attn_mask=inputs["img_attn_mask"][:, :-1],
+    use_cache=True,
+    max_new_tokens=64,
+)
+generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+processed_text = processor.post_processor_generation(generated_text, cleanup_and_extract=False)
+print(processed_text)
+processed_text, entities = processor.post_processor_generation(generated_text)
+print(processed_text)
+print(entities)
+```
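
The README added in this commit prints `entities` but does not describe its layout. As a rough sketch, assuming each entity is a `(phrase, (start, end), boxes)` tuple whose boxes are `(x1, y1, x2, y2)` coordinates normalized to the 0 to 1 range (an assumption not confirmed by this diff), the boxes could be mapped back to pixel coordinates as follows. The helper name `entities_to_pixel_boxes` and the sample data are hypothetical and not part of the model's remote code.

```python
from typing import List, Tuple

# Assumed entity layout (not confirmed by the commit):
# (phrase, (start_char, end_char), [(x1, y1, x2, y2), ...]) with normalized boxes.
Box = Tuple[float, float, float, float]
Entity = Tuple[str, Tuple[int, int], List[Box]]


def entities_to_pixel_boxes(
    entities: List[Entity], width: int, height: int
) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Scale normalized boxes to pixel coordinates for an image of the given size."""
    results = []
    for phrase, _span, boxes in entities:
        for x1, y1, x2, y2 in boxes:
            pixel_box = (
                round(x1 * width),
                round(y1 * height),
                round(x2 * width),
                round(y2 * height),
            )
            results.append((phrase, pixel_box))
    return results


# Hypothetical sample data in the assumed layout, for a 200x100 image.
sample_entities = [("a snowman", (12, 21), [(0.5, 0.25, 1.0, 0.75)])]
print(entities_to_pixel_boxes(sample_entities, width=200, height=100))
```

With the snippet from the README, `width` and `height` would come from `image.size` of the PIL image that was passed to the processor.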