---
license: apache-2.0
pipeline_tag: image-text-to-text
---
A fine-tuned version of moondream2 for prompt generation from images. Moondream is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!
**Usage**
```bash
pip install transformers timm einops bitsandbytes accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from PIL import Image

DEVICE = "cuda"
DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16  # CPU doesn't support float16

# Pin to a specific revision for reproducibility
revision = "1082928d0aa39290a31a92f632ca670458eda512"

tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt", revision=revision)
moondream = AutoModelForCausalLM.from_pretrained(
    "gokaygokay/moondream-prompt",
    trust_remote_code=True,
    torch_dtype=DTYPE,
    device_map={"": DEVICE},
    revision=revision,
)
moondream.eval()

# Load the image and generate a detailed, prompt-style description
image_path = "<image_path>"
image = Image.open(image_path).convert("RGB")
md_answer = moondream.answer_question(
    moondream.encode_image(image),
    "Describe this image and its style in a very detailed manner",
    tokenizer=tokenizer,
)
print(md_answer)
```
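The install command above pulls in `bitsandbytes` and `accelerate`, so the model can also be loaded in 8-bit to cut VRAM usage on smaller GPUs. The sketch below is not part of the original card; it assumes the custom moondream code is compatible with transformers' standard `BitsAndBytesConfig` quantization path.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from PIL import Image

revision = "1082928d0aa39290a31a92f632ca670458eda512"

# Hedged 8-bit loading sketch (assumes a bitsandbytes-compatible CUDA GPU);
# the float16 example above is the documented path.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt", revision=revision)
moondream = AutoModelForCausalLM.from_pretrained(
    "gokaygokay/moondream-prompt",
    trust_remote_code=True,
    revision=revision,
    quantization_config=quant_config,
    device_map="auto",
)
moondream.eval()

image = Image.open("<image_path>").convert("RGB")
print(
    moondream.answer_question(
        moondream.encode_image(image),
        "Describe this image and its style in a very detailed manner",
        tokenizer=tokenizer,
    )
)
```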
**Example**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/-x5jO3xnQrUz1uYO9SHji.png)
"a very angry old man with white hair and a mustache, in the style of a Pixar movie, hyperrealistic, white background, 8k"