p1atdev committed
Commit 3a31d34
1 Parent(s): 543e4a6

Update README.md

Files changed (1):
1. README.md (+57 −4)
README.md CHANGED
````diff
@@ -2,6 +2,7 @@
 license: apache-2.0
 tags:
 - generated_from_trainer
+- siglip
 metrics:
 - accuracy
 - f1
@@ -9,6 +10,7 @@ base_model: google/siglip-base-patch16-512
 model-index:
 - name: siglip-tagger-test-2
   results: []
+pipeline_tag: image-classification
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -24,15 +26,66 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This is an experimental model that predicts danbooru tags of images.
+
+## Example
+
+```py
+from PIL import Image
+
+import torch
+from transformers import (
+    AutoModelForImageClassification,
+    AutoImageProcessor,
+)
+import numpy as np
+
+MODEL_NAME = "p1atdev/siglip-tagger-test-2"
+
+model = AutoModelForImageClassification.from_pretrained(
+    MODEL_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
+)
+model.eval()
+processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
+
+image = Image.open("sample.jpg")  # load your image
+inputs = processor(image, return_tensors="pt").to(model.device, model.dtype)
+
+logits = model(**inputs).logits.detach().cpu().float()[0]
+logits = np.clip(logits, 0.0, 1.0)
+
+results = {
+    model.config.id2label[i]: logit for i, logit in enumerate(logits) if logit > 0
+}
+results = sorted(results.items(), key=lambda x: x[1], reverse=True)
+
+for tag, score in results:
+    print(f"{tag}: {score*100:.2f}%")
+# 1girl: 100.00%
+# outdoors: 100.00%
+# sky: 100.00%
+# solo: 100.00%
+# school uniform: 96.88%
+# skirt: 92.97%
+# day: 89.06%
+# ...
+```
 
 ## Intended uses & limitations
 
-More information needed
+This model is for research use only and is not recommended for production.
+
+Please use the wd-v1-4-tagger series by SmilingWolf instead:
+
+- [SmilingWolf/wd-v1-4-moat-tagger-v2](https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2)
+- [SmilingWolf/wd-v1-4-swinv2-tagger-v2](https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2)
+
+etc.
 
 ## Training and evaluation data
 
-More information needed
+5,000 high-quality images from danbooru. They were shuffled and split into train:eval at 4500:500.
+
 
 ## Training procedure
 
@@ -79,4 +132,4 @@ The following hyperparameters were used during training:
 - Transformers 4.37.2
 - Pytorch 2.1.2+cu118
 - Datasets 2.16.1
-- Tokenizers 0.15.0 
+- Tokenizers 0.15.0
````
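
The post-processing that the added README example applies to the model's raw outputs (clip logits to [0, 1], drop non-positive scores, sort descending) can be factored into a small standalone helper. A minimal sketch: `logits_to_tags` is a hypothetical name not in the commit, and the toy `id2label` dict stands in for `model.config.id2label`.

```python
import numpy as np

def logits_to_tags(logits, id2label, threshold=0.0):
    # Mirror the README example: clip raw logits to [0, 1] and treat them as scores
    scores = np.clip(np.asarray(logits, dtype=np.float32), 0.0, 1.0)
    # Keep only tags whose score exceeds the threshold (the example uses > 0)
    pairs = [(id2label[i], float(s)) for i, s in enumerate(scores) if s > threshold]
    # Highest-scoring tags first, as in the example's sorted() call
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Toy mapping and logits for illustration only
id2label = {0: "1girl", 1: "sky", 2: "solo"}
print(logits_to_tags([2.3, -0.5, 0.5], id2label))
# [('1girl', 1.0), ('solo', 0.5)]
```

Note that clipping is not a probability transform; scores of 100.00% in the example output simply mean the logit was at or above 1.0 before clipping.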