This model is the product of curiosity: a Vision Transformer (ViT) that labels anime images with tags!

Disclaimer: The model was trained on an entirely new dataset, so its predictions for material from before 2023 might be off. Fine-tuning the model for your specific use case is advisable.

Quick setup guide:

from transformers.modeling_outputs import ImageClassifierOutput
from transformers import ViTImageProcessor, ViTForImageClassification
import torch
from PIL import Image

model_name_or_path = "Ojimi/vit-anime-caption"
processor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path)
threshold = 0.3

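device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # Fall back to CPU when no GPU is available

image = Image.open(YOUR_IMAGE_PATH).convert('RGB')  # Replace YOUR_IMAGE_PATH with your image path; ViT expects 3-channel RGB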

inputs = processor(image, return_tensors='pt')

model.to(device=device)
model.eval()


with torch.no_grad():
    pixel_values = inputs['pixel_values'].to(device=device)

    outputs: ImageClassifierOutput = model(pixel_values=pixel_values)

    logits = outputs.logits  # Raw scores, before any activation
    probs = torch.sigmoid(logits)  # Per-tag probabilities in (0, 1)

    predictions = []  # (label, probability) pairs that pass the threshold

    for idx, p in enumerate(probs[0]):
        if p > threshold:  # Keep tags whose probability exceeds the threshold (0.3 here)
            predictions.append((model.config.id2label[idx], p.item()))

for tag in predictions:
    print(tag)  
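
If you would rather have a caption-like string than a list of tuples, one option is to sort the tags by probability and join them. This is only a sketch: `format_tags` is a hypothetical helper (not part of the model or of transformers), and the sample values below are made up to mimic the shape of `predictions` from the snippet above.

```python
def format_tags(predictions, top_k=None):
    """Sort (label, probability) pairs by probability, highest first, and join the labels."""
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # Optionally keep only the k most confident tags
    return ", ".join(label for label, _ in ranked)


# Made-up values in the same (label, probability) shape as `predictions`
sample_predictions = [("long_hair", 0.62), ("1girl", 0.97), ("smile", 0.41)]

caption = format_tags(sample_predictions)
top_two = format_tags(sample_predictions, top_k=2)
print(caption)
```

Sorting first means that `top_k` always keeps the most confident tags, regardless of the order in which the labels appear in `id2label`.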

Why the Sigmoid?

  • Sigmoid turns each boring raw score into its own probability between 0 and 1, so you can apply a threshold and keep every cool tag that passes it, not just the single top one.
  • It's like a wizard turning regular stuff into magic potions!
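
To see the difference in numbers, here is a small pure-Python sketch (no model required) that compares sigmoid with softmax on the same scores. The tag names and logit values are hypothetical, picked only for illustration; softmax is shown purely as a contrast and is not something this model card uses.

```python
import math


def sigmoid(x: float) -> float:
    """Map one raw score to a probability in (0, 1), independently of the other scores."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    """Map all scores to probabilities that sum to 1, so the tags compete with each other."""
    m = max(xs)  # Subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for five tags (illustrative values, not real model output)
logits = {"1girl": 2.0, "long_hair": 1.9, "smile": 1.8, "dress": 1.7, "outdoors": -3.0}
threshold = 0.3

sigmoid_probs = {tag: sigmoid(v) for tag, v in logits.items()}
softmax_probs = dict(zip(logits, softmax(list(logits.values()))))

# Tags that pass the threshold under each activation
sigmoid_tags = [tag for tag, p in sigmoid_probs.items() if p > threshold]
softmax_tags = [tag for tag, p in softmax_probs.items() if p > threshold]

print("sigmoid:", sigmoid_tags)
print("softmax:", softmax_tags)
```

With these values, sigmoid keeps the four high-scoring tags, while softmax splits the probability mass between them so that none reaches 0.3. That is why a per-tag sigmoid suits multi-label tagging.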

Training guide

Model size: 89M params (Safetensors, tensor type F32)