Image Classification using MobileViT
This repo contains the model and the notebook for this Keras example on MobileViT.
Full credits to: Sayak Paul
Background Information
The MobileViT architecture (Mehta et al.) combines the benefits of Transformers (Vaswani et al.) and convolutions. With Transformers, we can capture long-range dependencies that result in global representations. With convolutions, we can capture spatial relationships that model locality.
Besides combining the properties of Transformers and convolutions, the authors introduce MobileViT as a general-purpose mobile-friendly backbone for different image recognition tasks. Their findings suggest that, performance-wise, MobileViT is better than other models with the same or higher complexity (MobileNetV3, for example), while being efficient on mobile devices.
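To make the idea concrete, below is a minimal, hypothetical sketch of a MobileViT-style block in Keras: convolutions produce local features, a small Transformer over the flattened feature map captures global context, and a final convolution fuses the two. The layer sizes, the simplified patch handling (each spatial position is treated as a token), and the tiny demo model are illustrative assumptions, not the configuration used to train this model.

```python
from tensorflow import keras
from tensorflow.keras import layers


def transformer_block(x, num_heads=2, mlp_dim=128):
    # Pre-norm multi-head self-attention with a residual connection.
    attn_in = layers.LayerNormalization(epsilon=1e-6)(x)
    attn_out = layers.MultiHeadAttention(num_heads=num_heads, key_dim=x.shape[-1])(attn_in, attn_in)
    x = layers.Add()([x, attn_out])

    # Pre-norm feed-forward MLP with a residual connection.
    mlp_in = layers.LayerNormalization(epsilon=1e-6)(x)
    mlp_out = layers.Dense(mlp_dim, activation="gelu")(mlp_in)
    mlp_out = layers.Dense(x.shape[-1])(mlp_out)
    return layers.Add()([x, mlp_out])


def mobilevit_style_block(x, projection_dim=64):
    # Local representation via convolutions.
    local = layers.Conv2D(projection_dim, 3, padding="same", activation="swish")(x)
    local = layers.Conv2D(projection_dim, 1, activation="swish")(local)

    # Global representation: flatten the feature map into a token sequence and
    # apply a Transformer. (The paper unfolds pixels into patches; treating
    # every spatial position as a token is a simplification for this sketch.)
    h, w = local.shape[1], local.shape[2]
    tokens = layers.Reshape((h * w, projection_dim))(local)
    tokens = transformer_block(tokens)

    # Fold back to a feature map and fuse global and input features.
    global_features = layers.Reshape((h, w, projection_dim))(tokens)
    fused = layers.Concatenate()([x, global_features])
    return layers.Conv2D(projection_dim, 3, padding="same", activation="swish")(fused)


# Tiny demo model on 256x256 RGB inputs with 5 output classes (tf_flowers has 5).
inputs = keras.Input(shape=(256, 256, 3))
x = inputs
for filters in (16, 32, 48, 64):  # strided convs to shrink the feature map
    x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="swish")(x)
x = mobilevit_style_block(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.summary()
```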
Training Data
The model is trained on the tf_flowers dataset.
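The sketch below shows how one might load tf_flowers with TensorFlow Datasets and run a quick prediction via from_pretrained_keras. The 256x256 input resolution and the Hub repo id are placeholder assumptions; adjust both to match this model, and note that the sketch assumes input rescaling is handled inside the saved model, as in the Keras example.

```python
import tensorflow as tf
import tensorflow_datasets as tfds
from huggingface_hub import from_pretrained_keras

IMAGE_SIZE = 256  # assumed input resolution

# tf_flowers only ships a "train" split; take a few images for a quick check.
ds = tfds.load("tf_flowers", split="train[:8]", as_supervised=True)


def preprocess(image, label):
    # Resize only; normalization is assumed to happen inside the saved model.
    return tf.image.resize(image, (IMAGE_SIZE, IMAGE_SIZE)), label


batch = ds.map(preprocess).batch(8)

# Replace the repo id below with this model's id on the Hub (placeholder shown).
model = from_pretrained_keras("keras-io/mobile-vit-xxs")

for images, labels in batch.take(1):
    probs = model.predict(images)
    print("Predicted classes:", tf.argmax(probs, axis=-1).numpy())
    print("Ground-truth labels:", labels.numpy())
```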