Image Classification
YoungZzz commited on
Commit
cfaa0d1
•
1 Parent(s): 110abc6

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +36 -0
README.md ADDED
@@ -0,0 +1,36 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - image-classification
5
+ datasets:
6
+ - imagenet
7
+ ---
8
+
9
+ # VAN-Base
10
+
11
+ VAN is trained on ImageNet-1k (1 million images, 1,000 classes) at resolution 224x224. It was first introduced in the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) and first released in [here](https://github.com/Visual-Attention-Network).
12
+
13
+
14
+ ## Description
15
+
16
+ While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms the state-of-the-art vision transformers (ViTs) and convolutional neural networks (CNNs) with a large margin in extensive experiments, including image classification, object detection, semantic segmentation, instance segmentation, etc.
17
+
18
+ ## Evaluation Results
19
+
20
+ | Model | #Params(M) | GFLOPs | Top1 Acc(%) | Download |
21
+ | :-------- | :--------: | :----: | :---------: | :----------------------------------------------------------: |
22
+ | VAN-Tiny | 4.1 | 0.9 | 75.4 |[Hugging Face 🤗](https://huggingface.co/Visual-Attention-Network/VAN-Tiny) |
23
+ | VAN-Small | 13.9 | 2.5 | 81.1 |[Hugging Face 🤗](https://huggingface.co/Visual-Attention-Network/VAN-Small) |
24
+ | VAN-Base | 26.6 | 5.0 | 82.8 |[Hugging Face 🤗](https://huggingface.co/Visual-Attention-Network/VAN-Base), |
25
+ | VAN-Large | 44.8 | 9.0 | 83.9 |[Hugging Face 🤗](https://huggingface.co/Visual-Attention-Network/VAN-Large) |
26
+
27
+
28
+ ### BibTeX entry and citation info
29
+ ```bibtex
30
+ @article{guo2022visual,
31
+ title={Visual Attention Network},
32
+ author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
33
+ journal={arXiv preprint arXiv:2202.09741},
34
+ year={2022}
35
+ }
36
+ ```