Ross Wightman committed
Commit 10d04ca · 1 parent: 42234a7

Initial commit w/ README, config, weights

- README.md +103 -0
- config.json +32 -0
- pytorch_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,103 @@
---
tags:
- image-classification
- timm
- normalization-free
- efficient-channel-attention
license: apache-2.0
datasets:
- imagenet
inference: false
---

# ECA-NFNet-L0

A model pretrained on [ImageNet](http://www.image-net.org/). It is a variant of the [NFNet (Normalization Free)](https://arxiv.org/abs/2102.06171) model family.

## Model description

This model variant was slimmed down from the original F0 variant in the paper for improved runtime characteristics (throughput, memory use) in PyTorch on GPU accelerators. It uses [Efficient Channel Attention (ECA)](https://arxiv.org/abs/1910.03151) instead of Squeeze-Excitation, and SiLU activations instead of the usual GELU.
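
The following is a minimal NumPy sketch of what an ECA gate computes, following the ECA-Net paper. The steps are global average pooling, a 1D convolution across neighbouring channels with an odd kernel whose size adapts to the channel count (γ=2, b=1), and a sigmoid gate that rescales each channel. The helper names, the random weights, and the assumption that this kernel-size rule matches what `eca_nfnet_l0` uses in timm are all illustrative. This is not timm's implementation.

```python
import math
import numpy as np


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size from the ECA-Net paper, bumped to an odd value (illustrative)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Apply an ECA gate to a feature map x of shape (C, H, W).

    weight is a hypothetical 1D kernel of odd length k, shared across all channels.
    """
    k = weight.shape[0]
    pooled = x.mean(axis=(1, 2))                # global average pool -> (C,)
    padded = np.pad(pooled, k // 2)             # zero-pad both ends so the output keeps C entries
    # 1D cross-correlation over the channel axis: each channel only sees its k neighbours
    logits = np.array([padded[i:i + k] @ weight for i in range(pooled.shape[0])])
    gate = 1.0 / (1.0 + np.exp(-logits))        # sigmoid
    return x * gate[:, None, None]              # rescale each channel by its gate


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))
k = eca_kernel_size(64)
y = eca(x, rng.standard_normal(k))
print(k, y.shape)
```

Unlike Squeeze-Excitation, this gate has no fully connected bottleneck. Its only parameters are the k kernel weights, which is part of why ECA is cheap.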

Like other models in the NF family, this model contains no normalization layers (batch, group, etc.). In their place, it uses [Weight Standardized](https://arxiv.org/abs/1903.10520) convolutions with additional scaling factors.
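
The idea behind these convolutions can be sketched in NumPy, loosely following the Scaled Weight Standardization described in the NFNet papers. Each output filter is shifted to zero mean and scaled to unit variance, divided by the square root of its fan-in, and multiplied by a constant γ and a per-output-channel gain. The function name, the `eps` value, and the default `gamma=1.0` are assumptions for illustration; the papers use an activation-specific γ, and timm's own layer may differ in detail.

```python
import numpy as np


def scaled_weight_standardize(w, gain, gamma=1.0, eps=1e-6):
    """Standardize a conv weight of shape (out_ch, in_ch, kh, kw) per output channel.

    Each filter gets zero mean and unit variance, then is scaled by gamma / sqrt(fan_in)
    and a per-output-channel gain (a learnable parameter in a real model).
    """
    out_ch = w.shape[0]
    flat = w.reshape(out_ch, -1)
    fan_in = flat.shape[1]                      # in_ch * kh * kw
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)
    standardized = (flat - mean) / np.sqrt(var + eps)
    scaled = standardized * (gamma / np.sqrt(fan_in)) * gain.reshape(out_ch, 1)
    return scaled.reshape(w.shape)


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8, 3, 3)) * 5 + 2  # deliberately badly scaled raw weights
w_hat = scaled_weight_standardize(w, gain=np.ones(16))
flat = w_hat.reshape(16, -1)
print(flat.mean(axis=1).max(), flat.std(axis=1).min())
```

Because the standardization is applied to the weights rather than the activations, it adds no dependence between samples in a batch. That is the property that lets NF models drop batch normalization.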

## Intended uses & limitations

You can use the raw model to classify images into the 1,000 ImageNet classes, or you can replace its classification head
to fine-tune it on a downstream task (e.g. another classification task with different labels, image segmentation, or
object detection).

### How to use

You can load this model with the usual factory method in `timm`:
```python
from PIL import Image
import timm
import torch
from timm.data.transforms_factory import transforms_imagenet_eval

# Load the pretrained weights and switch to inference mode (disables dropout and stochastic depth)
model = timm.create_model("eca_nfnet_l0", pretrained=True)
model.eval()

# Build the eval transform from the model's pretrained config, preferring the larger test resolution
config = model.default_cfg
img_size = config["test_input_size"][-1] if "test_input_size" in config else config["input_size"][-1]
transform = transforms_imagenet_eval(
    img_size=img_size,
    interpolation=config["interpolation"],
    mean=config["mean"],
    std=config["std"],
    crop_pct=config["crop_pct"],
)

path_to_an_image = "path/to/image.jpg"  # placeholder: point this at your own image file
img = Image.open(path_to_an_image).convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add a batch dimension (batch size = 1)

with torch.no_grad():
    output = model(input_tensor)
probs = output.squeeze(0).softmax(dim=0)  # probabilities over the 1,000 ImageNet classes
```

### Limitations and bias

The training images in the dataset are usually photos that clearly depict one of the 1,000 labels. The model will
probably not generalize well to drawings or to images containing multiple objects with different labels.
The training images in the dataset come mostly from the US (45.4%) and Great Britain (7.6%). As such, the model, and
models created by fine-tuning it, will likely work better on images of scenes from these countries (see
[this paper](https://arxiv.org/abs/1906.02659) for examples).
More generally, [recent research](https://arxiv.org/abs/2010.15052) has shown that even models trained in an
unsupervised fashion on ImageNet (i.e. without using the labels) pick up racial and gender biases represented in
the training images.

## Training data

This model was pretrained on [ImageNet](http://www.image-net.org/), specifically the 1,000-class ImageNet-1k (ILSVRC-2012)
subset, which contains roughly 1.28 million hand-annotated training images.

## Training procedure

For stability, it is highly recommended to train all NFNet variants with gradient clipping enabled. This model was trained with an Adaptive Gradient Clipping (AGC) factor of 0.015, as described in [the paper](https://arxiv.org/abs/2102.06171). As in the paper, a cosine learning-rate decay was used with SGD with Nesterov momentum. Moderate to heavy augmentation ([RandAugment](https://arxiv.org/abs/1909.13719)) and regularization (dropout, stochastic depth) are recommended for training.
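
The two training ingredients named above can be sketched in NumPy: unit-wise AGC as described in the NFNet paper (with the paper's ε = 1e-3), and a per-step cosine learning-rate decay. Several details are assumptions: the first axis is treated as the "unit" axis, 1-D parameters count as a single unit, the schedule decays to zero without warmup, and the function names are illustrative. This is not the code used to train this model.

```python
import math
import numpy as np


def unitwise_norm(x: np.ndarray) -> np.ndarray:
    """Frobenius norm per output unit (axis 0), kept broadcastable against x."""
    if x.ndim <= 1:
        return np.sqrt(np.sum(x ** 2, keepdims=True))  # vectors (e.g. biases): one unit (assumption)
    axes = tuple(range(1, x.ndim))
    return np.sqrt((x ** 2).sum(axis=axes, keepdims=True))


def adaptive_clip_grad(w, g, clip_factor=0.015, eps=1e-3):
    """Rescale each unit's gradient so that ||g_i|| / max(||w_i||, eps) <= clip_factor."""
    max_norm = np.maximum(unitwise_norm(w), eps) * clip_factor
    g_norm = unitwise_norm(g)
    clipped = g * (max_norm / np.maximum(g_norm, 1e-6))
    return np.where(g_norm > max_norm, clipped, g)      # only touch units over the threshold


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps (no warmup)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 3))
g_clipped = adaptive_clip_grad(w, g)
ratio = unitwise_norm(g_clipped) / unitwise_norm(w)
print(ratio.ravel())  # each unit's gradient-to-weight ratio is now at most ~0.015
print([round(cosine_lr(s, 100, 0.1), 4) for s in (0, 50, 100)])
```

Unlike a global norm clip, AGC scales the threshold to each unit's own weight norm. A fixed factor such as 0.015 therefore behaves consistently across layers of very different sizes.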

### Preprocessing

For evaluation, images are resized using bicubic interpolation to 288x288 (the `test_input_size` in `config.json`; the default `input_size` is 224x224) and normalized with the usual ImageNet mean and standard deviation.
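
The arithmetic behind this step can be sketched using the `crop_pct`, `mean`, and `std` values from this repository's `config.json`. The resize rule is an assumption about how timm's eval transform sizes images: resize the shorter side to `floor(img_size / crop_pct)`, then center-crop to `img_size`. The bicubic resize itself is left to an image library and is not shown.

```python
import math
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])  # "mean" in config.json
IMAGENET_STD = np.array([0.229, 0.224, 0.225])   # "std" in config.json


def eval_resize_size(img_size: int, crop_pct: float) -> int:
    """Shorter-side resize target before a center crop to img_size (assumed rule)."""
    return math.floor(img_size / crop_pct)


def normalize(rgb_uint8: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 RGB image to a normalized (3, H, W) float array."""
    x = rgb_uint8.astype(np.float64) / 255.0     # scale pixel values to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD       # per-channel standardization
    return x.transpose(2, 0, 1)                  # HWC -> CHW


print(eval_resize_size(288, 1.0), eval_resize_size(224, 0.875))
img = np.full((288, 288, 3), 128, dtype=np.uint8)
print(normalize(img)[:, 0, 0])
```

With `crop_pct` set to 1.0, the resize target equals the crop size. The whole (resized) image is therefore kept, rather than a central region as with the common 0.875 setting.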

## Evaluation results

This model achieves a top-1 accuracy of 82.6% and a top-5 accuracy of 96.5% on the ImageNet validation set.
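
For reference, top-1 and top-5 accuracy can be computed from a batch of logits as below. This is a generic NumPy sketch on made-up toy values, not the evaluation script behind the numbers above.

```python
import numpy as np


def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-logits, axis=1)[:, :k]   # indices of the k largest logits per row
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy batch: 3 samples, 6 classes (the real model has 1,000)
logits = np.array([
    [0.1, 2.0, 0.3, 0.0, 0.5, 0.2],   # argmax = 1
    [1.5, 0.2, 0.1, 1.4, 0.0, 0.3],   # argmax = 0, class 3 is second
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],   # argmax = 5, class 0 is last
])
labels = np.array([1, 3, 0])
print(topk_accuracy(logits, labels, 1), topk_accuracy(logits, labels, 5))
```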

### BibTeX entry and citation info

NFNet model architecture:
```bibtex
@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}
```

L0 model variant & pretraining:
```bibtex
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
```
config.json
ADDED
@@ -0,0 +1,32 @@
{
  "num_classes": 1000,
  "input_size": [
    3,
    224,
    224
  ],
  "pool_size": [
    7,
    7
  ],
  "crop_pct": 1.0,
  "interpolation": "bicubic",
  "mean": [
    0.485,
    0.456,
    0.406
  ],
  "std": [
    0.229,
    0.224,
    0.225
  ],
  "first_conv": "stem.conv1",
  "classifier": "head.fc",
  "test_input_size": [
    3,
    288,
    288
  ],
  "architecture": "eca_nfnet_l0"
}
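
Two fields in this config are related: `pool_size` is the spatial size of the final feature map at the default `input_size`. The sketch below reads an excerpt of the config with the standard library and derives the implied overall stride, plus the feature-map size at `test_input_size`. Treating the ratio 224 / 7 as a uniform network stride is an assumption, not something the config states.

```python
import json

# Excerpt of config.json (only the fields used below)
CONFIG = """{"input_size": [3, 224, 224], "pool_size": [7, 7],
             "test_input_size": [3, 288, 288], "crop_pct": 1.0}"""

cfg = json.loads(CONFIG)
# Overall downsampling implied by the config (assumed uniform): 224 / 7
stride = cfg["input_size"][-1] // cfg["pool_size"][-1]
# Preferred eval resolution: test_input_size if present, else input_size
eval_size = cfg.get("test_input_size", cfg["input_size"])[-1]
eval_pool = eval_size // stride              # final feature-map side length at eval size
print(stride, eval_size, eval_pool)
```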
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:57290e127ff515f3cfafce1f69af2aa10cb7ecc8db7eafa0ecac3a906c198072
size 96611441
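
The weights file is stored as a Git LFS pointer: its three lines give the spec version, the SHA-256 object ID, and the size in bytes. Below is a minimal standard-library sketch for checking a downloaded copy of the weights against such a pointer. The local file path in the commented-out call is a hypothetical placeholder.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against an LFS pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256" or os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:57290e127ff515f3cfafce1f69af2aa10cb7ecc8db7eafa0ecac3a906c198072
size 96611441
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])
# matches_pointer("pytorch_model.bin", pointer)  # hypothetical local path to the downloaded weights
```

The cheap size comparison runs first, so an obviously truncated download is rejected before the roughly 97 MB file is hashed.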