Update README.md
README.md CHANGED
@@ -24,9 +24,9 @@ A series of CLIP ConvNeXt-XXLarge (a custom `timm` ConvNeXt size) models trained
 
 | Model | Dataset | Resolution | AugReg | Top-1 ImageNet Zero-Shot (%) |
 | ----- | ------- | ---------- | ------------ | --------- |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg) | LAION-2B | 256x256 | RRC (0.33, 1.0), RE (0.35), SD (0.1) | 79.1 |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg-rewind](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind) | LAION-2B | 256x256 | RRC (0.3, 1.0), RE (0.4), SD (0.1) | 79.3 |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg-soup](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup) | LAION-2B | 256x256 | N/A | 79.4 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg) | LAION-2B | 256x256 | RRC (0.33, 1.0), RE (0.35), SD (0.1) | 79.1 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg-rewind](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind) | LAION-2B | 256x256 | RRC (0.3, 1.0), RE (0.4), SD (0.1) | 79.3 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg-soup](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup) | LAION-2B | 256x256 | N/A | 79.4 |
 RRC = Random Resize Crop (crop pcts), RE = Random Erasing (prob), SD = Stochastic Depth (prob) -- image tower only
 
 The core training run was performed in pieces over a period of ~2 months. The global batch size for the core run was 81920. The last ~10% of training was re-done at a 95744 global batch size w/ higher LR and aug than the original finish. The two were averaged together in a 'soup'. See more details in [Training Details](#training-details).
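The 'soup' row above is a weight average of the two preceding checkpoints, in the spirit of the Model soups paper cited at the end of this card. The sketch below is only a conceptual illustration of such two-checkpoint averaging, assuming PyTorch checkpoints that store their weights under a `state_dict` key; the file names are placeholders, not the released artifact names, and the actual souping recipe may differ in its details.

```
# Conceptual two-checkpoint 'soup': element-wise average of matching weights.
# Placeholder file names; assumes each checkpoint stores its weights under 'state_dict'.
import torch

ckpt_a = torch.load('xxlarge_augreg_final.pt', map_location='cpu')['state_dict']
ckpt_b = torch.load('xxlarge_augreg_rewind_final.pt', map_location='cpu')['state_dict']

soup = {}
for name, w_a in ckpt_a.items():
    w_b = ckpt_b[name]
    # Average floating-point tensors; carry over non-float buffers (e.g. integer counters) unchanged.
    soup[name] = (w_a + w_b) / 2 if torch.is_floating_point(w_a) else w_a.clone()

torch.save({'state_dict': soup}, 'xxlarge_augreg_soup.pt')
```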
@@ -111,9 +111,9 @@ Many difficulties w/ both model numerical stability and cluster stability and pe
 |233 - 249 |Booster |1024 |256 |A100 40GB | 80 |51k | 50 |amp + bf16|0.98 |
 |250 - 256 |Stability |1024 |128 |A100 40GB | 80 |27-31k | 26-30 |amp + bf16|0.98 |
 
-JUWELS Booster has 4x A100 GPU per node w/ 4x HDR-200 IB adapters per node (200Gbit/sec per GPU). Stability setup used was 8x A100 GPU per node w/ 400Gbit/sec EFA
+JUWELS Booster has 4x A100 GPUs per node w/ 4x HDR-200 IB adapters per node (200Gbit/sec per GPU). The Stability setup used 8x A100 GPUs per node w/ 400Gbit/sec EFA networking per node (50Gbit/sec per GPU). Significant variation in training efficiency (throughput per GPU) was observed across the various configurations. The 1024-GPU configurations on both clusters were particularly prone to crashing (or very difficult to get running w/ a 'good' set of GPUs).
 
-
+The slurm srun command line below is for the 128-node (8x 40GB A100 per node) configuration:
 
 ```
 srun --cpu_bind=v --accel-bind=gn python -m training.main \
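As a quick sanity check on the per-GPU interconnect figures quoted in the new sentence above (the numbers are taken straight from that sentence; the snippet only redoes the division):

```
# Per-GPU network bandwidth implied by the node configurations described above.
booster_gbit_per_gpu = (4 * 200) / 4  # 4x HDR-200 adapters shared by 4 GPUs per node -> 200 Gbit/s per GPU
stability_gbit_per_gpu = 400 / 8      # 400 Gbit/s EFA shared by 8 GPUs per node -> 50 Gbit/s per GPU
print(booster_gbit_per_gpu, stability_gbit_per_gpu)
```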
@@ -144,12 +144,13 @@ srun --cpu_bind=v --accel-bind=gn python -m training.main \
     --report-to "tensorboard"
 ```
 
-For the rewind of last 10%, a higher global batch size of 95744 was used w/ a higher LR and slightly increased augmentation strength.
+For the rewind of the last 10%, a higher global batch size of 95744 was used w/ a higher LR and slightly increased augmentation strength.
 
 |Checkpoint Interval |Cluster |# GPUs|# Nodes|GPU |local BS|sample/s|sample/s/gpu|precision |adam beta2 |
 |--------------------|---------|------|-------|----------|--------|--------|------------|----------|-----------|
 |231 - 256 |stability|1088 |136 |A100 40GB | 88 |32-35k | 29-32 |amp + bf16|0.98 |
 
+The slurm srun command line for the 136-node (8x 40GB A100 per node) configuration:
 ```
 srun --cpu_bind=v --accel-bind=gn python -m training.main \
     --save-frequency 1 \
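The global batch sizes quoted earlier in the card follow directly from the table rows (per-GPU batch size times number of GPUs); a quick check of both the core run and the rewind:

```
# Global batch size = local (per-GPU) batch size x number of GPUs.
core_global = 80 * 1024     # core run rows: local BS 80 on 1024 GPUs
rewind_global = 88 * 1088   # rewind row: local BS 88 on 1088 GPUs
assert core_global == 81920
assert rewind_global == 95744
# Per-GPU throughput is likewise samples/s divided by GPU count, e.g. 32000 / 1088 is roughly 29 samples/s/gpu.
```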
@@ -195,7 +196,7 @@ These models achieve between 79.1 and 79.4 top-1 zero-shot accuracy on ImageNet-
 
 ![](convnext_xxlarge_zero_shot.png)
 
-
+A zoom-in on the final 10% w/ rewind:
 
 ![](convnext_xxlarge_zero_shot_zoom.png)
 
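For readers who want to reproduce a single zero-shot prediction with the released soup checkpoint, a minimal sketch using the `open_clip_torch` package follows. The `hf-hub:` model id mirrors the soup link in the table above; the image path and prompts are placeholders, and exact API details can vary across open_clip versions.

```
# Minimal zero-shot classification sketch with open_clip (pip install open_clip_torch).
import torch
import open_clip
from PIL import Image

MODEL_ID = 'hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup'
model, _, preprocess = open_clip.create_model_and_transforms(MODEL_ID)
tokenizer = open_clip.get_tokenizer(MODEL_ID)
model.eval()

image = preprocess(Image.open('example.jpg')).unsqueeze(0)   # placeholder image path
text = tokenizer(['a photo of a cat', 'a photo of a dog'])   # placeholder prompts

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probabilities over the text prompts for the image
```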
@@ -292,3 +293,20 @@ OpenAI CLIP paper
   howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
 }
 ```
+
+```
+@InProceedings{pmlr-v162-wortsman22a,
+  title = {Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time},
+  author = {Wortsman, Mitchell and Ilharco, Gabriel and Gadre, Samir Ya and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Morcos, Ari S and Namkoong, Hongseok and Farhadi, Ali and Carmon, Yair and Kornblith, Simon and Schmidt, Ludwig},
+  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
+  pages = {23965--23998},
+  year = {2022},
+  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
+  volume = {162},
+  series = {Proceedings of Machine Learning Research},
+  month = {17--23 Jul},
+  publisher = {PMLR},
+  pdf = {https://proceedings.mlr.press/v162/wortsman22a/wortsman22a.pdf},
+  url = {https://proceedings.mlr.press/v162/wortsman22a.html}
+}
+```