rwightman (HF staff) committed
Commit 0b91920
Parent: bd0c5a9

Update README.md

Files changed (1)
  1. README.md +25 -7
README.md CHANGED
@@ -24,9 +24,9 @@ A series of CLIP ConvNeXt-XXLarge (a custom `timm` ConvNeXt size) models trained
 
 | Model | Dataset | Resolution | AugReg | Top-1 ImageNet Zero-Shot (%) |
 | ----- | ------- | ---------- | ------------ | --------- |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg) | LAION-2B | 256x256 | RRC (0.33, 1.0), RE (0.35), SD (0.1) | 79.1 |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg-rewind](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind) | LAION-2B | 256x256 | RRC (0.3, 1.0), RE (0.4), SD (0.1) | 79.3 |
-| [convnext_xxlarge.laion2b_s34b_b82k-augreg-soup](CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup) | LAION-2B | 256x256 | N/A | 79.4 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg) | LAION-2B | 256x256 | RRC (0.33, 1.0), RE (0.35), SD (0.1) | 79.1 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg-rewind](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind) | LAION-2B | 256x256 | RRC (0.3, 1.0), RE (0.4), SD (0.1) | 79.3 |
+| [convnext_xxlarge.laion2b_s34b_b82k-augreg-soup](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup) | LAION-2B | 256x256 | N/A | 79.4 |
 RRC = Random Resize Crop (crop pcts), RE = Random Erasing (prob), SD = Stochastic Depth (prob) -- image tower only
 
 The core training run was performed in pieces over a period of ~ 2 months. The global batch size for the core run was 81920. The last ~10% of training was re-done at a 95744 global batch size w/ higher LR and aug than original finish. The two were averaged together in a 'soup'. See more details in [Training Details](#training-details).
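
The 'soup' mentioned in the paragraph above is a weight average of the original-finish and rewind checkpoints (see the model-soups citation added at the end of this diff). Below is a minimal sketch of that kind of average; the file names and checkpoint layout are illustrative assumptions, not the exact procedure used for these releases.

```python
# Hedged sketch (not the actual release script): average two training
# checkpoints into a 'soup'. File names and the 'state_dict' key are assumed.
import torch

sd_a = torch.load("convnext_xxlarge_original_finish.pt", map_location="cpu")["state_dict"]
sd_b = torch.load("convnext_xxlarge_rewind_finish.pt", map_location="cpu")["state_dict"]

# Equal-weight average of every tensor, preserving each tensor's original dtype.
soup = {
    k: ((sd_a[k].float() + sd_b[k].float()) / 2).to(sd_a[k].dtype)
    for k in sd_a
}

torch.save({"state_dict": soup}, "convnext_xxlarge_soup.pt")
```
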
@@ -111,9 +111,9 @@ Many difficulties w/ both model numerical stability and cluster stability and pe
 |233 - 249 |Booster |1024 |256 |A100 40GB | 80 |51k | 50 |amp + bf16|0.98 |
 |250 - 256 |Stability |1024 |128 |A100 40GB | 80 |27-31k | 26-30 |amp + bf16|0.98 |
 
-JUWELS Booster has 4x A100 GPU per node w/ 4x HDR-200 IB adapters per node (200Gbit/sec per GPU). Stability setup used was 8x A100 GPU per node w/ 400Gbit/sec EFA connectivity per node (~50 GBit/sec per GPU). Significant variation in training efficiency (throughput per GPU) was observed across the various configurations. The 1024 GPU configurations across both clusters were particularly prone to crashing (or very difficult to get running w/ a 'good' set of GPUs).
+JUWELS Booster has 4x A100 GPU per node w/ 4x HDR-200 IB adapters per node (200Gbit/sec per GPU). Stability setup used was 8x A100 GPU per node w/ 400Gbit/sec EFA networking per node (50 GBit/sec per GPU). Significant variation in training efficiency (throughput per GPU) was observed across the various configurations. The 1024 GPU configurations across both clusters were particularly prone to crashing (or very difficult to get running w/ a 'good' set of GPUs).
 
-For 256x256 models, a slurm script w/ srun below for a 128 8-GPU (40GB A100) configuration:
+A slurm srun command line below for a 128 8-GPU (40GB A100) configuration:
 
 ```
 srun --cpu_bind=v --accel-bind=gn python -m training.main \
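
As a quick sanity check (purely illustrative arithmetic), the per-GPU bandwidth figures in the changed paragraph above follow from the node configurations it describes:

```python
# Per-GPU interconnect bandwidth implied by the node configurations above.
booster_per_gpu = (4 * 200) / 4   # 4x HDR-200 adapters shared across 4 GPUs -> Gbit/sec per GPU
stability_per_gpu = 400 / 8       # 400 Gbit/sec EFA shared across 8 GPUs -> Gbit/sec per GPU
print(booster_per_gpu, stability_per_gpu)  # 200.0 50.0
```
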
@@ -144,12 +144,13 @@ srun --cpu_bind=v --accel-bind=gn python -m training.main \
   --report-to "tensorboard"
 ```
 
-For the rewind of last 10%, a higher global batch size of 95744 was used w/ a higher LR and slightly increased augmentation strength. The slurm srun cmd for 136 8-GPU (40GB A100) nodes:
+For the rewind of last 10%, a higher global batch size of 95744 was used w/ a higher LR and slightly increased augmentation strength.
 
 |Checkpoint Interval |Cluster |# GPUs|# Nodes|GPU |local BS|sample/s|sample/s/gpu|precision |adam beta2 |
 |--------------------|---------|------|-------|----------|--------|--------|------------|----------|-----------|
 |231 - 256 |stability|1088 |136 |A100 40GB | 88 |32-35k | 29-32 |amp + bf16|0.98 |
 
+The slurm srun command line for 136 8-GPU (40GB A100) nodes:
 ```
 srun --cpu_bind=v --accel-bind=gn python -m training.main \
   --save-frequency 1 \
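
The global batch sizes quoted in this README (81920 for the core run, 95744 for the rewind) are consistent with the GPU counts and local batch sizes in the two throughput tables; a quick check:

```python
# Global batch size = number of GPUs x local (per-GPU) batch size,
# using the figures from the two throughput tables above.
core_global = 1024 * 80    # core run: 1024 GPUs, local BS 80
rewind_global = 1088 * 88  # rewind: 136 nodes x 8 GPUs = 1088 GPUs, local BS 88
print(core_global, rewind_global)  # 81920 95744
```
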
@@ -195,7 +196,7 @@ These models achieve between 79.1 and 79.4 top-1 zero-shot accuracy on ImageNet-
 
 ![](convnext_xxlarge_zero_shot.png)
 
-Zoom:
+A zoom-in on final 10% w/ rewind:
 
 ![](convnext_xxlarge_zero_shot_zoom.png)
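
For context on how such zero-shot numbers are produced, the released checkpoints can be loaded through `open_clip`. The sketch below is a generic usage example rather than part of this commit; it assumes a recent `open_clip_torch` release with Hugging Face Hub loading, and the image path is a placeholder.

```python
# Hedged usage sketch: load the soup checkpoint from the HF Hub via open_clip
# and score one image against a few text prompts (zero-shot classification).
import torch
from PIL import Image
import open_clip

model_id = "hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup"
model, _, preprocess = open_clip.create_model_and_transforms(model_id)
tokenizer = open_clip.get_tokenizer(model_id)
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # per-prompt probabilities for the image
```
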
 
@@ -292,3 +293,20 @@ OpenAI CLIP paper
   howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
 }
 ```
+
+```
+@InProceedings{pmlr-v162-wortsman22a,
+  title = {Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time},
+  author = {Wortsman, Mitchell and Ilharco, Gabriel and Gadre, Samir Ya and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Morcos, Ari S and Namkoong, Hongseok and Farhadi, Ali and Carmon, Yair and Kornblith, Simon and Schmidt, Ludwig},
+  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
+  pages = {23965--23998},
+  year = {2022},
+  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
+  volume = {162},
+  series = {Proceedings of Machine Learning Research},
+  month = {17--23 Jul},
+  publisher = {PMLR},
+  pdf = {https://proceedings.mlr.press/v162/wortsman22a/wortsman22a.pdf},
+  url = {https://proceedings.mlr.press/v162/wortsman22a.html}
+}
+```