minor change
description1.md CHANGED (+5 -5)
@@ -7,10 +7,10 @@ From our findings, we need approximately 1/3 memory under ideal conditions (F, B
 Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).


-|
-
-| Bubble Rate
-| Activation Memory <br> (
+| Method | 1F1B | V-Min | V-Half | V-ZB |
+|------------------------------------------|-------|----------|----------| ---- |
+| Bubble Rate <br> (assuming T_F=T_B=T_W) | ~ p/m | ~ 2p/3m | ~ p/ 2m | 0 |
+| Activation Memory <br> (by #micro-batch) | p | (p+4)//3 | (p+2)//2 | p |


-Bubble Rate here is calculated as `1 - (F+B+W)*m / longest_stage_time`.
+Bubble Rate here is calculated as `1 - (F+B+W)*m / longest_stage_time`.
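The table and the bubble-rate formula in the added lines can be sketched in Python as follows. This is an illustrative sketch, not code from the repository: the function names are hypothetical, `p` is assumed to be the number of pipeline stages and `m` the number of micro-batches, `F`, `B`, `W` are assumed to be the per-micro-batch times of the three passes, and `2p/3m` and `p/ 2m` are read as `2p/(3m)` and `p/(2m)`.

```python
# Hypothetical helpers that mirror the table in description1.md.
# Assumptions: p = number of pipeline stages, m = number of micro-batches,
# and "2p/3m" / "p/ 2m" in the table mean 2p/(3m) and p/(2m).

def bubble_rate(f, b, w, m, longest_stage_time):
    """Bubble rate as stated in the text: 1 - (F+B+W)*m / longest_stage_time."""
    return 1 - (f + b + w) * m / longest_stage_time


def approx_bubble_rate(method, p, m):
    """Approximate bubble rate from the table, valid under T_F = T_B = T_W."""
    rates = {
        "1F1B": p / m,
        "V-Min": 2 * p / (3 * m),
        "V-Half": p / (2 * m),
        "V-ZB": 0.0,
    }
    return rates[method]


def activation_memory(method, p):
    """Activation memory from the table, counted in micro-batches."""
    memory = {
        "1F1B": p,
        "V-Min": (p + 4) // 3,
        "V-Half": (p + 2) // 2,
        "V-ZB": p,
    }
    return memory[method]
```

For example, with `p = 8` the table gives an activation memory of `activation_memory("V-Min", 8) == 4` and `activation_memory("V-Half", 8) == 5` micro-batches, compared with 8 for 1F1B and V-ZB.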