I can't make it work in Google Colab

#1
by QES - opened
Files changed (1)
  1. README.md +37 -15
README.md CHANGED
@@ -17,8 +17,6 @@ Our generative model has `Next-DiT` as the backbone, the text encoder is the `Ge
17
 
18
  [paper](https://arxiv.org/abs/2405.05945)
19
 
20
- ![hero](https://github.com/Alpha-VLLM/Lumina-T2X/assets/54879512/9f52eabb-07dc-4881-8257-6d8a5f2a0a5a)
21
-
22
  ## 📰 News
23
 
24
  - [2024-06-08] 🎉🎉🎉 We have released the `Lumina-Next-SFT` model.
@@ -134,7 +132,7 @@ pip install -e .
134
  ⭐⭐ (Recommended) you can use `huggingface-cli` to download our model:
135
 
136
  ```bash
137
- huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT --local-dir /path/to/ckpt
138
  ```
139
 
140
  or using git for cloning the model you want to use:
@@ -153,9 +151,9 @@ Update your own personal inference settings to generate different styles of imag
153
  - settings:
154
 
155
  model:
156
- ckpt: ""
157
- ckpt_lm: ""
158
- token: ""
159
 
160
  transport:
161
  path_type: "Linear" # option: ["Linear", "GVP", "VP"]
@@ -171,17 +169,41 @@ Update your own personal inference settings to generate different styles of imag
171
  likelihood: false # option: true or false
172
 
173
  infer:
174
- resolution: "1024x1024" # option: ["1024x1024", "512x2048", "2048x512", "(Extrapolation) 1664x1664", "(Extrapolation) 1024x2048", "(Extrapolation) 2048x1024"]
175
- num_sampling_steps: 60 # range: 1-1000
176
- cfg_scale: 4. # range: 1-20
177
- solver: "euler" # option: ["euler", "dopri5", "dopri8"]
178
- t_shift: 4 # range: 1-20 (int only)
179
- scaling_method: "Time-aware" # option: ["Time-aware", "None"]
180
- scale_watershed: 0.3 # range: 0.0-1.0
181
- proportional_attn: true # option: true or false
182
- seed: 0 # range: any number
183
  ```
184
 
 
185
  1. Run with CLI
186
 
187
  inference command:
 
17
 
18
  [paper](https://arxiv.org/abs/2405.05945)
19
 
 
 
20
  ## 📰 News
21
 
22
  - [2024-06-08] 🎉🎉🎉 We have released the `Lumina-Next-SFT` model.
 
132
  ⭐⭐ (Recommended) you can use `huggingface-cli` to download our model:
133
 
134
  ```bash
135
+ huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-T2I --local-dir /path/to/ckpt
136
  ```
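
For a notebook environment such as Google Colab, the same download can be sketched in Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; `download_checkpoint` is a hypothetical helper name, and `/path/to/ckpt` is the same placeholder used in the command above.

```python
def download_checkpoint(repo_id="Alpha-VLLM/Lumina-Next-T2I",
                        local_dir="/path/to/ckpt",
                        token=None):
    """Download a model repo from the Hugging Face Hub into local_dir.

    Hypothetical helper, not part of the Lumina code base. `token` is only
    needed for gated or private repos.
    """
    # Imported lazily so the function can be defined even if the package
    # is not installed yet (pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local folder containing the files.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)
```

Calling `download_checkpoint()` and passing the returned path as `ckpt` should then match the CLI route, provided the download completes.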
137
 
138
  or using git for cloning the model you want to use:
 
151
  - settings:
152
 
153
  model:
154
+ ckpt: "/path/to/ckpt" # if ckpt is "", pass the model path with `--ckpt` when using the `lumina` CLI.
155
+ ckpt_lm: "" # if ckpt_lm is "", pass the LLM path with `--ckpt_lm` when using the `lumina` CLI.
156
+ token: "" # if the LLM is a gated Hugging Face repo, enter your Hugging Face access token; if token is "", pass it with `--token`.
157
 
158
  transport:
159
  path_type: "Linear" # option: ["Linear", "GVP", "VP"]
 
169
  likelihood: false # option: true or false
170
 
171
  infer:
172
+ resolution: "1024x1024" # option: ["1024x1024", "512x2048", "2048x512", "(Extrapolation) 1664x1664", "(Extrapolation) 1024x2048", "(Extrapolation) 2048x1024"]
173
+ num_sampling_steps: 60 # range: 1-1000
174
+ cfg_scale: 4. # range: 1-20
175
+ solver: "euler" # option: ["euler", "dopri5", "dopri8"]
176
+ t_shift: 4 # range: 1-20 (int only)
177
+ ntk_scaling: true # option: true or false
178
+ proportional_attn: true # option: true or false
179
+ seed: 0 # range: any number
 
180
  ```
181
 
182
+ - model:
183
+ - `ckpt`: path to the Lumina-Next-T2I checkpoint downloaded from the [huggingface repo](https://huggingface.co/Alpha-VLLM/Lumina-Next-T2I); the directory contains `consolidated*.pth` and `model_args.pth`.
184
+ - `ckpt_lm`: path to the LLM checkpoint.
185
+ - `token`: Hugging Face access token for accessing a gated repo.
186
+ - transport:
187
+ - `path_type`: the type of path for transport: 'Linear', 'GVP' (Generalized Variance Preserving), or 'VP' (Variance Preserving).
188
+ - `prediction`: the quantity the model predicts for the transport dynamics. (option: ["velocity", "score", "noise"])
189
+ - `loss_weight`: the weighting of different components in the loss function; can be 'velocity' for dynamic modeling, 'likelihood' for statistical consistency, or None for no weighting.
190
+ - `sample_eps`: epsilon used during sampling in the transport model (a small offset that keeps time away from the interval endpoints).
191
+ - `train_eps`: epsilon used during training to stabilize the learning process.
192
+ - ode:
193
+ - `atol`: absolute tolerance for the ODE solver.
194
+ - `rtol`: relative tolerance for the ODE solver.
195
+ - `reverse`: whether to run the ODE solver in reverse.
196
+ - `likelihood`: whether to compute the likelihood during the ODE solving process. (option: true or false)
197
+ - infer:
198
+ - `resolution`: resolution of the generated image.
199
+ - `num_sampling_steps`: number of sampling steps used to generate an image.
200
+ - `cfg_scale`: classifier-free guidance scale.
201
+ - `solver`: ODE solver used for image generation.
202
+ - `t_shift`: time shift factor.
203
+ - `ntk_scaling`: whether to use NTK-aware RoPE scaling.
204
+ - `proportional_attn`: whether to use proportional attention.
205
+ - `seed`: random seed for initialization.
206
+
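
The option lists and ranges given in the settings comments can be checked before launching a run. The following is a minimal sketch under stated assumptions: `validate_settings` is a hypothetical helper that is not part of the Lumina code base, the settings are assumed to be already loaded into a plain dict, and the limits are copied from the comments in the settings block rather than from the implementation.

```python
# Option lists and ranges copied from the comments in the settings block.
ALLOWED_PATH_TYPES = {"Linear", "GVP", "VP"}
ALLOWED_SOLVERS = {"euler", "dopri5", "dopri8"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    transport = settings.get("transport", {})
    infer = settings.get("infer", {})

    if transport.get("path_type", "Linear") not in ALLOWED_PATH_TYPES:
        errors.append("transport.path_type must be one of Linear, GVP, VP")

    steps = infer.get("num_sampling_steps", 60)
    if not (_is_int(steps) and 1 <= steps <= 1000):
        errors.append("infer.num_sampling_steps must be an int in 1-1000")

    cfg = infer.get("cfg_scale", 4.0)
    if not (_is_int(cfg) or isinstance(cfg, float)) or not 1 <= cfg <= 20:
        errors.append("infer.cfg_scale must be a number in 1-20")

    if infer.get("solver", "euler") not in ALLOWED_SOLVERS:
        errors.append("infer.solver must be one of euler, dopri5, dopri8")

    t_shift = infer.get("t_shift", 4)
    if not (_is_int(t_shift) and 1 <= t_shift <= 20):
        errors.append("infer.t_shift must be an int in 1-20")

    for key in ("ntk_scaling", "proportional_attn"):
        if not isinstance(infer.get(key, True), bool):
            errors.append("infer.%s must be true or false" % key)

    return errors
```

For example, `validate_settings({"infer": {"solver": "rk4", "t_shift": 4.5}})` reports two problems, one for the unknown solver and one for the non-integer `t_shift`.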
207
  1. Run with CLI
208
 
209
  inference command: