I can't make it work in Google Colab
#1, opened by QES

README.md CHANGED
```diff
@@ -17,8 +17,6 @@ Our generative model has `Next-DiT` as the backbone, the text encoder is the `Ge
 
 [paper](https://arxiv.org/abs/2405.05945)
 
-![hero](https://github.com/Alpha-VLLM/Lumina-T2X/assets/54879512/9f52eabb-07dc-4881-8257-6d8a5f2a0a5a)
-
 ## 📰 News
 
 - [2024-06-08] 🎉🎉🎉 We have released the `Lumina-Next-SFT` model.
```
````diff
@@ -134,7 +132,7 @@ pip install -e .
 
 ⭐⭐ (Recommended) you can use huggingface_cli to download our model:
 
 ```bash
-huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-T2I
 ```
 
 or using git for cloning the model you want to use:
````
```diff
@@ -153,9 +151,9 @@ Update your own personal inference settings to generate different styles of imag
 - settings:
 
   model:
-    ckpt: ""
-    ckpt_lm: ""
-    token: ""
 
   transport:
     path_type: "Linear" # option: ["Linear", "GVP", "VP"]
```
````diff
@@ -171,17 +169,41 @@ Update your own personal inference settings to generate different styles of imag
     likelihood: false # option: true or false
 
   infer:
-    resolution: "1024x1024"
-    num_sampling_steps: 60
-    cfg_scale: 4.
-    solver: "euler"
-    t_shift: 4
-
-
-
-    seed: 0 # range: any number
 ```
 
 1. Run with CLI
 
 inference command:
````
````diff
 ⭐⭐ (Recommended) you can use huggingface_cli to download our model:
 
 ```bash
+huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-T2I --local-dir /path/to/ckpt
 ```
 
 or using git for cloning the model you want to use:
````
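The same download can also be scripted from Python; a minimal sketch, assuming the `huggingface_hub` package is installed — the target directory is an example, not a repo default:

```python
# Sketch: download the checkpoint from Python instead of the CLI.
# Assumes the huggingface_hub package; local_dir is an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Alpha-VLLM/Lumina-Next-T2I",
    local_dir="/path/to/ckpt",
    resume_download=True,  # mirrors the CLI's --resume-download flag
)
```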
```diff
 - settings:
 
   model:
+    ckpt: "/path/to/ckpt" # if ckpt is "", you should use `--ckpt` for passing the model path when using the `lumina` CLI.
+    ckpt_lm: "" # if ckpt_lm is "", you should use `--ckpt_lm` for passing the model path when using the `lumina` CLI.
+    token: "" # if the LLM is a huggingface gated repo, you should input your huggingface access token here; when token is "", you should use `--token` for accessing the model.
 
   transport:
     path_type: "Linear" # option: ["Linear", "GVP", "VP"]
```
````diff
     likelihood: false # option: true or false
 
   infer:
+    resolution: "1024x1024" # option: ["1024x1024", "512x2048", "2048x512", "(Extrapolation) 1664x1664", "(Extrapolation) 1024x2048", "(Extrapolation) 2048x1024"]
+    num_sampling_steps: 60 # range: 1-1000
+    cfg_scale: 4. # range: 1-20
+    solver: "euler" # option: ["euler", "dopri5", "dopri8"]
+    t_shift: 4 # range: 1-20 (int only)
+    ntk_scaling: true # option: true or false
+    proportional_attn: true # option: true or false
+    seed: 0 # range: any number
 ```
````
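The commented ranges above can be sanity-checked before launching a run; a minimal sketch in plain Python — the `check_infer` helper and the dict literal are our own illustration, not repo code:

```python
# Sketch: validate the `infer` block against the ranges given in the
# README comments. Helper and sample dict are illustrative only.

SOLVERS = {"euler", "dopri5", "dopri8"}

def check_infer(infer: dict) -> None:
    assert 1 <= infer["num_sampling_steps"] <= 1000, "num_sampling_steps: range 1-1000"
    assert 1 <= infer["cfg_scale"] <= 20, "cfg_scale: range 1-20"
    assert infer["solver"] in SOLVERS, f"solver: one of {sorted(SOLVERS)}"
    assert isinstance(infer["t_shift"], int) and 1 <= infer["t_shift"] <= 20, "t_shift: int in 1-20"

infer = {
    "resolution": "1024x1024",
    "num_sampling_steps": 60,
    "cfg_scale": 4.0,
    "solver": "euler",
    "t_shift": 4,
    "ntk_scaling": True,
    "proportional_attn": True,
    "seed": 0,
}
check_infer(infer)
print("settings ok")  # prints only if all checks pass
```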
```diff
+- model:
+  - `ckpt`: lumina-next-t2i checkpoint path from the [huggingface repo](https://huggingface.co/Alpha-VLLM/Lumina-Next-T2I) containing `consolidated*.pth` and `model_args.pth`.
+  - `ckpt_lm`: LLM checkpoint.
+  - `token`: huggingface access token for accessing gated repos.
+- transport:
+  - `path_type`: the type of path for transport: "Linear", "GVP", or "VP".
+  - `prediction`: the prediction model for the transport dynamics.
+  - `loss_weight`: the weighting of different components in the loss function; can be "velocity" for dynamic modeling, "likelihood" for statistical consistency, or None for no weighting.
+  - `sample_eps`: epsilon used for sampling in the transport model.
+  - `train_eps`: epsilon used during training to stabilize the learning process.
+  - ode:
+    - `atol`: absolute tolerance for the ODE solver.
+    - `rtol`: relative tolerance for the ODE solver.
+    - `reverse`: run the ODE solver in reverse.
+    - `likelihood`: enable calculation of likelihood during the ODE solving process.
+- infer:
+  - `resolution`: generated image resolution.
+  - `num_sampling_steps`: number of sampling steps for generating an image.
+  - `cfg_scale`: classifier-free guidance scaling factor.
+  - `solver`: solver for image generation.
+  - `t_shift`: time shift factor.
+  - `ntk_scaling`: NTK RoPE scaling factor.
+  - `proportional_attn`: whether to use proportional attention.
+  - `seed`: random initialization seed.
```
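The resolution options follow a simple `WxH` pattern, with an optional `(Extrapolation) ` prefix; a small sketch of parsing them into a width/height pair — the `parse_resolution` helper is hypothetical, not part of the repo:

```python
# Sketch: parse a `resolution` option string from the config (including the
# "(Extrapolation) " prefix used in the option list) into (width, height).
# The helper is our own illustration, not repo code.
def parse_resolution(option: str) -> tuple[int, int]:
    spec = option.removeprefix("(Extrapolation)").strip()
    w, h = spec.split("x")
    return int(w), int(h)

print(parse_resolution("1024x1024"))                  # (1024, 1024)
print(parse_resolution("(Extrapolation) 2048x1024"))  # (2048, 1024)
```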
1. Run with CLI

inference command: