silveroxides
committed on
Commit
•
77d1e20
1
Parent(s):
6178829
Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +182 -0
- config.json +131 -0
- demo_cases.png +3 -0
- model.safetensors +3 -0
- special_tokens_map.json +36 -0
- tokenizer.json +0 -0
- tokenizer_config.json +440 -0
- vae/config.json +31 -0
- vae/diffusion_pytorch_model.safetensors +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
demo_cases.png filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
@@ -0,0 +1,182 @@
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
pipeline_tag: text-to-image
|
4 |
+
tags:
|
5 |
+
- image-to-image
|
6 |
+
---
|
7 |
+
|
8 |
+
<h1 align="center">OmniGen: Unified Image Generation</h1>
|
9 |
+
|
10 |
+
|
11 |
+
<p align="center">
|
12 |
+
<a href="">
|
13 |
+
<img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
|
14 |
+
</a>
|
15 |
+
<a href="https://arxiv.org/abs/2409.11340">
|
16 |
+
<img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2409.11340-b31b1b.svg">
|
17 |
+
</a>
|
18 |
+
<a href="https://huggingface.co/spaces/Shitao/OmniGen">
|
19 |
+
<img alt="License" src="https://img.shields.io/badge/HF%20Demo-🤗-lightblue">
|
20 |
+
</a>
|
21 |
+
<a href="https://huggingface.co/Shitao/OmniGen-v1">
|
22 |
+
<img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
|
23 |
+
</a>
|
24 |
+
</p>
|
25 |
+
|
26 |
+
<h4 align="center">
|
27 |
+
<p>
|
28 |
+
<a href=#2-news>News</a> |
|
29 |
+
<a href=#3-methodology>Methodology</a> |
|
30 |
+
<a href=#4-what-can-omnigen-do>Capabilities</a> |
|
31 |
+
<a href=#5-quick-start>Quick Start</a> |
|
32 |
+
<a href="#6-finetune">Finetune</a> |
|
33 |
+
<a href="#license">License</a> |
|
34 |
+
<a href="#citation">Citation</a>
|
35 |
+
</p>
|
36 |
+
</h4>
|
37 |
+
|
38 |
+
For more information, please refer to our GitHub repo: https://github.com/VectorSpaceLab/OmniGen
|
39 |
+
|
40 |
+
## 1. Overview
|
41 |
+
|
42 |
+
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.
|
43 |
+
|
44 |
+
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, **we believe that the future image generation paradigm should be simpler and more flexible, that is, generating various images directly from arbitrary multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.**
|
45 |
+
|
46 |
+
Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and hope it inspires more universal image generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks; you just need to prepare the corresponding data, and then run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image generation task, and perhaps we can achieve very interesting, wonderful and creative things.
|
47 |
+
|
48 |
+
If you have any questions, ideas or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: [email protected], [email protected], [email protected]. We welcome any feedback to help us improve the model.
|
49 |
+
|
50 |
+
|
51 |
+
|
52 |
+
## 2. News
|
53 |
+
- 2024-10-22: :fire: We release the code for OmniGen. Inference: [docs/inference.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md) Train: [docs/fine-tuning.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tuning.md)
|
54 |
+
- 2024-10-22: :fire: We release the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co/spaces/Shitao/OmniGen)
|
55 |
+
|
56 |
+
|
57 |
+
|
58 |
+
## 3. Methodology
|
59 |
+
|
60 |
+
You can see details in our [paper](https://arxiv.org/abs/2409.11340).
|
61 |
+
|
62 |
+
|
63 |
+
## 4. What Can OmniGen do?
|
64 |
+
![demo](./demo_cases.png)
|
65 |
+
|
66 |
+
OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. **OmniGen doesn't need additional plugins or operations; it can automatically identify the features (e.g., required object, human pose, depth mapping) in input images according to the text prompt.**
|
67 |
+
We showcase some examples in [inference.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference.ipynb). In [inference_demo.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference_demo.ipynb), we show an interesting pipeline to generate and modify an image.
|
68 |
+
|
69 |
+
If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).
|
70 |
+
|
71 |
+
|
72 |
+
|
73 |
+
## 5. Quick Start
|
74 |
+
|
75 |
+
|
76 |
+
### Using OmniGen
|
77 |
+
Install via GitHub (recommended):
|
78 |
+
```bash
|
79 |
+
git clone https://github.com/staoxiao/OmniGen.git
|
80 |
+
cd OmniGen
|
81 |
+
pip install -e .
|
82 |
+
```
|
83 |
+
or via PyPI:
|
84 |
+
```bash
|
85 |
+
pip install OmniGen
|
86 |
+
```
|
87 |
+
|
88 |
+
Here are some examples:
|
89 |
+
```python
|
90 |
+
from OmniGen import OmniGenPipeline
|
91 |
+
|
92 |
+
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
|
93 |
+
|
94 |
+
# Text to Image
|
95 |
+
images = pipe(
|
96 |
+
prompt="A curly-haired man in a red shirt is drinking tea.",
|
97 |
+
height=1024,
|
98 |
+
width=1024,
|
99 |
+
guidance_scale=2.5,
|
100 |
+
seed=0,
|
101 |
+
)
|
102 |
+
images[0].save("example_t2i.png") # save output PIL Image
|
103 |
+
|
104 |
+
# Multi-modal to Image
|
105 |
+
# In prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
|
106 |
+
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
|
107 |
+
images = pipe(
|
108 |
+
prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
|
109 |
+
input_images=["./imgs/test_cases/two_man.jpg"],
|
110 |
+
height=1024,
|
111 |
+
width=1024,
|
112 |
+
separate_cfg_infer=False, # if OOM, you can set separate_cfg_infer=True
|
113 |
+
guidance_scale=3,
|
114 |
+
img_guidance_scale=1.6
|
115 |
+
)
|
116 |
+
images[0].save("example_ti2i.png") # save output PIL image
|
117 |
+
```
|
118 |
+
For more details about the inference arguments, please refer to [docs/inference.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md).
|
119 |
+
For more examples of image generation, you can refer to [inference.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference.ipynb) and [inference_demo.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference_demo.ipynb).
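
The placeholder rule stated in the quick-start comments (one `<img><|image_N|></img>` placeholder per entry in `input_images`) can be checked before calling the pipeline. The following is a minimal sketch; `check_placeholders` is a hypothetical helper written for illustration and is not part of the OmniGen package, and it assumes placeholders are numbered from 1 without gaps.

```python
import re

# Placeholder format quoted in the README comments: <img><|image_N|></img>
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def check_placeholders(prompt, input_images):
    """Hypothetical helper: raise ValueError if placeholders and images disagree."""
    # Distinct placeholder indices found in the prompt, in ascending order.
    indices = sorted({int(n) for n in PLACEHOLDER.findall(prompt)})
    # Assumption: N images should be referenced as image_1 .. image_N.
    expected = list(range(1, len(input_images) + 1))
    if indices != expected:
        raise ValueError(
            f"prompt references images {indices}, but input_images implies {expected}"
        )
    return indices


prompt = (
    "A man in a black shirt is reading a book. "
    "The man is the right man in <img><|image_1|></img>."
)
print(check_placeholders(prompt, ["./imgs/test_cases/two_man.jpg"]))
```

Running a check like this first turns a silent prompt/image mismatch into an explicit error before any model weights are loaded.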
|
120 |
+
|
121 |
+
|
122 |
+
### Using Diffusers
|
123 |
+
Coming soon.
|
124 |
+
|
125 |
+
|
126 |
+
### Gradio Demo
|
127 |
+
|
128 |
+
We provide an online demo on [Hugging Face](https://huggingface.co/spaces/Shitao/OmniGen).
|
129 |
+
|
130 |
+
For the local Gradio demo, you can run:
|
131 |
+
```bash
|
132 |
+
python app.py
|
133 |
+
```
|
134 |
+
|
135 |
+
|
136 |
+
|
137 |
+
## 6. Finetune
|
138 |
+
We provide a training script `train.py` to fine-tune OmniGen.
|
139 |
+
Here is a toy example of LoRA fine-tuning:
|
140 |
+
```bash
|
141 |
+
accelerate launch --num_processes=1 train.py \
|
142 |
+
--model_name_or_path Shitao/OmniGen-v1 \
|
143 |
+
--batch_size_per_device 2 \
|
144 |
+
--condition_dropout_prob 0.01 \
|
145 |
+
--lr 1e-3 \
|
146 |
+
--use_lora \
|
147 |
+
--lora_rank 8 \
|
148 |
+
--json_file ./toy_data/toy_subject_data.jsonl \
|
149 |
+
--image_path ./toy_data/images \
|
150 |
+
--max_input_length_limit 18000 \
|
151 |
+
--keep_raw_resolution \
|
152 |
+
--max_image_size 1024 \
|
153 |
+
--gradient_accumulation_steps 1 \
|
154 |
+
--ckpt_every 10 \
|
155 |
+
--epochs 200 \
|
156 |
+
--log_every 1 \
|
157 |
+
--results_dir ./results/toy_finetune_lora
|
158 |
+
```
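
As a rough sanity check on the flags above, the number of samples that contribute to each optimizer step can be worked out from three of them. This sketch assumes the usual convention that the effective batch size is the product of process count, per-device batch size, and gradient accumulation steps; whether `train.py` follows exactly this convention is not confirmed by the README.

```python
def effective_batch_size(num_processes, batch_size_per_device, gradient_accumulation_steps):
    """Samples per optimizer step under the usual data-parallel convention (assumed)."""
    return num_processes * batch_size_per_device * gradient_accumulation_steps


# Values copied from the accelerate command above.
print(effective_batch_size(num_processes=1,
                           batch_size_per_device=2,
                           gradient_accumulation_steps=1))
```

With the toy settings the result is 2, which is small; raising `--gradient_accumulation_steps` would be one way to increase it without using more memory per device.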
|
159 |
+
|
160 |
+
Please refer to [docs/fine-tuning.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tuning.md) for more details (e.g., full fine-tuning).
|
161 |
+
|
162 |
+
|
163 |
+
|
164 |
+
## License
|
165 |
+
This repo is licensed under the [MIT License](LICENSE).
|
166 |
+
|
167 |
+
|
168 |
+
## Citation
|
169 |
+
If you find this repository useful, please consider giving it a star ⭐ and citing our paper:
|
170 |
+
```
|
171 |
+
@article{xiao2024omnigen,
|
172 |
+
title={Omnigen: Unified image generation},
|
173 |
+
author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
|
174 |
+
journal={arXiv preprint arXiv:2409.11340},
|
175 |
+
year={2024}
|
176 |
+
}
|
177 |
+
```
|
178 |
+
|
179 |
+
|
180 |
+
|
181 |
+
|
182 |
+
|
config.json
ADDED
@@ -0,0 +1,131 @@
|
1 |
+
{
|
2 |
+
"_name_or_path": "Phi-3-vision-128k-instruct",
|
3 |
+
"architectures": [
|
4 |
+
"Phi3ForCausalLM"
|
5 |
+
],
|
6 |
+
"attention_dropout": 0.0,
|
7 |
+
"bos_token_id": 1,
|
8 |
+
"eos_token_id": 2,
|
9 |
+
"hidden_act": "silu",
|
10 |
+
"hidden_size": 3072,
|
11 |
+
"initializer_range": 0.02,
|
12 |
+
"intermediate_size": 8192,
|
13 |
+
"max_position_embeddings": 131072,
|
14 |
+
"model_type": "phi3",
|
15 |
+
"num_attention_heads": 32,
|
16 |
+
"num_hidden_layers": 32,
|
17 |
+
"num_key_value_heads": 32,
|
18 |
+
"original_max_position_embeddings": 4096,
|
19 |
+
"rms_norm_eps": 1e-05,
|
20 |
+
"rope_scaling": {
|
21 |
+
"long_factor": [
|
22 |
+
1.0299999713897705,
|
23 |
+
1.0499999523162842,
|
24 |
+
1.0499999523162842,
|
25 |
+
1.0799999237060547,
|
26 |
+
1.2299998998641968,
|
27 |
+
1.2299998998641968,
|
28 |
+
1.2999999523162842,
|
29 |
+
1.4499999284744263,
|
30 |
+
1.5999999046325684,
|
31 |
+
1.6499998569488525,
|
32 |
+
1.8999998569488525,
|
33 |
+
2.859999895095825,
|
34 |
+
3.68999981880188,
|
35 |
+
5.419999599456787,
|
36 |
+
5.489999771118164,
|
37 |
+
5.489999771118164,
|
38 |
+
9.09000015258789,
|
39 |
+
11.579999923706055,
|
40 |
+
15.65999984741211,
|
41 |
+
15.769999504089355,
|
42 |
+
15.789999961853027,
|
43 |
+
18.360000610351562,
|
44 |
+
21.989999771118164,
|
45 |
+
23.079999923706055,
|
46 |
+
30.009998321533203,
|
47 |
+
32.35000228881836,
|
48 |
+
32.590003967285156,
|
49 |
+
35.56000518798828,
|
50 |
+
39.95000457763672,
|
51 |
+
53.840003967285156,
|
52 |
+
56.20000457763672,
|
53 |
+
57.95000457763672,
|
54 |
+
59.29000473022461,
|
55 |
+
59.77000427246094,
|
56 |
+
59.920005798339844,
|
57 |
+
61.190006256103516,
|
58 |
+
61.96000671386719,
|
59 |
+
62.50000762939453,
|
60 |
+
63.3700065612793,
|
61 |
+
63.48000717163086,
|
62 |
+
63.48000717163086,
|
63 |
+
63.66000747680664,
|
64 |
+
63.850006103515625,
|
65 |
+
64.08000946044922,
|
66 |
+
64.760009765625,
|
67 |
+
64.80001068115234,
|
68 |
+
64.81001281738281,
|
69 |
+
64.81001281738281
|
70 |
+
],
|
71 |
+
"short_factor": [
|
72 |
+
1.05,
|
73 |
+
1.05,
|
74 |
+
1.05,
|
75 |
+
1.1,
|
76 |
+
1.1,
|
77 |
+
1.1,
|
78 |
+
1.2500000000000002,
|
79 |
+
1.2500000000000002,
|
80 |
+
1.4000000000000004,
|
81 |
+
1.4500000000000004,
|
82 |
+
1.5500000000000005,
|
83 |
+
1.8500000000000008,
|
84 |
+
1.9000000000000008,
|
85 |
+
2.000000000000001,
|
86 |
+
2.000000000000001,
|
87 |
+
2.000000000000001,
|
88 |
+
2.000000000000001,
|
89 |
+
2.000000000000001,
|
90 |
+
2.000000000000001,
|
91 |
+
2.000000000000001,
|
92 |
+
2.000000000000001,
|
93 |
+
2.000000000000001,
|
94 |
+
2.000000000000001,
|
95 |
+
2.000000000000001,
|
96 |
+
2.000000000000001,
|
97 |
+
2.000000000000001,
|
98 |
+
2.000000000000001,
|
99 |
+
2.000000000000001,
|
100 |
+
2.000000000000001,
|
101 |
+
2.000000000000001,
|
102 |
+
2.000000000000001,
|
103 |
+
2.000000000000001,
|
104 |
+
2.1000000000000005,
|
105 |
+
2.1000000000000005,
|
106 |
+
2.2,
|
107 |
+
2.3499999999999996,
|
108 |
+
2.3499999999999996,
|
109 |
+
2.3499999999999996,
|
110 |
+
2.3499999999999996,
|
111 |
+
2.3999999999999995,
|
112 |
+
2.3999999999999995,
|
113 |
+
2.6499999999999986,
|
114 |
+
2.6999999999999984,
|
115 |
+
2.8999999999999977,
|
116 |
+
2.9499999999999975,
|
117 |
+
3.049999999999997,
|
118 |
+
3.049999999999997,
|
119 |
+
3.049999999999997
|
120 |
+
],
|
121 |
+
"type": "su"
|
122 |
+
},
|
123 |
+
"rope_theta": 10000.0,
|
124 |
+
"sliding_window": 131072,
|
125 |
+
"tie_word_embeddings": false,
|
126 |
+
"torch_dtype": "bfloat16",
|
127 |
+
"transformers_version": "4.38.1",
|
128 |
+
"use_cache": true,
|
129 |
+
"vocab_size": 32064,
|
130 |
+
"_attn_implementation": "sdpa"
|
131 |
+
}
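
A few values in this config can be cross-checked against each other. The sketch below assumes the common rotary-embedding layout in which each attention head's dimensions are rotated in pairs, so one scaling factor is expected per pair; the count of 48 was obtained by counting the entries of `long_factor` and `short_factor` listed above.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 3072,
    "num_attention_heads": 32,
    "max_position_embeddings": 131072,
    "original_max_position_embeddings": 4096,
}
NUM_ROPE_FACTORS = 48  # entries in long_factor, and also in short_factor

# Per-head width of the attention projections.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Assumption: rotary embeddings act on dimension pairs, one factor per pair.
rotary_pairs = head_dim // 2

# How far the context window is stretched beyond the original length.
context_ratio = (
    config["max_position_embeddings"] // config["original_max_position_embeddings"]
)

print(head_dim, rotary_pairs, context_ratio)
assert rotary_pairs == NUM_ROPE_FACTORS
```

Under these assumptions the factor lists are consistent with a head dimension of 96 and a 32x context extension.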
|
demo_cases.png
ADDED
Git LFS Details
|
model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:80b7caa72c6628a30b76083208830c7b9f6d2debf8d63831414583a0a21dd395
|
3 |
+
size 15501299112
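
The three lines above are a Git LFS pointer, not the weights themselves: a spec version, a content hash, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper for illustration and handles only the three keys shown here.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:80b7caa72c6628a30b76083208830c7b9f6d2debf8d63831414583a0a21dd395
size 15501299112
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: split 'key value' lines of a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gib = round(info["size"] / 2**30, 2)  # bytes to GiB
print(info["algo"], len(info["digest"]), size_gib)
```

The size field is useful for estimating download time and disk space before fetching `model.safetensors`.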
|
special_tokens_map.json
ADDED
@@ -0,0 +1,36 @@
|
1 |
+
{
|
2 |
+
"additional_special_tokens": [
|
3 |
+
"<|system|>",
|
4 |
+
"<|end|>",
|
5 |
+
"<|user|>",
|
6 |
+
"<|end|>"
|
7 |
+
],
|
8 |
+
"bos_token": {
|
9 |
+
"content": "<s>",
|
10 |
+
"lstrip": false,
|
11 |
+
"normalized": false,
|
12 |
+
"rstrip": false,
|
13 |
+
"single_word": false
|
14 |
+
},
|
15 |
+
"eos_token": {
|
16 |
+
"content": "<|endoftext|>",
|
17 |
+
"lstrip": false,
|
18 |
+
"normalized": false,
|
19 |
+
"rstrip": false,
|
20 |
+
"single_word": false
|
21 |
+
},
|
22 |
+
"pad_token": {
|
23 |
+
"content": "<|endoftext|>",
|
24 |
+
"lstrip": false,
|
25 |
+
"normalized": false,
|
26 |
+
"rstrip": false,
|
27 |
+
"single_word": false
|
28 |
+
},
|
29 |
+
"unk_token": {
|
30 |
+
"content": "<unk>",
|
31 |
+
"lstrip": false,
|
32 |
+
"normalized": false,
|
33 |
+
"rstrip": false,
|
34 |
+
"single_word": false
|
35 |
+
}
|
36 |
+
}
|
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
tokenizer_config.json
ADDED
@@ -0,0 +1,440 @@
|
1 |
+
{
|
2 |
+
"add_bos_token": true,
|
3 |
+
"add_eos_token": false,
|
4 |
+
"added_tokens_decoder": {
|
5 |
+
"0": {
|
6 |
+
"content": "<unk>",
|
7 |
+
"lstrip": false,
|
8 |
+
"normalized": false,
|
9 |
+
"rstrip": false,
|
10 |
+
"single_word": false,
|
11 |
+
"special": true
|
12 |
+
},
|
13 |
+
"1": {
|
14 |
+
"content": "<s>",
|
15 |
+
"lstrip": false,
|
16 |
+
"normalized": false,
|
17 |
+
"rstrip": false,
|
18 |
+
"single_word": false,
|
19 |
+
"special": true
|
20 |
+
},
|
21 |
+
"2": {
|
22 |
+
"content": "</s>",
|
23 |
+
"lstrip": false,
|
24 |
+
"normalized": false,
|
25 |
+
"rstrip": true,
|
26 |
+
"single_word": false,
|
27 |
+
"special": false
|
28 |
+
},
|
29 |
+
"32000": {
|
30 |
+
"content": "<|endoftext|>",
|
31 |
+
"lstrip": false,
|
32 |
+
"normalized": false,
|
33 |
+
"rstrip": false,
|
34 |
+
"single_word": false,
|
35 |
+
"special": true
|
36 |
+
},
|
37 |
+
"32001": {
|
38 |
+
"content": "<|assistant|>",
|
39 |
+
"lstrip": false,
|
40 |
+
"normalized": false,
|
41 |
+
"rstrip": true,
|
42 |
+
"single_word": false,
|
43 |
+
"special": true
|
44 |
+
},
|
45 |
+
"32002": {
|
46 |
+
"content": "<|placeholder1|>",
|
47 |
+
"lstrip": false,
|
48 |
+
"normalized": false,
|
49 |
+
"rstrip": true,
|
50 |
+
"single_word": false,
|
51 |
+
"special": true
|
52 |
+
},
|
53 |
+
"32003": {
|
54 |
+
"content": "<|placeholder2|>",
|
55 |
+
"lstrip": false,
|
56 |
+
"normalized": false,
|
57 |
+
"rstrip": true,
|
58 |
+
"single_word": false,
|
59 |
+
"special": true
|
60 |
+
},
|
61 |
+
"32004": {
|
62 |
+
"content": "<|placeholder3|>",
|
63 |
+
"lstrip": false,
|
64 |
+
"normalized": false,
|
65 |
+
"rstrip": true,
|
66 |
+
"single_word": false,
|
67 |
+
"special": true
|
68 |
+
},
|
69 |
+
"32005": {
|
70 |
+
"content": "<|placeholder4|>",
|
71 |
+
"lstrip": false,
|
72 |
+
"normalized": false,
|
73 |
+
"rstrip": true,
|
74 |
+
"single_word": false,
|
75 |
+
"special": true
|
76 |
+
},
|
77 |
+
"32006": {
|
78 |
+
"content": "<|system|>",
|
79 |
+
"lstrip": false,
|
80 |
+
"normalized": false,
|
81 |
+
"rstrip": false,
|
82 |
+
"single_word": false,
|
83 |
+
"special": true
|
84 |
+
},
|
85 |
+
"32007": {
|
86 |
+
"content": "<|end|>",
|
87 |
+
"lstrip": false,
|
88 |
+
"normalized": false,
|
89 |
+
"rstrip": false,
|
90 |
+
"single_word": false,
|
91 |
+
"special": true
|
92 |
+
},
|
93 |
+
"32008": {
|
94 |
+
"content": "<|placeholder5|>",
|
95 |
+
"lstrip": false,
|
96 |
+
"normalized": false,
|
97 |
+
"rstrip": true,
|
98 |
+
"single_word": false,
|
99 |
+
"special": true
|
100 |
+
},
|
101 |
+
"32009": {
|
102 |
+
"content": "<|placeholder6|>",
|
103 |
+
"lstrip": false,
|
104 |
+
"normalized": false,
|
105 |
+
"rstrip": true,
|
106 |
+
"single_word": false,
|
107 |
+
"special": true
|
108 |
+
},
|
109 |
+
"32010": {
|
110 |
+
"content": "<|user|>",
|
111 |
+
"lstrip": false,
|
112 |
+
"normalized": false,
|
113 |
+
"rstrip": false,
|
114 |
+
"single_word": false,
|
115 |
+
"special": true
|
116 |
+
},
|
117 |
+
"32011": {
|
118 |
+
"content": "<|placeholder7|>",
|
119 |
+
"lstrip": false,
|
120 |
+
"normalized": false,
|
121 |
+
"rstrip": true,
|
122 |
+
"single_word": false,
|
123 |
+
"special": true
|
124 |
+
},
|
125 |
+
"32012": {
|
126 |
+
"content": "<|placeholder8|>",
|
127 |
+
"lstrip": false,
|
128 |
+
"normalized": false,
|
129 |
+
"rstrip": true,
|
130 |
+
"single_word": false,
|
131 |
+
"special": true
|
132 |
+
},
|
133 |
+
"32013": {
|
134 |
+
"content": "<|placeholder9|>",
|
135 |
+
"lstrip": false,
|
136 |
+
"normalized": false,
|
137 |
+
"rstrip": true,
|
138 |
+
"single_word": false,
|
139 |
+
"special": true
|
140 |
+
},
|
141 |
+
"32014": {
|
142 |
+
"content": "<|placeholder10|>",
|
143 |
+
"lstrip": false,
|
144 |
+
"normalized": false,
|
145 |
+
"rstrip": true,
|
146 |
+
"single_word": false,
|
147 |
+
"special": true
|
148 |
+
},
|
149 |
+
"32015": {
|
150 |
+
"content": "<|placeholder11|>",
|
151 |
+
"lstrip": false,
|
152 |
+
"normalized": false,
|
153 |
+
"rstrip": true,
|
154 |
+
"single_word": false,
|
155 |
+
"special": true
|
156 |
+
},
|
157 |
+
"32016": {
|
158 |
+
"content": "<|placeholder12|>",
|
159 |
+
"lstrip": false,
|
160 |
+
"normalized": false,
|
161 |
+
"rstrip": true,
|
162 |
+
"single_word": false,
|
163 |
+
"special": true
|
164 |
+
},
|
165 |
+
"32017": {
|
166 |
+
"content": "<|placeholder13|>",
|
167 |
+
"lstrip": false,
|
168 |
+
"normalized": false,
|
169 |
+
"rstrip": true,
|
170 |
+
"single_word": false,
|
171 |
+
"special": true
|
172 |
+
},
|
173 |
+
"32018": {
|
174 |
+
"content": "<|placeholder14|>",
|
175 |
+
"lstrip": false,
|
176 |
+
"normalized": false,
|
177 |
+
"rstrip": true,
|
178 |
+
"single_word": false,
|
179 |
+
"special": true
|
180 |
+
},
|
181 |
+
"32019": {
|
182 |
+
"content": "<|placeholder15|>",
|
183 |
+
"lstrip": false,
|
184 |
+
"normalized": false,
|
185 |
+
"rstrip": true,
|
186 |
+
"single_word": false,
|
187 |
+
"special": true
|
188 |
+
},
|
189 |
+
"32020": {
|
190 |
+
"content": "<|placeholder16|>",
|
191 |
+
"lstrip": false,
|
192 |
+
"normalized": false,
|
193 |
+
"rstrip": true,
|
194 |
+
"single_word": false,
|
195 |
+
"special": true
|
196 |
+
},
|
197 |
+
"32021": {
|
198 |
+
"content": "<|placeholder17|>",
|
199 |
+
"lstrip": false,
|
200 |
+
"normalized": false,
|
201 |
+
"rstrip": true,
|
202 |
+
"single_word": false,
|
203 |
+
"special": true
|
204 |
+
},
|
205 |
+
"32022": {
|
206 |
+
"content": "<|placeholder18|>",
|
207 |
+
"lstrip": false,
|
208 |
+
"normalized": false,
|
209 |
+
"rstrip": true,
|
210 |
+
"single_word": false,
|
211 |
+
"special": true
|
212 |
+
},
|
213 |
+
"32023": {
|
214 |
+
"content": "<|placeholder19|>",
|
215 |
+
"lstrip": false,
|
216 |
+
"normalized": false,
|
217 |
+
"rstrip": true,
|
218 |
+
"single_word": false,
|
219 |
+
"special": true
|
220 |
+
},
|
221 |
+
"32024": {
|
222 |
+
"content": "<|placeholder20|>",
|
223 |
+
"lstrip": false,
|
224 |
+
"normalized": false,
|
225 |
+
"rstrip": true,
|
226 |
+
"single_word": false,
|
227 |
+
"special": true
|
228 |
+
},
|
229 |
+
"32025": {
|
230 |
+
"content": "<|placeholder21|>",
|
231 |
+
"lstrip": false,
|
232 |
+
"normalized": false,
|
233 |
+
"rstrip": true,
|
234 |
+
"single_word": false,
|
235 |
+
"special": true
|
236 |
+
},
|
237 |
+
"32026": {
|
238 |
+
"content": "<|placeholder22|>",
|
239 |
+
"lstrip": false,
|
240 |
+
"normalized": false,
|
241 |
+
"rstrip": true,
|
242 |
+
"single_word": false,
|
243 |
+
"special": true
|
244 |
+
},
|
245 |
+
"32027": {
|
246 |
+
"content": "<|placeholder23|>",
|
247 |
+
"lstrip": false,
|
248 |
+
"normalized": false,
|
249 |
+
"rstrip": true,
|
250 |
+
"single_word": false,
|
251 |
+
"special": true
|
252 |
+
},
|
253 |
+
"32028": {
|
254 |
+
"content": "<|placeholder24|>",
|
255 |
+
"lstrip": false,
|
256 |
+
"normalized": false,
|
257 |
+
"rstrip": true,
|
258 |
+
"single_word": false,
|
259 |
+
"special": true
|
260 |
+
},
|
261 |
+
"32029": {
|
262 |
+
"content": "<|placeholder25|>",
|
263 |
+
"lstrip": false,
|
264 |
+
"normalized": false,
|
265 |
+
"rstrip": true,
|
266 |
+
"single_word": false,
|
267 |
+
"special": true
|
268 |
+
},
|
269 |
+
"32030": {
|
270 |
+
"content": "<|placeholder26|>",
|
271 |
+
"lstrip": false,
|
272 |
+
"normalized": false,
|
273 |
+
"rstrip": true,
|
274 |
+
"single_word": false,
|
275 |
+
"special": true
|
276 |
+
},
|
277 |
+
"32031": {
|
278 |
+
"content": "<|placeholder27|>",
|
279 |
+
"lstrip": false,
|
280 |
+
"normalized": false,
|
281 |
+
"rstrip": true,
|
282 |
+
"single_word": false,
|
283 |
+
"special": true
|
284 |
+
},
|
285 |
+
"32032": {
|
286 |
+
"content": "<|placeholder28|>",
|
287 |
+
"lstrip": false,
|
288 |
+
"normalized": false,
|
289 |
+
"rstrip": true,
|
290 |
+
"single_word": false,
|
291 |
+
"special": true
|
292 |
+
},
|
293 |
+
"32033": {
|
294 |
+
"content": "<|placeholder29|>",
|
295 |
+
"lstrip": false,
|
296 |
+
"normalized": false,
|
297 |
+
"rstrip": true,
|
298 |
+
"single_word": false,
|
299 |
+
"special": true
|
300 |
+
},
|
301 |
+
"32034": {
|
302 |
+
"content": "<|placeholder30|>",
|
303 |
+
"lstrip": false,
|
304 |
+
"normalized": false,
|
305 |
+
"rstrip": true,
|
306 |
+
"single_word": false,
|
307 |
+
"special": true
|
308 |
+
},
|
309 |
+
"32035": {
|
310 |
+
"content": "<|placeholder31|>",
|
311 |
+
"lstrip": false,
|
312 |
+
"normalized": false,
|
313 |
+
"rstrip": true,
|
314 |
+
"single_word": false,
|
315 |
+
"special": true
|
316 |
+
},
|
317 |
+
"32036": {
|
318 |
+
"content": "<|placeholder32|>",
|
319 |
+
"lstrip": false,
|
320 |
+
"normalized": false,
|
321 |
+
"rstrip": true,
|
322 |
+
"single_word": false,
|
323 |
+
"special": true
|
324 |
+
},
|
325 |
+
"32037": {
|
326 |
+
"content": "<|placeholder33|>",
|
327 |
+
"lstrip": false,
|
328 |
+
"normalized": false,
|
329 |
+
"rstrip": true,
|
330 |
+
"single_word": false,
|
331 |
+
"special": true
|
332 |
+
},
|
333 |
+
"32038": {
|
334 |
+
"content": "<|placeholder34|>",
|
335 |
+
"lstrip": false,
|
336 |
+
"normalized": false,
|
337 |
+
"rstrip": true,
|
338 |
+
"single_word": false,
|
339 |
+
"special": true
|
340 |
+
},
|
341 |
+
"32039": {
|
342 |
+
"content": "<|placeholder35|>",
|
343 |
+
"lstrip": false,
|
344 |
+
"normalized": false,
|
345 |
+
"rstrip": true,
|
346 |
+
"single_word": false,
|
347 |
+
"special": true
|
348 |
+
},
|
349 |
+
"32040": {
|
350 |
+
"content": "<|placeholder36|>",
|
351 |
+
"lstrip": false,
|
352 |
+
"normalized": false,
|
353 |
+
"rstrip": true,
|
354 |
+
"single_word": false,
|
355 |
+
"special": true
|
356 |
+
},
|
357 |
+
"32041": {
|
358 |
+
"content": "<|placeholder37|>",
|
359 |
+
"lstrip": false,
|
360 |
+
"normalized": false,
|
361 |
+
"rstrip": true,
|
362 |
+
"single_word": false,
|
363 |
+
"special": true
|
364 |
+
},
|
365 |
+
"32042": {
|
366 |
+
"content": "<|placeholder38|>",
|
367 |
+
"lstrip": false,
|
368 |
+
"normalized": false,
|
369 |
+
"rstrip": true,
|
370 |
+
"single_word": false,
|
371 |
+
"special": true
|
372 |
+
},
|
373 |
+
"32043": {
|
374 |
+
"content": "<|placeholder39|>",
|
375 |
+
"lstrip": false,
|
376 |
+
"normalized": false,
|
377 |
+
"rstrip": true,
|
378 |
+
"single_word": false,
|
379 |
+
"special": true
|
380 |
+
},
|
381 |
+
"32044": {
|
382 |
+
"content": "<|image|>",
|
383 |
+
"lstrip": false,
|
384 |
+
"normalized": false,
|
385 |
+
"rstrip": true,
|
386 |
+
"single_word": false,
|
387 |
+
"special": true
|
388 |
+
},
|
389 |
+
"32045": {
|
390 |
+
"content": "<img>",
|
391 |
+
"lstrip": false,
|
392 |
+
"normalized": false,
|
393 |
+
"rstrip": false,
|
394 |
+
"single_word": false,
|
395 |
+
"special": true
|
396 |
+
},
|
397 |
+
"32046": {
|
398 |
+
"content": "</img>",
|
399 |
+
"lstrip": false,
|
400 |
+
"normalized": false,
|
401 |
+
"rstrip": false,
|
402 |
+
"single_word": false,
|
403 |
+
"special": true
|
404 |
+
},
|
405 |
+
"32047": {
|
406 |
+
"content": "<cfg>",
|
407 |
+
"lstrip": false,
|
408 |
+
"normalized": false,
|
409 |
+
"rstrip": false,
|
410 |
+
"single_word": false,
|
411 |
+
"special": true
|
412 |
+
},
|
413 |
+
"32048": {
|
414 |
+
"content": "<|diffusion|>",
|
415 |
+
"lstrip": false,
|
416 |
+
"normalized": false,
|
417 |
+
"rstrip": false,
|
418 |
+
"single_word": false,
|
419 |
+
"special": true
|
420 |
+
}
|
421 |
+
},
|
422 |
+
"additional_special_tokens": [
|
423 |
+
"<|system|>",
|
424 |
+
"<|end|>",
|
425 |
+
"<|user|>",
|
426 |
+
"<|end|>"
|
427 |
+
],
|
428 |
+
"bos_token": "<s>",
|
429 |
+
"chat_template": "{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|>\n' -}}{% endif %}",
|
430 |
+
"clean_up_tokenization_spaces": false,
|
431 |
+
"eos_token": "<|endoftext|>",
|
432 |
+
"legacy": false,
|
433 |
+
"model_max_length": 131072,
|
434 |
+
"pad_token": "<|endoftext|>",
|
435 |
+
"padding_side": "right",
|
436 |
+
"sp_model_kwargs": {},
|
437 |
+
"tokenizer_class": "LlamaTokenizer",
|
438 |
+
"unk_token": "<unk>",
|
439 |
+
"use_default_system_prompt": false
|
440 |
+
}
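
The `chat_template` entry above is a Jinja template. To make its effect easier to read, here is a plain-Python sketch that mirrors its logic as written; `render_chat` is a hypothetical name, and in practice the tokenizer would render the template itself.

```python
def render_chat(messages, add_generation_prompt=False):
    """Hypothetical re-implementation of the chat_template logic shown above."""
    # Each message becomes '<|role|>', newline, content, '<|end|>', newline.
    out = "".join(
        f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages
    )
    # An assistant header is appended only when requested and when the
    # last message was not already written by the assistant.
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        out += "<|assistant|>\n"
    return out


messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
]
print(render_chat(messages, add_generation_prompt=True))
```

Note that this template comes from the Phi-3 tokenizer files; the README does not say whether OmniGen's image pipeline uses it.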
|
vae/config.json
ADDED
@@ -0,0 +1,31 @@
|
1 |
+
{
|
2 |
+
"_class_name": "AutoencoderKL",
|
3 |
+
"_diffusers_version": "0.18.0.dev0",
|
4 |
+
"_name_or_path": ".",
|
5 |
+
"act_fn": "silu",
|
6 |
+
"block_out_channels": [
|
7 |
+
128,
|
8 |
+
256,
|
9 |
+
512,
|
10 |
+
512
|
11 |
+
],
|
12 |
+
"down_block_types": [
|
13 |
+
"DownEncoderBlock2D",
|
14 |
+
"DownEncoderBlock2D",
|
15 |
+
"DownEncoderBlock2D",
|
16 |
+
"DownEncoderBlock2D"
|
17 |
+
],
|
18 |
+
"in_channels": 3,
|
19 |
+
"latent_channels": 4,
|
20 |
+
"layers_per_block": 2,
|
21 |
+
"norm_num_groups": 32,
|
22 |
+
"out_channels": 3,
|
23 |
+
"sample_size": 1024,
|
24 |
+
"scaling_factor": 0.13025,
|
25 |
+
"up_block_types": [
|
26 |
+
"UpDecoderBlock2D",
|
27 |
+
"UpDecoderBlock2D",
|
28 |
+
"UpDecoderBlock2D",
|
29 |
+
"UpDecoderBlock2D"
|
30 |
+
]
|
31 |
+
}
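
The VAE config determines the shape of the latent that an image is encoded into. The sketch below assumes the convention used in diffusers pipelines, where every encoder block except the last halves the spatial resolution, so four blocks give a downsampling factor of 8. It only does shape arithmetic and does not load the model.

```python
# Values copied from vae/config.json above.
vae_config = {
    "block_out_channels": [128, 256, 512, 512],
    "latent_channels": 4,
    "sample_size": 1024,
    "scaling_factor": 0.13025,
}

# Assumed convention: all blocks but the last downsample by 2.
downsample = 2 ** (len(vae_config["block_out_channels"]) - 1)


def latent_shape(height, width):
    """Return (channels, height, width) of the latent for a given image size."""
    return (
        vae_config["latent_channels"],
        height // downsample,
        width // downsample,
    )


print(downsample, latent_shape(1024, 1024))
```

In diffusers-style pipelines, encoded latents are typically multiplied by `scaling_factor` before being passed to the generator and divided by it again before decoding.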
|
vae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:1598f3d24932bcfe6634e8b618ea1e30ab1d57f5aad13a6d2de446d2199f2341
|
3 |
+
size 334643268
|