Freeway Animation HunYuan Demo Ver Release - Open-source anime-style (2D) base model for Chinese and English prompts, based on HunYuan
For detailed report, please visit: https://nx9nemngdhk.feishu.cn/docx/XNMDdCOkvoWlvVxVqfVceLXEnoh
I. Overview
We are pleased to introduce Freeway Animation HunYuan Demo Ver, built on the new-generation DiT architecture. It is a domestic anime-style model with better performance and greater potential than previous SDXL models.
Keywords: improved human body expression, flexible Chinese and English prompt input, vivid composition, diverse styles, and lower training costs.
Main motivations:
More flexible prompt input and a stable body structure for vivid compositions, to make the model easier for novice users: multiple prompt input rules produce usable images whatever the user's habits, and more stable limb rendering holds up better in large movements and multi-person interactive scenes.
High aesthetic standards across anime art styles, while keeping output appealing to general users.
Extensive built-in knowledge, which removes the need for character, style, or artist LoRAs and makes it easier to use static models with acceleration techniques (such as TensorRT).
The official version will include 6000+ characters, 1000+ styles, and 2000+ action concepts; the data is currently undergoing intensive cleaning ミ(・・)ミ
Instruction Guide
To avoid ambiguity in text prompts and leave room for complex scenes (e.g., multi-character scenes), we enforce an ordering within prompts, which has been found to improve prompt adherence (a lesson learned from Novel AI V3 / Animagine3 / NetaXL / ArtiWaifu Diffusion!). We also use Chinese descriptions and natural language as far as possible while maintaining network alignment. Specifically, in Freeway Animation HunYuan, we recommend the following four prompt formats:
Similar to Novel AI prompt format:
Tag sequence: Subject (1boy / 1girl) -> Character Name (a girl named frieren from sousou no frieren series) -> Artist (by xxx) -> Race (elf) -> Camera Composition (cowboy shot) -> Style (impasto style) -> Scene Theme (fantasy theme) -> Main Environment Description (in the forest, at day) -> Background Description (gradient background) -> Action (sitting on ground) -> Expression (is expressionless) -> Main Character Feature (white hair) -> Body Feature (twintails, green eyes, parted lip) -> Attire Feature (wearing a white dress) -> Accessories (frills) -> Other Items (a cat) -> Secondary Scene (grass, sunshine) -> Aesthetics (beautiful color, detailed, aesthetic) -> Quality Words ((best quality:1.3))
Note: Tags from Artist through Quality Words are in a shuffled state, so strict adherence to the order is not necessary in that part.
Format like: 1 girl, anime style, cherry_blossom_pink_hair, ombre_hair, starry_eyes, sailor_uniform, elegant_outfit, crouching, feeding_cat
Chinese Natural Language Form: Regular short sentence format is fine, like: An anime girl with cherry blossom pink ombre long hair, bright starry eyes, dressed in an exquisite sailor uniform, is crouching down to feed a cat
English Natural Language Form: Also in short sentence format, like: An anime girl with cherry blossom pink ombre long hair, bright starry eyes, dressed in an exquisite sailor uniform, is crouching down to feed a cat.
Chinese Tag Form: Regular short sentence format is fine, like: 1 girl, anime style, cherry_blossom_pink_hair, ombre_hair, starry_eyes, sailor_uniform, elegant_outfit, crouching, feeding_cat.
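The recommended tag sequence can be sketched as a small helper that sorts tags into that order before joining them. This is only an illustration: the category keys (subject, character, and so on) and the function name build_tag_prompt are hypothetical names chosen here, not part of the model or any official tool.

```python
from typing import Mapping, Sequence, Union

# Category order follows the recommended tag sequence in this guide.
# The key names themselves are hypothetical, chosen for this sketch.
TAG_ORDER = (
    "subject", "character", "artist", "race", "composition", "style",
    "theme", "environment", "background", "action", "expression",
    "main_feature", "body", "attire", "accessories", "other_items",
    "secondary_scene", "aesthetics", "quality",
)


def build_tag_prompt(fields: Mapping[str, Union[str, Sequence[str]]]) -> str:
    """Join tags in the recommended category order, skipping empty categories."""
    unknown = set(fields) - set(TAG_ORDER)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    tags = []
    for category in TAG_ORDER:
        value = fields.get(category, ())
        if isinstance(value, str):  # allow a single tag given as a plain string
            value = [value]
        tags.extend(value)
    return ", ".join(tags)


# Categories may be supplied in any order; the helper reorders them.
example = build_tag_prompt({
    "quality": ["(best quality:1.3)"],
    "subject": ["1girl"],
    "character": ["a girl named frieren from sousou no frieren series"],
    "race": ["elf"],
    "composition": ["cowboy shot"],
    "action": ["sitting on ground"],
    "main_feature": ["white hair"],
})
print(example)
```

Since tags from Artist through Quality Words do not need a strict order, the fixed ordering here is a convenience rather than a requirement.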
Negative keyword prompts: Wrong eyes, bad facial expressions, disfigurement, poor art, deformation, unnecessary limbs, blurred color, blur, repetition, pathological, incomplete, watermark
Sampler parameters: ddpm is the recommended default, with at least 50 inference steps; 100 steps are recommended for complex compositions.
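As a rough sketch, the negative prompt and sampler recommendations above could be collected into one settings dictionary. The function name generation_settings and the dictionary keys are assumptions made for illustration; they would need to be mapped onto whatever front end or pipeline is actually used.

```python
# Negative prompt copied from the recommendation in this guide.
NEGATIVE_PROMPT = (
    "Wrong eyes, bad facial expressions, disfigurement, poor art, "
    "deformation, unnecessary limbs, blurred color, blur, repetition, "
    "pathological, incomplete, watermark"
)


def generation_settings(prompt: str, complex_composition: bool = False) -> dict:
    """Return suggested settings; key names are hypothetical, not a real API."""
    return {
        "prompt": prompt,
        "negative_prompt": NEGATIVE_PROMPT,
        "sampler": "ddpm",  # recommended default sampler
        # 50 steps as the baseline, 100 for complex compositions
        "num_inference_steps": 100 if complex_composition else 50,
    }


settings = generation_settings("1girl, anime style, crouching, feeding_cat")
print(settings["sampler"], settings["num_inference_steps"])
```

Passing complex_composition=True raises the step count to 100, matching the suggestion for multi-character or otherwise complex scenes.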
II. Chinese Concept Understanding
Freeway Animation HunYuan demonstrates excellent ability in understanding Chinese keyword prompts, supporting various formats including natural language long sentences, short sentences, and even phrases. Be sure to explore various approaches.
III. Style Word Selection
Freeway Animation HunYuan uses a mixed style-word design similar to Novel AI: you can pick from style words that are largely orthogonal to one another and commonly used across many scenes. Artist styles are also orthogonal, and are activated with the by xxx tag.
Orthogonal here means that each style is distinct from the others, so styles can be combined into new hybrid styles without interfering with one another.
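A minimal sketch of combining styles might look like the following. The helper with_styles is hypothetical, joining tags with commas is an assumption about how hybrid styles are requested, and "watercolor style" is a placeholder that the model may or may not recognize; only the by xxx form and "impasto style" come from this guide.

```python
def with_styles(base_prompt: str, styles=(), artists=()) -> str:
    """Append artist tags (as 'by <name>') and style words to a base prompt."""
    parts = [base_prompt]
    parts += [f"by {name}" for name in artists]  # artist tags use the 'by xxx' form
    parts += list(styles)                        # style words follow the artist tags
    return ", ".join(part for part in parts if part)


# 'xxx' is a placeholder artist name, as in the guide.
hybrid = with_styles(
    "1girl",
    styles=["impasto style", "watercolor style"],
    artists=["xxx"],
)
print(hybrid)
```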
IV. Multi-person/Object Interaction Scenes
Here are a few samples of multi-character scenarios:
V. Expressions, Poses, and Composition Angles
Compared to SDXL models, the HunYuan model is better at keeping human bodies anatomically plausible in complex compositions, and it understands and follows prompts significantly better. It performs well even with challenging poses or camera angles (which can degrade other models), and produces impressive results in extreme compositions such as dynamic views and upside-down scenes.
One noteworthy point is that the HunYuan model's potential for multi-person interactions is not inferior to that of Novel AI v3, while it is much less complex to develop on than SDXL, which is why many enthusiasts chose the HunYuan model in the first place (laughs).
VI. Outlook and Acknowledgments
Future work:
Prepare a larger training set and more knowledge-based data to improve characters, styles, and detail handling.
Develop new data processing algorithms and tools for finer-grained processing of datasets on the order of millions of images.
Address the lack of support for mapping between Chinese and English character names, propose solutions, and increase model usability.
We welcome others to join the discussion, offer suggestions, and contribute to the model's progress; computing power sponsorship is also welcome.
Freeway Animation HunYuan will be trained on 8*A100 devices. Please continue to follow us and test our products for free. Thank you for your attention and support!
Model developed by Laxhar Dream Lab - https://huggingface.co/Laxhar
Collaborating contributors:
L_A_X: https://civitai.com/user/L_A_X
https://www.liblib.art/userpage/9e1b16538b9657f2a737e9c2c6ebfa69
Chenkin: https://civitai.com/user/Chenkin
Nebulae: https://civitai.com/user/kitarz
千千阙(^^)歌:https://civitai.com/user/li_li
Thanks and references: Tencent HunyuanDiT team: https://github.com/tencent/HunyuanDiT
Thanks to the Tencent HunyuanDiT team for the open-source HunYuanDiT model and the related training scripts and parameters.
Training set sources:
Danbooru (Pid: 1~7,600,039): https://huggingface.co/datasets/KBlueLeaf/danbooru2023-webp-4Mpixel
Danbooru (Pid > 7,600,039): https://huggingface.co/datasets/deepghs/danbooru_newest-webp-4Mpixel
Thanks to both projects for providing Danbooru data for training.
Model evaluation: CCIP trained from: https://github.com/IrisRainbowNeko
Prompt sequence layout provided by: https://github.com/shiertier
Sorting algorithm development assistance from: https://github.com/Yidhar
We sincerely thank Jerry (Narugo1992) and his DeepGHS team for their outstanding contributions to the open-source community. Their series of models and tools has been indispensable in building and processing our entire training set.
Narugo1992: https://github.com/narugo1992
Deepghs: https://github.com/deepghs
https://blog.novelai.net/introducing-novelai-diffusion-anime-v3-6d00d1c118c3
Model category: Diffusion-generated text-to-image model
Usage license: TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT