THUDM/CogVideoX-5b · uncritical but with anime styll with esy temporary fix

Aug 30

The model has a small issue where the model performs wurse with more steps compared to less steps:
50 steps:

25 steps:

prompt:
a huge fox with fluffy dark orange fur and nine tails walking in a forest. anime style

this issue only triggers in anime style and setting the steps to 25 fixes the issue.
i tested in on the int8 quantisation.

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Aug 31

First of all, your prompt is relatively short, which directly poses a risk. In our README, it is mentioned that you should use longer prompts.

Additionally, the situation where 25 steps perform better than 50 steps is quite rare. In the 5B model, we use DPM instead of DDIM, which should theoretically allow generating a video of the same quality with fewer steps. Therefore, in theory, 30-40 steps should also be able to generate a video. However, it is not common for 25 steps to produce better quality than 50 steps. This is regardless of whether you used INT8 quantization or not.

TheBigBlockPC

Sep 1

for the prompt length issue. should i just use a LLM like GPT-4 to enhance the prompt

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Sep 1

sure and the full code including converting is in our github here
https://github.com/THUDM/CogVideo/blob/main/inference/convert_demo.py
with fewshot prompt

TheBigBlockPC

Sep 1

the code in the repo for the prompt enhancing is a bit inefficient for token usage. using a fine tune would be cheaper