uncritical but with anime styll with esy temporary fix
The model has a small issue where the model performs wurse with more steps compared to less steps:
50 steps:
prompt:
a huge fox with fluffy dark orange fur and nine tails walking in a forest. anime style
this issue only triggers in anime style and setting the steps to 25 fixes the issue.
i tested in on the int8 quantisation.
First of all, your prompt is relatively short, which directly poses a risk. In our README, it is mentioned that you should use longer prompts.
Additionally, the situation where 25 steps perform better than 50 steps is quite rare. In the 5B model, we use DPM instead of DDIM, which should theoretically allow generating a video of the same quality with fewer steps. Therefore, in theory, 30-40 steps should also be able to generate a video. However, it is not common for 25 steps to produce better quality than 50 steps. This is regardless of whether you used INT8 quantization or not.
for the prompt length issue. should i just use a LLM like GPT-4 to enhance the prompt
sure and the full code including converting is in our github here
https://github.com/THUDM/CogVideo/blob/main/inference/convert_demo.py
with fewshot prompt
the code in the repo for the prompt enhancing is a bit inefficient for token usage. using a fine tune would be cheaper