This model is not as good as "space JoyCaption Alpha Two"
After testing, I found that this model performs worse than "space JoyCaption Alpha Two": its error rate is significantly higher. Is this normal? To get the best descriptions, should I use "space JoyCaption Alpha Two" instead?
The model weights are the same, so I'm not sure why the performance you're seeing is lower. Could be some oddity with a dependency.
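If you want to rule out a dependency mismatch, one option is to print the installed package versions in each environment and compare them. This is only a minimal sketch: the package list below is a guess at what might matter, not a confirmed list of what the Space uses, and `installed_versions` and `diff_versions` are hypothetical helper names.

```python
from importlib import metadata

# Hypothetical list of packages worth comparing; adjust to your own setup.
PACKAGES = ["torch", "transformers", "accelerate", "pillow"]


def installed_versions(packages):
    """Return {package: version or None} for the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


def diff_versions(local, remote):
    """Return only the packages whose versions differ between two environments."""
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it in both places, then pass the two resulting dictionaries to `diff_versions` to see which packages differ.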
Thank you very much for your prompt reply. I ran three tests on an image with a pose that is difficult to recognize, and "space JoyCaption Alpha Two" was more accurate. I probably misjudged because I ran only a few tests on limited content, and the model's outputs varied greatly from run to run.
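To illustrate how easily a handful of runs can mislead, here is a rough simulation, not a measurement of either setup. It assumes two identical setups with the same hypothetical per-attempt accuracy `p_correct` and estimates how often they still end up with different numbers of correct captions. The function name and all values are made up for illustration.

```python
import random


def chance_of_apparent_gap(p_correct, n_trials, n_sims=10_000, seed=0):
    """Estimate how often two identical setups get different correct counts.

    p_correct is a hypothetical probability that a single caption is right.
    """
    rng = random.Random(seed)
    differing = 0
    for _ in range(n_sims):
        # Count correct captions for each setup over n_trials attempts.
        a = sum(rng.random() < p_correct for _ in range(n_trials))
        b = sum(rng.random() < p_correct for _ in range(n_trials))
        if a != b:
            differing += 1
    return differing / n_sims


if __name__ == "__main__":
    print(chance_of_apparent_gap(p_correct=0.5, n_trials=3))
```

With a hypothetical 50% accuracy and three attempts each, the exact probability that the two counts differ is 44/64, about 0.69, so an apparent gap after three runs says little by itself. Testing more images, or fixing the sampling seed where the setup allows it, would give a fairer comparison.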