Intelligent vs Uncensored

#38
by Animus777 - opened

Hi! Thank you for testing all these models. I have a question, though. When I look at the UGI leaderboard and see a model high up, I get kind of confused: "Is it because it's smarter than the others, or because it's more uncensored?" So is there a way to evaluate a model's general intelligence separately from its uncensoredness? The Writing Style tab is about style, so it's probably not a good metric for that, and the Anime Rating Prediction tab has a couple of Gemma 2 2B models in the top 20 that are definitely not more intelligent than the big models.

Since pretty much all of the questions in my test set are uncensoredness-focused, it's kinda hard to make a metric that ranks raw intelligence without a model's censorship affecting its score. I did the best I could and averaged together the questions whose scores correlate most strongly with parameter size, which is probably the easiest and least biased proxy for intelligence. It's kinda janky, but it definitely creates a much larger divide between 2B, 8B, 70B, and 405B models.
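For anyone curious what that kind of metric could look like, here is a rough sketch, not the leaderboard's actual code. It assumes Pearson correlation against the log of the parameter count and a fixed `top_k` cutoff; the function names, model names, and scores are all made up for illustration.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def size_correlated_score(scores, params_b, top_k):
    """Average each model's scores over the top_k questions whose scores
    correlate most with log(parameter count).

    scores:   {model: [score per question]}  (hypothetical layout)
    params_b: {model: size in billions of parameters}
    """
    models = list(scores)
    # Assumption: use log size, since sizes span 2B to 405B.
    log_sizes = [math.log(params_b[m]) for m in models]
    n_questions = len(scores[models[0]])
    corr = [
        pearson(log_sizes, [scores[m][q] for m in models])
        for q in range(n_questions)
    ]
    # Keep the questions that track model size most closely.
    chosen = sorted(range(n_questions), key=lambda q: corr[q], reverse=True)[:top_k]
    averaged = {m: sum(scores[m][q] for q in chosen) / len(chosen) for m in models}
    return averaged, chosen

# Made-up data: questions 0 and 1 scale with size, question 2 is driven
# by censorship rather than size, question 3 is flat.
params_b = {"tiny": 2, "small": 8, "large": 70, "huge": 405}
scores = {
    "tiny":  [0.2, 0.1, 0.9, 0.5],
    "small": [0.4, 0.3, 0.1, 0.5],
    "large": [0.7, 0.6, 0.8, 0.5],
    "huge":  [0.9, 0.8, 0.2, 0.5],
}
averaged, chosen = size_correlated_score(scores, params_b, top_k=2)
```

With this toy data only the two size-tracking questions are kept, so the censorship-driven question no longer moves the ranking. A rank correlation such as Spearman would be a reasonable swap if the relationship with size is monotonic but not linear.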

DontPlanToEnd changed discussion status to closed
