Report for cardiffnlp/twitter-roberta-base-sentiment

#8
by ZeroCommand - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split test).

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.8% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.218 | 218/1000 tested samples (21.8%) changed prediction after perturbation |
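
The fail rate reported here can be read as the share of perturbed samples whose predicted label changes. Below is a minimal sketch of that computation, not the scanner's own code. The names `fail_rate` and `toy_predict` are hypothetical, and skipping samples that the transformation leaves unchanged is an assumption.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of perturbed samples whose predicted label changes."""
    tested = 0
    changed = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: a sample the transformation does not alter is not counted.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    return changed / tested if tested else 0.0


def toy_predict(text: str) -> str:
    """Hypothetical stand-in classifier, only here to make the sketch runnable."""
    return "LABEL_0" if "bad" in text else "LABEL_1"


samples = ["this is bad", "this is fine", "so bad today", "ok"]
rate = fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")
```

With the real model, `predict` would wrap the classifier and `texts` would be the sampled test split; 218 changed predictions out of 1000 tested samples gives the 0.218 above.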

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 3836 | I don't feel bad for the Sox or their fans at all. I'm glad they're in turmoil. All their focus on hating the Cubs has led them astray | I DON'T FEEL BAD FOR THE SOX OR THEIR FANS AT ALL. I'M GLAD THEY'RE IN TURMOIL. ALL THEIR FOCUS ON HATING THE CUBS HAS LED THEM ASTRAY | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.43) |
| 2265 | @user @user Two atoms at the bar. "I've lost an electron. Really, are you sure ? Yeah, I'm positive" | @USER @USER TWO ATOMS AT THE BAR. "I'VE LOST AN ELECTRON. REALLY, ARE YOU SURE ? YEAH, I'M POSITIVE" | LABEL_1 (p = 0.52) | LABEL_0 (p = 0.54) |
| 10895 | @user Fossil fuels are as problematic,but as they are finite it makes sense to improve on other sources that are not. #JustConversing | @USER FOSSIL FUELS ARE AS PROBLEMATIC,BUT AS THEY ARE FINITE IT MAKES SENSE TO IMPROVE ON OTHER SOURCES THAT ARE NOT. #JUSTCONVERSING | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.56) |
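
To read the prediction columns, it helps to map the raw labels to sentiment names. Per the model card for cardiffnlp/twitter-roberta-base-sentiment, label 0 is negative, 1 is neutral and 2 is positive. The sketch below parses a prediction cell into a name and a probability; `parse_prediction` is a hypothetical helper, not part of the report tooling.

```python
import re
from typing import Tuple

# Label mapping from the model card: 0 = negative, 1 = neutral, 2 = positive.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

# Matches cells such as "LABEL_0 (p = 0.49)".
CELL_PATTERN = re.compile(r"(LABEL_\d) \(p = ([0-9.]+)\)")


def parse_prediction(cell: str) -> Tuple[str, float]:
    """Turn a prediction cell into (sentiment name, probability)."""
    match = CELL_PATTERN.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"Unrecognised prediction cell: {cell!r}")
    return LABEL_NAMES[match.group(1)], float(match.group(2))


before = parse_prediction("LABEL_0 (p = 0.49)")
after = parse_prediction("LABEL_1 (p = 0.43)")
print(before, "->", after)
```

In the three examples above, the top probability before perturbation lies between 0.49 and 0.52, so these flips occur on inputs where the model was already close to undecided.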

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.1% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.141 | 141/1000 tested samples (14.1%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 5271 | We're watching closely exactly who works to normalize this creepy fringe. @user @user @user @user | We're atching closely exactly who wotks to normalize this cteepy frinye. @user @usetr @user @user | LABEL_0 (p = 0.68) | LABEL_1 (p = 0.72) |
| 8754 | Instead of being aghast at the spectacle of Trump's transition, liberals should try a more level-headed approach.… | Instead of beinh aghast at thre spectacle of Trump's transition, liberals should try a more level-hsaded approach.… | LABEL_0 (p = 0.47) | LABEL_1 (p = 0.48) |
| 2369 | This is not a joke, please try not to laugh"China websites block searches for 'Fatty Kim the Third'" - | This is not a joke, please try hot to laugh"China websites block searches for 'Farty Kim the Third'" - | LABEL_0 (p = 0.47) | LABEL_1 (p = 0.47) |
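
The perturbed texts above show three kinds of character edits: deletions ("watching" to "atching"), substitutions ("works" to "wotks") and insertions ("user" to "usetr"). The sketch below injects the same three kinds of edits at random. It is an illustration only and not the scanner's implementation; the function name `add_typos`, the `rate` and `seed` parameters, and the choice of random lowercase letters are all assumptions.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, substitute, or insert characters next to letters."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "substitute", "insert"))
            if op == "delete":
                continue  # drop the letter
            typo = rng.choice(string.ascii_lowercase)
            if op == "substitute":
                out.append(typo)  # replace the letter
            else:
                out.extend((ch, typo))  # keep the letter, add one after it
        else:
            out.append(ch)  # non-letters and unselected letters are kept
    return "".join(out)


original = "We're watching closely exactly who works to normalize this."
print(add_typos(original, rate=0.1, seed=1))
```

Fixing the seed makes a perturbed sample reproducible, which is useful when comparing predictions before and after a model change.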

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 11.4% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.114 | 114/1000 tested samples (11.4%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 7078 | @user @user #DumpTheEstablishment #DrainTheSwamp #NeverRomney Never, ever!… | @User @User #Dumptheestablishment #Draintheswamp #Neverromney Never, Ever!… | LABEL_0 (p = 0.55) | LABEL_1 (p = 0.49) |
| 11057 | EU is finished, in my opinion. Marine Le Pen will win France elections and try to get her country out of the union. #Renzi | Eu Is Finished, In My Opinion. Marine Le Pen Will Win France Elections And Try To Get Her Country Out Of The Union. #Renzi | LABEL_0 (p = 0.48) | LABEL_1 (p = 0.55) |
| 9360 | @user Just say NOOOOOOOOOOOOOOOOOOOOOOO! to Pelosi! We need someone younger! Give it to Keith Ellison. | @User Just Say Nooooooooooooooooooooooo! To Pelosi! We Need Someone Younger! Give It To Keith Ellison. | LABEL_0 (p = 0.60) | LABEL_1 (p = 0.53) |
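
The case perturbations in this report are consistent with Python's built-in `str.upper`, `str.lower` and `str.title`. Whether the scanner calls exactly these methods is an assumption; the sketch only shows that they reproduce the examples, including the way title casing flattens camel-case hashtags and acronyms.

```python
# Two originals taken from the title case examples above.
hashtag = "#DumpTheEstablishment"
sentence = "EU is finished, in my opinion."

# str.title() upper-cases the first letter of each run of letters
# and lower-cases the rest, so inner capitals are lost.
title_hashtag = hashtag.title()
title_sentence = sentence.title()

print(title_hashtag)     # hashtag loses its inner capitals
print(title_sentence)    # "EU" is no longer an all-caps acronym
print(sentence.upper())  # the uppercase perturbation
print(sentence.lower())  # the lowercase perturbation
```

The handle placeholder `@user` is affected as well: it becomes `@User` or `@USER`, a form the model may not have seen often during training.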

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 9.3% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.093 | 93/1000 tested samples (9.3%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 3880 | Games help #driverless cars "learn": | games help #driverless cars "learn": | LABEL_2 (p = 0.54) | LABEL_1 (p = 0.61) |
| 11152 | @user @user @user -you openly support #Hamas & #Hezbollah - good luck w/the 72 virgin boys u get in 'paradise' 🖕from🇺🇸 | @user @user @user -you openly support #hamas & #hezbollah - good luck w/the 72 virgin boys u get in 'paradise' 🖕from🇺🇸 | LABEL_0 (p = 0.52) | LABEL_1 (p = 0.47) |
| 9512 | Sure-enough positively get an earful vegetarianism loans wavelike among miserable notation list!: AKQw | sure-enough positively get an earful vegetarianism loans wavelike among miserable notation list!: akqw | LABEL_1 (p = 0.45) | LABEL_0 (p = 0.48) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.5% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.085 | 85/1000 tested samples (8.5%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 7287 | 'We’ve Seen What Obama Admin Is Doing, Just Hauling in 3rd World: Ann Coulter on Cabinet, Immigration, The Wall... | We ve Seen What Obama Admin Is Doing Just Hauling in 3rd World Ann Coulter on Cabinet Immigration The Wall | LABEL_0 (p = 0.55) | LABEL_1 (p = 0.53) |
| 9427 | Hopefully, #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM's reign of terror. | Hopefully #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM s reign of terror | LABEL_1 (p = 0.51) | LABEL_0 (p = 0.50) |
| 811 | @user and Michael Moore in a 45 min wide-ranging, civil and informative election talk via @user | @user and Michael Moore in a 45 min wide ranging civil and informative election talk via @user | LABEL_2 (p = 0.55) | LABEL_1 (p = 0.52) |

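
Judging only from the examples above, the punctuation perturbation appears to replace punctuation marks with spaces, keep `#` and `@`, and collapse the resulting whitespace. The sketch below implements that inferred behaviour. It is an assumption drawn from these three rows, not the scanner's actual code, and `remove_punctuation` is a hypothetical name.

```python
import re
import string

# Characters kept because hashtags and handles survive in the examples.
KEEP = "#@"

# ASCII punctuation minus the kept characters, plus a few typographic marks.
REMOVE = "".join(c for c in string.punctuation if c not in KEEP) + "’‘“”…"
PUNCT_PATTERN = re.compile("[" + re.escape(REMOVE) + "]")


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    spaced = PUNCT_PATTERN.sub(" ", text)
    return " ".join(spaced.split())


original = (
    "@user and Michael Moore in a 45 min wide-ranging, civil and "
    "informative election talk via @user"
)
print(remove_punctuation(original))
```

Replacing a mark with a space rather than deleting it explains outputs such as "#BLM s" and "We ve", where the apostrophe splits one token into two.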
👉Ethical issues (2)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.7% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.067 | 29/433 tested samples (6.7%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 243 | #Prophet #Muhammad was treating every one of his #companions as if he most #beloved one to him. #Ep06_ | #Prophet #siddhartha gautama was treating every one of his #companions as if he most #beloved one to him. #Ep06_ | LABEL_2 (p = 0.49) | LABEL_1 (p = 0.55) |
| 672 | Pope Francis: convert, for God’s kingdom is in our midst | dalai lama Francis: convert, for allah’s kingdom is in our midst | LABEL_2 (p = 0.66) | LABEL_1 (p = 0.62) |
| 974 | Life is a journey. When we stop, things don't go right. -Pope Francis | Life is a journey. When we stop, things don't go right. -rabbi Francis | LABEL_1 (p = 0.50) | LABEL_0 (p = 0.58) |

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.0% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.050 | 50/1000 tested samples (5.0%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 8233 | my family was lit as hell when Chavez died & it's a celebration now that Castro died & we not even Cuban...maduro when's your turn? | my family was lit as hell when Chavez died & it's a celebration now that Castro died & we not even Kiribati...maduro when's your turn? | LABEL_0 (p = 0.45) | LABEL_1 (p = 0.44) |
| 1178 | if Persona 5 is already out and ready in Japan why can't we have it nowpleasemy crops are dying | if Persona 5 is already out and ready in Lesotho why can't we have it nowpleasemy crops are dying | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.47) |
| 4197 | China Bans Web Searches For 'Fatty Kim The Third' | Samoa Bans Web Searches For 'Fatty Kim The Third' | LABEL_0 (p = 0.57) | LABEL_1 (p = 0.57) |
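
The ethical perturbations swap one entity for another and expect the sentiment to stay the same. A minimal sketch of such a term swap follows, using two country pairs taken from the examples above. The dictionary `COUNTRY_SWAPS` and the function `switch_terms` are hypothetical; the scanner's real word lists and matching rules are not shown in this report.

```python
import re
from typing import Dict

# Hypothetical swap table, filled with two pairs seen in the examples above.
COUNTRY_SWAPS: Dict[str, str] = {"Japan": "Lesotho", "China": "Samoa"}


def switch_terms(text: str, swaps: Dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key with its mapped value."""
    if not swaps:
        return text
    alternatives = "|".join(re.escape(term) for term in swaps)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: swaps[match.group(1)], text)


original = "China Bans Web Searches For 'Fatty Kim The Third'"
perturbed = switch_terms(original, COUNTRY_SWAPS)
print(perturbed)
```

A text that contains none of the listed terms comes back unchanged, so it gives no information about the model's sensitivity to the swap.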

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
