Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 7 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split test).
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.8% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.218 | 218/1000 tested samples (21.8%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3836 | I don't feel bad for the Sox or their fans at all. I'm glad they're in turmoil. All their focus on hating the Cubs has led them astray | I DON'T FEEL BAD FOR THE SOX OR THEIR FANS AT ALL. I'M GLAD THEY'RE IN TURMOIL. ALL THEIR FOCUS ON HATING THE CUBS HAS LED THEM ASTRAY | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.43) |
| 2265 | @user @user Two atoms at the bar. "I've lost an electron. Really, are you sure ? Yeah, I'm positive" | @USER @USER TWO ATOMS AT THE BAR. "I'VE LOST AN ELECTRON. REALLY, ARE YOU SURE ? YEAH, I'M POSITIVE" | LABEL_1 (p = 0.52) | LABEL_0 (p = 0.54) |
| 10895 | @user Fossil fuels are as problematic,but as they are finite it makes sense to improve on other sources that are not. #JustConversing | @USER FOSSIL FUELS ARE AS PROBLEMATIC,BUT AS THEY ARE FINITE IT MAKES SENSE TO IMPROVE ON OTHER SOURCES THAT ARE NOT. #JUSTCONVERSING | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.56) |

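
The fail rate above is the share of tested samples whose predicted label changes once the perturbation is applied (218 changed out of 1000 tested gives 0.218). A minimal sketch of such an invariance check, assuming a hypothetical `predict` function that maps a text to a label; the toy classifier is deliberately case-sensitive for illustration and this is not the scanner's own implementation.

```python
from typing import Callable, Sequence


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> tuple[int, int, float]:
    """Return (changed, tested, rate) for an invariance test.

    A sample fails when the label predicted for the perturbed text
    differs from the label predicted for the original text.
    """
    changed = sum(1 for t in texts if predict(t) != predict(transform(t)))
    tested = len(texts)
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical, intentionally case-sensitive toy classifier.
def toy_predict(text: str) -> str:
    return "LABEL_1" if text.isupper() else "LABEL_0"


samples = ["good game", "GG", "so tired today", "ok"]
print(fail_rate(samples, toy_predict, str.upper))  # prints (3, 4, 0.75)
```

Only "GG" keeps its label, because it is already uppercase; the other three samples flip, which is exactly the kind of casing sensitivity the scan flags.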
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.1% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.141 | 141/1000 tested samples (14.1%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 5271 | We're watching closely exactly who works to normalize this creepy fringe. @user @user @user @user | We're atching closely exactly who wotks to normalize this cteepy frinye. @user @usetr @user @user | LABEL_0 (p = 0.68) | LABEL_1 (p = 0.72) |
| 8754 | Instead of being aghast at the spectacle of Trump's transition, liberals should try a more level-headed approach.… | Instead of beinh aghast at thre spectacle of Trump's transition, liberals should try a more level-hsaded approach.… | LABEL_0 (p = 0.47) | LABEL_1 (p = 0.48) |
| 2369 | This is not a joke, please try not to laugh"China websites block searches for 'Fatty Kim the Third'" - | This is not a joke, please try hot to laugh"China websites block searches for 'Farty Kim the Third'" - | LABEL_0 (p = 0.47) | LABEL_1 (p = 0.47) |

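
The perturbed texts above look like keyboard slips: dropped characters ("watching" → "atching"), neighbouring-key substitutions ("works" → "wotks") and insertions ("the" → "thre"). A minimal sketch of that kind of perturbation, assuming a partial, hypothetical QWERTY neighbour map; it is not the scanner's actual implementation and will not reproduce the exact strings above.

```python
import random

# Partial QWERTY neighbour map (hypothetical, for illustration; extend as needed).
KEYBOARD_NEIGHBOURS = {
    "a": "qwsz", "e": "wsdr", "g": "fhty", "h": "gjyu", "i": "uok",
    "n": "bmh", "o": "ipl", "r": "eft", "s": "adwx", "t": "ryg",
}


def add_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Inject at most one typo into each word longer than 3 chars with probability `rate`.

    A typo is a character deletion, a keyboard-neighbour substitution,
    or a keyboard-neighbour insertion before the chosen character.
    """
    rng = random.Random(seed)  # seeded so perturbations are reproducible
    out = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(len(word))
            ch = word[i].lower()
            op = rng.choice(["delete", "substitute", "insert"])
            if op == "delete":
                word = word[:i] + word[i + 1:]
            elif ch in KEYBOARD_NEIGHBOURS:
                typo = rng.choice(KEYBOARD_NEIGHBOURS[ch])
                if op == "substitute":
                    word = word[:i] + typo + word[i + 1:]
                else:
                    word = word[:i] + typo + word[i:]
        out.append(word)
    return " ".join(out)


print(add_typos("We're watching closely exactly who works to normalize this", rate=0.5))
```

Seeding the generator keeps a given perturbation reproducible, which makes a flagged sample easy to re-test after the model is changed.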
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 11.4% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.114 | 114/1000 tested samples (11.4%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 7078 | @user @user #DumpTheEstablishment #DrainTheSwamp #NeverRomney Never, ever!… | @User @User #Dumptheestablishment #Draintheswamp #Neverromney Never, Ever!… | LABEL_0 (p = 0.55) | LABEL_1 (p = 0.49) |
| 11057 | EU is finished, in my opinion. Marine Le Pen will win France elections and try to get her country out of the union. #Renzi | Eu Is Finished, In My Opinion. Marine Le Pen Will Win France Elections And Try To Get Her Country Out Of The Union. #Renzi | LABEL_0 (p = 0.48) | LABEL_1 (p = 0.55) |
| 9360 | @user Just say NOOOOOOOOOOOOOOOOOOOOOOO! to Pelosi! We need someone younger! Give it to Keith Ellison. | @User Just Say Nooooooooooooooooooooooo! To Pelosi! We Need Someone Younger! Give It To Keith Ellison. | LABEL_0 (p = 0.60) | LABEL_1 (p = 0.53) |

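
The title-case outputs above are consistent with Python's built-in `str.title()`, which upper-cases the first letter of every run of letters and lower-cases the rest; whether the scanner uses exactly this function is an assumption. The snippet shows why acronyms ("EU" → "Eu") and CamelCase hashtags lose the casing a model may rely on.

```python
# str.title() capitalises the first letter of each run of letters and
# lower-cases the remaining letters, flattening acronyms and CamelCase hashtags.
examples = [
    "EU is finished, in my opinion.",
    "#DumpTheEstablishment #NeverRomney Never, ever!",
    "Just say NOOOO! to Pelosi!",
]
for text in examples:
    print(f"{text!r} -> {text.title()!r}")
```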
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 9.3% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.093 | 93/1000 tested samples (9.3%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3880 | Games help #driverless cars "learn": | games help #driverless cars "learn": | LABEL_2 (p = 0.54) | LABEL_1 (p = 0.61) |
| 11152 | @user @user @user -you openly support #Hamas & #Hezbollah - good luck w/the 72 virgin boys u get in 'paradise' 🖕from🇺🇸 | @user @user @user -you openly support #hamas & #hezbollah - good luck w/the 72 virgin boys u get in 'paradise' 🖕from🇺🇸 | LABEL_0 (p = 0.52) | LABEL_1 (p = 0.47) |
| 9512 | Sure-enough positively get an earful vegetarianism loans wavelike among miserable notation list!: AKQw | sure-enough positively get an earful vegetarianism loans wavelike among miserable notation list!: akqw | LABEL_1 (p = 0.45) | LABEL_0 (p = 0.48) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.5% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.085 | 85/1000 tested samples (8.5%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 7287 | 'We’ve Seen What Obama Admin Is Doing, Just Hauling in 3rd World: Ann Coulter on Cabinet, Immigration, The Wall... | We ve Seen What Obama Admin Is Doing Just Hauling in 3rd World Ann Coulter on Cabinet Immigration The Wall | LABEL_0 (p = 0.55) | LABEL_1 (p = 0.53) |
| 9427 | Hopefully, #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM's reign of terror. | Hopefully #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM s reign of terror | LABEL_1 (p = 0.51) | LABEL_0 (p = 0.50) |
| 811 | @user and Michael Moore in a 45 min wide-ranging, civil and informative election talk via @user | @user and Michael Moore in a 45 min wide ranging civil and informative election talk via @user | LABEL_2 (p = 0.55) | LABEL_1 (p = 0.52) |

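
The outputs above are consistent with each punctuation character being replaced by a space and whitespace then collapsed, while hashtag and mention markers survive ("#BLM's" → "#BLM s", "wide-ranging," → "wide ranging"). A minimal sketch of that rule, inferred from the examples rather than taken from the scanner's code; keeping `#` and `@` is an assumption based on the examples.

```python
import re
import unicodedata

# Assumption inferred from the examples: hashtag and mention markers are kept.
KEEP = {"#", "@"}


def remove_punctuation(text: str) -> str:
    """Replace Unicode punctuation (except # and @) with spaces, then collapse whitespace."""
    chars = [
        " " if unicodedata.category(c).startswith("P") and c not in KEEP else c
        for c in text
    ]
    return re.sub(r"\s+", " ", "".join(chars)).strip()


print(remove_punctuation("Hopefully, #Trump will end #BLM's reign of terror."))
```

Using Unicode categories rather than `string.punctuation` also catches typographic characters such as the curly apostrophe in "We’ve", which `string.punctuation` does not contain.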
👉Ethical issues (2)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.7% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.067 | 29/433 tested samples (6.7%) changed prediction after perturbation |

Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 243 | #Prophet #Muhammad was treating every one of his #companions as if he most #beloved one to him. #Ep06_ | #Prophet #siddhartha gautama was treating every one of his #companions as if he most #beloved one to him. #Ep06_ | LABEL_2 (p = 0.49) | LABEL_1 (p = 0.55) |
| 672 | Pope Francis: convert, for God’s kingdom is in our midst | dalai lama Francis: convert, for allah’s kingdom is in our midst | LABEL_2 (p = 0.66) | LABEL_1 (p = 0.62) |
| 974 | Life is a journey. When we stop, things don't go right. -Pope Francis | Life is a journey. When we stop, things don't go right. -rabbi Francis | LABEL_1 (p = 0.50) | LABEL_0 (p = 0.58) |

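
A religion switch is a whole-word substitution of religious terms. Because it only changes texts that mention such a term, presumably only those samples are tested, which would explain why 433 rather than 1000 samples appear above (29 / 433 ≈ 0.067). A minimal sketch, assuming a small hypothetical swap table built from the examples; the scanner evidently uses richer lists, since "Pope" becomes "dalai lama" in one example and "rabbi" in another.

```python
import re

# Hypothetical swap table for illustration; the scanner's real word lists differ.
RELIGION_SWAPS = {
    "Pope": "rabbi",
    "God": "allah",
    "Muhammad": "siddhartha gautama",
}

# \b anchors ensure only whole words match (e.g. "Godzilla" is untouched).
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RELIGION_SWAPS)) + r")\b")


def switch_religion(text: str) -> str:
    """Replace whole-word religious terms using RELIGION_SWAPS."""
    return _PATTERN.sub(lambda m: RELIGION_SWAPS[m.group(1)], text)


def applicable(text: str) -> bool:
    """A text is only worth testing if it contains at least one swappable term."""
    return _PATTERN.search(text) is not None


print(switch_religion("Life is a journey. When we stop, things don't go right. -Pope Francis"))
print(f"{29 / 433:.3f}")  # fail rate over applicable samples only
```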
When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.0% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.050 | 50/1000 tested samples (5.0%) changed prediction after perturbation |

Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples

| Index | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8233 | my family was lit as hell when Chavez died & it's a celebration now that Castro died & we not even Cuban...maduro when's your turn? | my family was lit as hell when Chavez died & it's a celebration now that Castro died & we not even Kiribati...maduro when's your turn? | LABEL_0 (p = 0.45) | LABEL_1 (p = 0.44) |
| 1178 | if Persona 5 is already out and ready in Japan why can't we have it nowpleasemy crops are dying | if Persona 5 is already out and ready in Lesotho why can't we have it nowpleasemy crops are dying | LABEL_0 (p = 0.49) | LABEL_1 (p = 0.47) |
| 4197 | China Bans Web Searches For 'Fatty Kim The Third' | Samoa Bans Web Searches For 'Fatty Kim The Third' | LABEL_0 (p = 0.57) | LABEL_1 (p = 0.57) |

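
The "and vice versa" part means the swap is bidirectional, so it should run as a single pass; chained `str.replace` calls would turn "Japan and Lesotho" into "Lesotho and Lesotho" and then back into "Japan and Japan". A minimal sketch, assuming a hypothetical pair list taken from the examples above (the scanner's real country lists are not shown in this report).

```python
import re

# Swap pairs taken from the examples above, for illustration only.
PAIRS = [("Japan", "Lesotho"), ("China", "Samoa")]

# Bidirectional mapping: each name maps to its partner.
SWAPS = {a: b for a, b in PAIRS}
SWAPS.update({b: a for a, b in PAIRS})

_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b")


def swap_countries(text: str) -> str:
    """Swap every listed country for its partner in one regex pass."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], text)


print(swap_countries("China Bans Web Searches For 'Fatty Kim The Third'"))
print(swap_countries("Flights from Japan to Lesotho"))
```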
Check out the Giskard Space and test your model.
Disclaimer: automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess their impact accordingly.