Issue about the evaluation script
Hello,
Firstly, I want to extend my gratitude for your incredible work in the field of open-source NLP. Your contributions have been invaluable to the community.
I am reaching out to ask whether you could share the zero-shot script used to evaluate the Jais-30B-chat model on MMLU-like Arabic datasets. I have been trying to use the model for multiple-choice questions, but my attempts result in refusals such as, "I'm sorry, I cannot provide the correct answer to multiple-choice questions." For reference, I am using the 'prompt_ar' template provided on your Hugging Face model card.
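In case it helps diagnose the refusals, here is a minimal sketch of how I am currently querying the model. The repo id, the decoding parameters, and the way I fold the options into prompt_ar are my own guesses rather than anything taken from your evaluation code, and prompt_ar itself is abbreviated to a placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical checkpoint id; substitute the actual Jais-30B-chat repo.
model_path = "core42/jais-30b-chat-v1"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# prompt_ar is copied verbatim from the model card in my real script;
# abbreviated here to a placeholder with the same {Question} field.
prompt_ar = "... {Question} ..."

def answer_mcq(question: str, choices: list[str]) -> str:
    """Format one multiple-choice question into prompt_ar and generate an answer."""
    options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
    text = prompt_ar.format_map({"Question": f"{question}\n{options}"})
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,   # <- the setting I am asking about below
        top_p=0.9,
        temperature=0.3,
    )
    # Strip the prompt tokens and return only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```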
Additionally, I have a question about your benchmark evaluations for multiple-choice questions: when you run them, is do_sample=True set in the generation config (as in my snippet above), or do you decode greedily?
Any guidance or information you can provide would be greatly appreciated.
Best regards,