Arabic ORPO LLAMA 3
Story first
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained with ORPO on 2A2I/argilla-dpo-mix-7k-arabic.
I wanted to try ORPO and see whether it could better align an English-biased model like Llama 3 to Arabic, or whether it would fail.
While the evaluations favour the base Llama 3 over my finetune, in practice I found the finetune much better at producing coherent (and mostly correct) Arabic text, which I find interesting.
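For anyone curious about the setup, here is a minimal sketch of how an ORPO run on this dataset could look with TRL's ORPOTrainer. The hyperparameters (beta, learning rate, batch size, sequence lengths) are illustrative assumptions rather than the exact recipe used for this model, and the dataset columns may need to be mapped to the prompt/chosen/rejected format the trainer expects.

```python
# Sketch of an ORPO finetune with TRL (not the exact recipe used for this model).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# The preference data; depending on its schema you may need to reshape it into
# plain-text "prompt" / "chosen" / "rejected" columns before training.
dataset = load_dataset("2A2I/argilla-dpo-mix-7k-arabic", split="train")

config = ORPOConfig(
    output_dir="arabic-orpo-llama-3-8b-instruct",
    beta=0.1,                        # odds-ratio weight (assumed, not from the card)
    learning_rate=5e-6,              # assumed
    per_device_train_batch_size=2,   # assumed
    num_train_epochs=1,              # assumed
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```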
I would encourage everyone to try out the model from here and share their insights with me ^^
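A quick way to try it locally with transformers (the model id below is a placeholder; replace it with this repo's full id):

```python
# Minimal inference example for a Llama-3-Instruct-style chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Arabic-ORPO-Llama-3-8B-Instruct"  # placeholder: use this repo's id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "ما هي عاصمة مصر؟"},  # "What is the capital of Egypt?"
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```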
Evaluation and Results
These results were produced with lighteval using the community|arabic_mmlu tasks.
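For reference, an evaluation along these lines can be launched with lighteval's accelerate runner. The flags, task string, and custom-tasks path below are assumptions and may differ between lighteval versions:

```bash
# Approximate lighteval invocation for one Arabic MMLU subtask (5-shot).
accelerate launch run_evals_accelerate.py \
    --model_args "pretrained=<this-model-repo>" \
    --custom_tasks "community_tasks/arabic_evals.py" \
    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
    --output_dir "./evals"
```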
Task | Llama-3-8B-Instruct | Arabic-ORPO-Llama-3-8B-Instruct |
---|---|---|
All | 0.348 | 0.317 |
Abstract Algebra | 0.310 | 0.230 |
Anatomy | 0.385 | 0.348 |
Astronomy | 0.388 | 0.316 |
Business Ethics | 0.480 | 0.370 |
Clinical Knowledge | 0.396 | 0.385 |
College Biology | 0.347 | 0.299 |
College Chemistry | 0.180 | 0.250 |
College Computer Science | 0.250 | 0.190 |
College Mathematics | 0.260 | 0.280 |
College Medicine | 0.231 | 0.249 |
College Physics | 0.225 | 0.216 |
Computer Security | 0.470 | 0.440 |
Conceptual Physics | 0.315 | 0.404 |
Econometrics | 0.263 | 0.272 |
Electrical Engineering | 0.414 | 0.359 |
Elementary Mathematics | 0.320 | 0.272 |
Formal Logic | 0.270 | 0.214 |
Global Facts | 0.320 | 0.320 |
High School Biology | 0.332 | 0.335 |
High School Chemistry | 0.256 | 0.296 |
High School Computer Science | 0.350 | 0.300 |
High School European History | 0.224 | 0.242 |
High School Geography | 0.323 | 0.364 |
High School Government & Politics | 0.352 | 0.285 |
High School Macroeconomics | 0.290 | 0.285 |
High School Mathematics | 0.237 | 0.278 |
High School Microeconomics | 0.231 | 0.273 |
High School Physics | 0.252 | 0.225 |
High School Psychology | 0.316 | 0.330 |
High School Statistics | 0.199 | 0.176 |
High School US History | 0.284 | 0.250 |
High School World History | 0.312 | 0.274 |
Human Aging | 0.369 | 0.430 |
Human Sexuality | 0.481 | 0.321 |
International Law | 0.603 | 0.405 |
Jurisprudence | 0.491 | 0.370 |
Logical Fallacies | 0.368 | 0.276 |
Machine Learning | 0.214 | 0.312 |
Management | 0.350 | 0.379 |
Marketing | 0.521 | 0.547 |
Medical Genetics | 0.320 | 0.330 |
Miscellaneous | 0.446 | 0.443 |
Moral Disputes | 0.422 | 0.306 |
Moral Scenarios | 0.248 | 0.241 |
Nutrition | 0.412 | 0.346 |
Philosophy | 0.408 | 0.328 |
Prehistory | 0.429 | 0.349 |
Professional Accounting | 0.344 | 0.273 |
Professional Law | 0.306 | 0.244 |
Professional Medicine | 0.228 | 0.206 |
Professional Psychology | 0.337 | 0.315 |
Public Relations | 0.391 | 0.373 |
Security Studies | 0.469 | 0.335 |
Sociology | 0.498 | 0.408 |
US Foreign Policy | 0.590 | 0.490 |
Virology | 0.422 | 0.416 |
World Religions | 0.404 | 0.304 |
Average (All Communities) | 0.348 | 0.317 |