
Cannot verify benchmark results

#4 · opened by Lexski

On the model card it says the model gets

AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98

and

Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo)

I tried following the links, but I could not verify the results. The AlpacaEval 2.0 link does show the leaderboard, but the Nemotron model does not appear on it. The MT-Bench link leads to a GitHub PR that mentions neither GPT-4-Turbo nor the Nemotron model.

Do you understand Bulgarian?

NVIDIA org

Those benchmarks were run internally, so it's expected that you can't find those exact numbers online:

  • The AlpacaEval 2.0 link is there so readers can compare our numbers against the official leaderboard.
  • The MT-Bench link is for anyone who wants to run the benchmark themselves, since doing so requires the changes from that PR (a reproduction sketch follows below).
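
For anyone reproducing either score, the first step is generating the model's answers locally; those answers are then fed to the MT-Bench or AlpacaEval judging scripts. Below is a minimal sketch of that generation step, assuming the HF-format checkpoint nvidia/Llama-3.1-Nemotron-70B-Instruct-HF (the exact repo name is an assumption) and enough GPU memory for a 70B model.

```python
# Minimal sketch: generate one answer from the model, i.e. the raw material
# that the MT-Bench / AlpacaEval judging scripts score. Repo name is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # assumed HF-format repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 70B weights need multiple GPUs even in bf16
    device_map="auto",
)

# Any benchmark prompt would go here; this one is purely illustrative.
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```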
