Post
Can we improve the quality of open LLMs for more languages?
Step 1: Evaluate current SOTA.
The Data Is Better Together community has rated more than 10K prompts for quality. We now want to translate a subset of these to help address the language gap in model evals.
The plan is roughly this:
- We started with DIBT/10k_prompts_ranked and took a subset of 500 high-quality prompts (see the dataset-loading sketch after this list)
- We're asking the community to translate these prompts into different languages
- We'll evaluate the extent to which we can use AlpacaEval and similar approaches to rate the outputs of models across these different languages
- If it works well, we can more easily evaluate open LLMs across different languages by using a judge LLM to rate the quality of their outputs (a rough judge sketch is at the end of this post).
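To make the first bullet concrete, here's a minimal sketch of how you could pull a high-quality subset from DIBT/10k_prompts_ranked with the `datasets` library. This isn't the project's exact selection code; the rating threshold and the `avg_rating`/`prompt` column names are assumptions for illustration.

```python
# Illustrative sketch only: the column names ("avg_rating", "prompt") and the
# rating threshold are assumptions, not the project's actual selection logic.
from datasets import load_dataset

ds = load_dataset("DIBT/10k_prompts_ranked", split="train")

# Keep prompts the community rated highly (threshold chosen for illustration)
high_quality = ds.filter(
    lambda row: row["avg_rating"] is not None and row["avg_rating"] >= 4
)

# Take a 500-prompt subset for translation
subset = high_quality.shuffle(seed=42).select(range(500))
print(subset[0]["prompt"])
```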
You can find more details in our new GitHub repo: https://github.com/huggingface/data-is-better-together (don't forget to give it a ⭐!)
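And here's a rough sketch of the judge-LLM idea in the spirit of AlpacaEval: ask a strong model to compare two models' answers to the same translated prompt. The judge model ID and the rubric below are placeholders, not a setup we've settled on.

```python
# Rough sketch of LLM-as-judge pairwise rating; the judge model ID and the
# prompt/rubric here are placeholders, not the project's actual configuration.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")  # placeholder judge

def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge model which answer better responds to the prompt ('A' or 'B')."""
    messages = [
        {
            "role": "user",
            "content": (
                "You are judging two answers to the same prompt.\n\n"
                f"Prompt: {prompt}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}\n\n"
                "Reply with a single letter, 'A' or 'B', for the better answer."
            ),
        }
    ]
    response = client.chat_completion(messages, max_tokens=5)
    return response.choices[0].message.content.strip()
```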