Commit 56da4d9
committed by root
1 Parent(s): b47f672

add new data
- README.md +6 -4
- data/hotpotqa.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/longbook_choice_eng.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/longbook_choice_eng_gpt4_same/test.json +3 -0
- data/longbook_qa_eng.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/longbook_qa_eng_gpt4_same/test.json +3 -0
- data/longbook_sum_eng_gpt4_same/test.json +3 -0
- data/longdialogue_qa_eng_gpt4_same/test.json +3 -0
- data/multifieldqa_en.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/musique.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/qasper.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/qmsum.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
- data/quality.e5_mistral_retriever_chunkbysents1200/test.json +3 -0
README.md
CHANGED
@@ -16,10 +16,10 @@ tags:
 We introduce Llama3-ChatQA-2, which bridges the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. Llama3-ChatQA-2 is developed using an improved training recipe from the [ChatQA-1.5 paper](https://arxiv.org/pdf/2401.10225), and it is built on top of the [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-70B). Specifically, we continued training the Llama-3 base models to extend the context window from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Llama3-ChatQA-2 has two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), and we converted the checkpoints to Hugging Face format. **For more information about ChatQA 2, check the [website](https://chatqa2-project.github.io/)!**
 
 ## Other Resources
-[Llama3-ChatQA-2-
+[Llama3-ChatQA-2-8B](https://huggingface.co/nvidia/Llama3-ChatQA-2-8B)  [Evaluation Data](https://huggingface.co/nvidia/Llama3-ChatQA-2-70B/tree/main/data)  [Training Data](https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data)  [Retriever](https://huggingface.co/intfloat/e5-mistral-7b-instruct)  [Website](https://chatqa2-project.github.io/)  [Paper](https://arxiv.org/abs/2407.14482)
 
 ## Overview of Benchmark Results
-Results in [ChatRAG Bench](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) are as follows:
+<!-- Results in [ChatRAG Bench](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) are as follows: -->
 
 
 ![Example Image](overview.png)
@@ -116,7 +116,9 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 ```
 
 ## Command to run generation
+```
 python evaluate_cqa_vllm_chatqa2.py --model-folder ${model_path} --eval-dataset ${dataset_name} --start-idx 0 --end-idx ${num_samples} --max-tokens ${max_tokens} --sample-input-file ${dataset_path}
+```
 
 See all_command.sh for detailed configurations.
 
@@ -135,5 +137,5 @@ Peng Xu ([email protected]), Wei Ping ([email protected])
 
 
 ## License
-The use of this model is governed by the [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](https://llama.meta.com/llama3/license/)
-
+The model is released under a non-commercial license, and its use is also governed by the [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](https://llama.meta.com/llama3/license/)
+
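The generation command in the hunk above uses shell-style placeholders. A minimal sketch of wiring those placeholders to concrete values, invoked from Python (the model folder, dataset name, sample count, and token budget below are illustrative assumptions, not values taken from the repo):

```python
import subprocess

# All values below are hypothetical stand-ins for the ${...} placeholders.
model_path = "nvidia/Llama3-ChatQA-2-70B"   # assumed model folder
dataset_name = "hotpotqa"                   # assumed eval dataset name
dataset_path = "data/hotpotqa.e5_mistral_retriever_chunkbysents1200/test.json"
num_samples = 100                           # assumed number of eval samples
max_tokens = 64                             # assumed generation budget

# Mirrors the documented command line flag-for-flag.
subprocess.run(
    [
        "python", "evaluate_cqa_vllm_chatqa2.py",
        "--model-folder", model_path,
        "--eval-dataset", dataset_name,
        "--start-idx", "0",
        "--end-idx", str(num_samples),
        "--max-tokens", str(max_tokens),
        "--sample-input-file", dataset_path,
    ],
    check=True,  # raise if the evaluation script exits non-zero
)
```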
data/hotpotqa.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8bdbe87695af0ffc9aafef2abbfd3753b9c49286a298265ab7d31cb53263b526
+size 23071838
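Each test.json in this commit is stored with Git LFS, so the diff records only a three-line pointer (spec version, SHA-256 object id, and byte size) rather than the JSON payload. A minimal sketch of reading those fields from a checkout where the pointer has not yet been smudged into the real file (`parse_lfs_pointer` is a hypothetical helper, not part of this repo's tooling):

```python
from pathlib import Path

def parse_lfs_pointer(path: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into fields."""
    fields: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        key, _, value = line.partition(" ")
        if key and value:
            fields[key] = value
    return fields

pointer = parse_lfs_pointer("data/hotpotqa.e5_mistral_retriever_chunkbysents1200/test.json")
print(pointer["version"])  # https://git-lfs.github.com/spec/v1
print(pointer["oid"])      # sha256:8bdbe876...
print(pointer["size"])     # 23071838 (bytes)
```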
data/longbook_choice_eng.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:408446c1f66282c22f5250f7bb34b61c3e138030456c37ff80aae8107a0bfb86
+size 373336874
data/longbook_choice_eng_gpt4_same/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:08c8424d91c27fb8cb00778f6790db51a42b554e51cece592f410dc1199ca47f
+size 302380927
data/longbook_qa_eng.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ed337ee7c70c6bf6f64c955ee17d8f59c001a4a2ca5b33056e749163592cbd61
+size 598663025
data/longbook_qa_eng_gpt4_same/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0fab613a11b84bc3812b44b38cca84d9142d63eee2e8d73b1f395ee37b0948ad
+size 476461679
data/longbook_sum_eng_gpt4_same/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4853bca9814332ec28e8920fe73be06daf61a3975fef4a20ee0fe705a1e9c2f8
+size 50591948
data/longdialogue_qa_eng_gpt4_same/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bcff853f1d8c62c098a2bb176c6d7b0caec848d20543c95ffd965f43e4c7309
+size 83709847
data/multifieldqa_en.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a3607042f7620baced5ee63faa359a275ff4d4561d7b6d93a502bee4e414972
+size 9086582
data/musique.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2926144219b2a9df26b8a9ee551af2558ce92223637dbc910ead2fec37cedca7
+size 28347035
data/qasper.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:77001fab84bca90813cf07c4d2ba7f5ecf3a545e1b78719a3ea513e6d4b696b7
+size 80736248
data/qmsum.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d3c13449783630ee5ab648a485f0ebc8eff1bc810fa99e97d374166564882cbd
+size 32187716
data/quality.e5_mistral_retriever_chunkbysents1200/test.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d8ee44ed63c72a79476b5828c18799a1a4682915d56808861d99f09205a03450
+size 106451846
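Since every file added here is an LFS pointer, one sanity check after downloading is to hash the materialized file and compare against the pointer's oid; Git LFS object ids are plain SHA-256 digests of the file contents. A sketch using the quality/test.json pointer above (the local path assumes the default checkout layout):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected digest copied from the quality/test.json pointer in this commit.
expected = "d8ee44ed63c72a79476b5828c18799a1a4682915d56808861d99f09205a03450"
actual = sha256_of("data/quality.e5_mistral_retriever_chunkbysents1200/test.json")
assert actual == expected, "downloaded file does not match the LFS pointer oid"
```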