singhsidhukuldeep posted an update Jul 22
🚀 Good folks at @nvidia just dropped: "ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities" 🧠💡

In the past few months, the open LLM community has made significant progress in releasing open models (Llama-3-70B-Instruct (@Meta-AI) 🦙, Qwen2-72B-Instruct (@AlibabaGroup) 🌐, Nemotron-4-340B-Instruct (@nvidia) ⚙️, and Mixtral-8x22B-Instruct-v0.1 (@MistralAI) 🌪️) that are on par with proprietary models! 📈

But top models like GPT-4 are still outperforming them in certain domains! 🔍💪

This led to domain-focused open LLMs (DeepSeek-Coder-V2 for coding and math 👨‍💻➕, ChatQA 1.5 for conversational QA and retrieval-augmented generation (RAG) 💬🔍, and InternVL 1.5 for vision-language tasks 🖼️🗣️).

The challenge ChatQA 2 takes on is long-context understanding and RAG! 📏🔗

These two capabilities are essential for LLMs to process large volumes of information that cannot fit into a single prompt, and they complement each other depending on the downstream task and computational budget. 🧩📊
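
To make that complementarity concrete, here is a hypothetical Python sketch (not from the paper): a dispatcher that hands the model the full corpus when it fits in a 128K window and falls back to top-k retrieval when it does not. The token counter and retriever below are placeholder assumptions, not any specific library's API.

```python
from typing import List

CONTEXT_WINDOW = 131_072  # ~128K tokens, the window size ChatQA 2 targets
TOP_K = 5                 # number of chunks to retrieve when falling back to RAG

def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: assume ~4 characters per token.
    return len(text) // 4

def retrieve_top_k(question: str, chunks: List[str], k: int) -> List[str]:
    # Toy retriever: rank chunks by crude keyword overlap with the question.
    # A real pipeline would use BM25 or a dense retriever instead.
    scored = sorted(chunks, key=lambda c: -sum(w.lower() in c.lower() for w in question.split()))
    return scored[:k]

def build_prompt(question: str, documents: List[str]) -> str:
    if sum(count_tokens(d) for d in documents) <= CONTEXT_WINDOW:
        # Long-context path: everything fits, so pass the full corpus.
        context = "\n\n".join(documents)
    else:
        # RAG path: the corpus exceeds the window, so keep only the top-k chunks.
        context = "\n\n".join(retrieve_top_k(question, documents, TOP_K))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```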

The solution is a detailed continued training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. 🔄🔧
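
A rough sketch of what such a context extension could look like with Hugging Face transformers, assuming the common recipe of enlarging the RoPE base frequency and maximum position embeddings before continued pretraining on long sequences; the rope_theta value here is an illustrative assumption, not the authors' exact setting.

```python
# Illustrative sketch only: the exact hyperparameters and training data are in the paper.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Meta-Llama-3-70B"  # 8K-context base model

config = AutoConfig.from_pretrained(BASE_MODEL)
config.max_position_embeddings = 131_072    # target a 128K-token window
config.rope_theta = 1e8                     # larger RoPE base frequency (assumed value)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# In practice a 70B model is loaded sharded across many GPUs.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, config=config)

# Continued pretraining on long documents would follow here, and then the
# three-stage instruction tuning described in the paper to strengthen
# instruction-following, RAG performance, and long-context understanding.
```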

📄 Paper: ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities (2407.14482)

The interesting thing to notice from the benchmarks was how good Qwen2 is out of the box! 👍✨