singhsidhukuldeep posted an update Jul 22
🚀 Good folks at @nvidia just dropped: "ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities" 🧠💡

In the past few months, the open LLM community has made significant progress in releasing open models (Llama-3-70B-Instruct (@Meta-AI) 🦙, Qwen2-72B-Instruct (@AlibabaGroup) 🌐, Nemotron-4-340B-Instruct (@nvidia) ⚙️, and Mixtral-8x22B-Instruct-v0.1 (@MistralAI) 🌪️) that are on par with proprietary models! 📈

But top models like GPT-4 are still outperforming them in certain domains! 🔍💪

This led to domain-focused open LLMs (DeepSeek-Coder-V2 for coding and math 👨‍💻➕, ChatQA 1.5 for conversational QA and retrieval-augmented generation (RAG) 💬🔍, and InternVL 1.5 for vision-language tasks 🖼️🗣️).

The challenge ChatQA 2 takes on is long-context understanding and RAG! 📏🔗

These two capabilities are essential for LLMs to process large volumes of information that cannot fit into a single prompt, and they complement each other depending on the downstream task and computational budget. 🧩📊
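
To make that complementarity concrete, here is a hypothetical Python sketch (not from the paper): a dispatcher that hands the model the full corpus when it fits in a 128K window and falls back to top-k retrieval when it does not. The token counter and retriever below are placeholder assumptions, not any specific library's API.

```python
from typing import List

CONTEXT_WINDOW = 131_072  # ~128K tokens, the window size ChatQA 2 targets
TOP_K = 5                 # number of chunks to retrieve when falling back to RAG

def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: assume ~4 characters per token.
    return len(text) // 4

def retrieve_top_k(question: str, chunks: List[str], k: int) -> List[str]:
    # Toy retriever: rank chunks by crude keyword overlap with the question.
    # A real pipeline would use BM25 or a dense retriever instead.
    scored = sorted(chunks, key=lambda c: -sum(w.lower() in c.lower() for w in question.split()))
    return scored[:k]

def build_prompt(question: str, documents: List[str]) -> str:
    if sum(count_tokens(d) for d in documents) <= CONTEXT_WINDOW:
        # Long-context path: everything fits, so pass the full corpus.
        context = "\n\n".join(documents)
    else:
        # RAG path: the corpus exceeds the window, so keep only the top-k chunks.
        context = "\n\n".join(retrieve_top_k(question, documents, TOP_K))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```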

The solution is a detailed continued training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. 🔄🔧
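
A rough sketch of what such a context extension could look like with Hugging Face transformers, assuming the common recipe of enlarging the RoPE base frequency and maximum position embeddings before continued pretraining on long sequences; the rope_theta value here is an illustrative assumption, not the authors' exact setting.

```python
# Illustrative sketch only: the exact hyperparameters and training data are in the paper.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Meta-Llama-3-70B"  # 8K-context base model

config = AutoConfig.from_pretrained(BASE_MODEL)
config.max_position_embeddings = 131_072    # target a 128K-token window
config.rope_theta = 1e8                     # larger RoPE base frequency (assumed value)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# In practice a 70B model is loaded sharded across many GPUs.
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, config=config)

# Continued pretraining on long documents would follow here, and then the
# three-stage instruction tuning described in the paper to strengthen
# instruction-following, RAG performance, and long-context understanding.
```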

📄 Paper: ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities (2407.14482)

The interesting thing to notice from the benchmarks was how good Qwen2 is out of the box! 👍✨