singhsidhukuldeep posted an update Jun 16
🖥️ Do you have 1TB+ VRAM?

🎉 Well, good news for you!

πŸ‘¨β€πŸ”¬ Good folks at @nvidia have released Nemotron 4 340B, the new open-source LLM king, rivalling GPT-4! πŸš€

📊 340B parameters, released in 3 flavours: base, reward, and instruct models

🎯 It's a dense model, not MoE

👓 4k context window

📚 9T tokens of training data, 2-phase training (8T pre-training + 1T continued pre-training)

🌍 Trained on 50+ natural languages and 40+ programming languages (70% of the training data is English, 15% multilingual, 15% code)

📅 June 2023 training data cut-off

💻 Deployment needs 8x H200, 16x H100, or 16x A100 80GB for BF16 inference (roughly 8x H100 in int4); back-of-the-envelope math below
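
Rough math behind those numbers, counting weights only (KV cache, activations, and framework overhead are ignored here, which is why the real configurations carry extra headroom):

```python
# Back-of-the-envelope VRAM estimate for a 340B-parameter dense model.
# Weights only; KV cache and activations add more on top.

PARAMS = 340e9  # 340B parameters

def weight_memory_gb(bytes_per_param: float) -> float:
    """GB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 1e9

bf16 = weight_memory_gb(2.0)   # ~680 GB
int4 = weight_memory_gb(0.5)   # ~170 GB

print(f"BF16 weights: ~{bf16:.0f} GB -> 8x H200 (1128 GB) or 16x H100/A100 80GB (1280 GB)")
print(f"INT4 weights: ~{int4:.0f} GB -> roughly 8x H100 80GB (640 GB), leaving room for KV cache")
```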

πŸ† Of course, it beats Llama 3 70B on MMLU (81.1), Arena Hard (54.2), and GSM8K (92.4)

🤖 But it's beaten by Qwen 2 (a 72B parameter model) on HumanEval and MT-Bench

🔧 Aligned using SFT, DPO, and RPO, with RLHF run via the NeMo Aligner framework (a generic DPO sketch below)
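
For context, the DPO part of that recipe optimizes an objective roughly like the one below. This is a generic PyTorch sketch of standard DPO, not NVIDIA's NeMo Aligner code; the tensor names and `beta` value are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the 'chosen' response
    over the 'rejected' one, relative to a frozen reference model."""
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # beta controls how strongly deviation from the reference is penalized
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```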

📊 98% of the alignment data was synthetically generated (illustrative sketch below)
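
A sketch of how an instruct model plus a reward model can be turned into a synthetic-data pipeline (purely illustrative, not NVIDIA's actual pipeline; `generate` and `score` are hypothetical stand-ins for the instruct and reward models):

```python
from typing import Callable, List

def build_synthetic_dataset(prompts: List[str],
                            generate: Callable[[str, int], List[str]],  # instruct model
                            score: Callable[[str, str], float],         # reward model
                            n_samples: int = 4,
                            threshold: float = 0.7) -> List[dict]:
    """Sample candidate responses per prompt and keep only the best one,
    provided the reward model scores it above a threshold."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, n_samples)
        best_score, best = max((score(prompt, c), c) for c in candidates)
        if best_score >= threshold:
            dataset.append({"prompt": prompt, "response": best, "reward": best_score})
    return dataset
```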

📄 NVIDIA Open Model License, with commercial use allowed

¯\_(ツ)_/¯
😅 Glad to see more open models, but this is one confusing fellow!
🤨 A 340B parameter model that narrowly beats 70B models? Starts failing against 72B models? Sounds like a model for synthetic data generation! But then it has a 4k context?

🔗 Models: nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911

📑 Paper: https://research.nvidia.com/publication/2024-06_nemotron-4-340b