David Golchinfar PRO

DavidGF

AI & ML interests

Finetuning LLMs; improving German language understanding and the quality of German text generated by LLMs

Posts 5

🎉 Celebrating One Year of #SauerkrautLM with Two Groundbreaking Releases!

We're thrilled to announce the release of SauerkrautLM-v2-14b in two specialized versions: VAGOsolutions/SauerkrautLM-v2-14b-SFT and VAGOsolutions/SauerkrautLM-v2-14b-DPO. Built on the robust Qwen2.5-14B foundation, these models represent a significant leap forward in multilingual AI capabilities.

🔬 Technical Breakthroughs:
💠 Innovative three-phase fine-tuning approach
💠 Two-step Spectrum SFT + one-step Spectrum DPO optimization phase for enhanced performance
💠 Balance of German and English language capabilities
💠 Advanced function calling - almost on par with Claude-3.5-Sonnet-20240620
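For readers unfamiliar with the DPO step in the pipeline above: DPO optimizes the policy directly on preference pairs against a frozen reference model. A minimal sketch of the standard DPO objective in plain Python (the log-probabilities here are hypothetical placeholders, not from the actual models) might look like:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_* are the policy's log-probs of the chosen/rejected responses;
    ref_* are the frozen reference model's log-probs; beta scales the
    implicit KL penalty.
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)), written via log1p(exp(-x)) for clarity
    return math.log1p(math.exp(-margin))

# Toy pair: the policy prefers the chosen response more than the reference does,
# so the margin is positive and the loss is small.
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_chosen=-11.0, ref_rejected=-11.5)
```

The loss shrinks as the policy widens its chosen-vs-rejected gap relative to the reference, which is exactly the signal the DPO phase exploits.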

🇩🇪 German Language Excellence:
What sets this release apart is our unique achievement in simultaneously improving both German and English capabilities. Through our specialized training approach with over 1.2B tokens across two phases, we've managed to:
💠 Enhance German language understanding and generation (SFT Version > DPO Version)
💠 Maintain authentic German linguistic nuances
💠 Improve cross-lingual capabilities
💠 Preserve cultural context awareness

📊 Training Innovation:
Our three-phase approach targeted specific layer percentages (15%, 20% and 25%) with carefully curated datasets, including:
💠 Mathematics-focused content (proprietary classifier-selected)
💠 High-quality German training data
💠 Specialized function calling datasets
💠 Premium multilingual content
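Spectrum-style targeted training freezes most of the network and trains only the most informative fraction of layers per phase. A minimal, hypothetical sketch of picking the top-p% of layers by a signal-to-noise-like score (the real Spectrum tooling computes these scores from the weight matrices; the scores below are toy values):

```python
def select_layers(layer_scores, fraction):
    """Return sorted indices of the top-`fraction` layers ranked by score.

    layer_scores: per-layer SNR-like scores (toy values here).
    fraction: e.g. 0.15, 0.20, 0.25 for the three training phases.
    """
    n_keep = max(1, round(len(layer_scores) * fraction))
    ranked = sorted(range(len(layer_scores)),
                    key=lambda i: layer_scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Toy scores for a 20-layer model; phase 1 would unfreeze 15% of layers,
# phase 2 20%, phase 3 25% -- everything else stays frozen.
scores = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6, 0.5, 0.05,
          0.95, 0.15, 0.45, 0.85, 0.25, 0.75, 0.35, 0.65, 0.55, 0.12]
phase1_layers = select_layers(scores, 0.15)
```

In a real run, the returned indices would drive which transformer blocks get `requires_grad=True`; the rest of the model stays frozen, which is where the efficiency gain comes from.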

🎁 Community Contribution:
We're also releasing two new datasets in a few days:
1️⃣ SauerkrautLM-Fermented-GER-DPO: 3,300 high-quality German training samples
2️⃣ SauerkrautLM-Fermented-Irrelevance-GER-DPO: 2,000 specialized samples for optimized function call irrelevance handling

Thank you to our incredible community and partners who have supported us throughout this journey. Here's to another year of AI innovation! 🚀
Introducing Kraken-LoRA – a lightweight version of Kraken that uses LoRA adapters on top of the shared base model as its Experts.

@fernandofernandes, @Crystalcareai, @ehartford, and I created Kraken-LoRA!

πŸ” What’s the big deal?

✅ Size Consistency: While Kraken's size increases with more Experts, Kraken-LoRA remains as compact as the base model (e.g., 8b if you use Meta-Llama3-8b-Instruct).
✅ VRAM Efficiency: Kraken-LoRA is highly VRAM efficient, maintaining the power of all experts without the bloat.
✅ Dynamic Adaptation: LoRA adapters are applied dynamically at runtime, following the routing process.
✅ High Efficiency: Enjoy increased efficiency without compromising performance, as long as the LoRA adapters match the base model.
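The route-then-adapt flow above can be sketched as follows. The keyword router and adapter names here are hypothetical stand-ins — Kraken's actual router is a trained classifier, and real adapter switching would go through a PEFT-style `set_adapter` call:

```python
# Minimal sketch of runtime LoRA routing. The adapter registry and the
# keyword heuristic are illustrative assumptions, not Kraken's real router.
ADAPTERS = {
    "code": "lora-code-expert",
    "math": "lora-math-expert",
    "chat": "lora-chat-expert",  # fallback expert
}

def route(prompt: str) -> str:
    """Pick an expert label for the prompt (toy keyword heuristic)."""
    text = prompt.lower()
    if any(kw in text for kw in ("def ", "class ", "import ")):
        return "code"
    if any(kw in text for kw in ("solve", "integral", "equation")):
        return "math"
    return "chat"

def select_adapter(prompt: str) -> str:
    """Return the name of the LoRA adapter to activate for this prompt."""
    return ADAPTERS[route(prompt)]

adapter = select_adapter("import torch; write a training loop")
```

Because only one set of lightweight adapter weights is swapped in per request, the base model is loaded once and VRAM stays flat no matter how many experts exist.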

💡 Conclusion: Kraken-LoRA gives businesses the flexibility and power of the Kraken architecture at the base model's footprint, enabling further scaling without sacrificing performance.

Check out the model here: VAGOsolutions/Kraken-LoRA
Explore the code here: https://github.com/cognitivecomputations/kraken/tree/main/Kraken-LoRA

Have fun with Kraken-LoRA! 🙏

models

None public yet

datasets

None public yet