There are 2.2 billion active @Apple devices, and all of them just got smarter thanks to Apple Intelligence (AI).
Well, almost all devices...
Your device needs:
- an A17 Pro chip or later if it's an iPhone,
- an M1 chip or later if it's an iPad,
- an M1 chip or later if it's a Mac.
All this aside, this is probably the largest deployment of on-device LLMs.
Here is the technical goodness:
- Apple Intelligence runs a ~3B-parameter LLM on device (Mac, iPhone, iPad) with grouped-query attention plus activation and embedding quantization (bit rates selected with Talaria), running on the Neural Engine.
- Will be using fine-tuned LoRA adapters for different tasks, claiming to outperform other 7B and 3B LLMs! (minimal sketch after this list)
- On iPhone 15 Pro: time-to-first-token latency of ~0.6 ms per prompt token and a generation rate of 30 tokens/second.
- No server model size or details.
- Will be dynamically loading, caching, and swapping LoRA adapters (think LoRA Land).
- On-device model has a 49K vocab size, while the server model goes to 100K.
- Using rejection sampling fine-tuning and RLHF in post-training.
- A rejection sampling fine-tuning algorithm with a teacher committee.
- And a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator (sketch near the end of this post).
- Used synthetic data generation (from bigger models; does not mention which) for tasks like summarization.
- 750 evaluation samples for each production use case to evaluate summarization (dataset not released).
- No mention of multilingual support.
- Used Apple's AXLearn framework (JAX) and FSDP to train on TPUs and GPUs.
- 3B + adapter outperforms Phi-3-mini, Gemma 7B, and Mistral 7B on summarization.
- 3B + adapter achieves 78.7% on IFEval, beating Phi-3-mini, Gemma 7B, and Mistral 7B; the server model matches GPT-4-Turbo and beats Mixtral 8x22B and GPT-3.5 Turbo.
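Apple hasn't published adapter code, but the LoRA idea itself is simple enough to sketch. Here's a minimal, hypothetical PyTorch version covering both the adapter math and the load/cache/swap pattern (the rank, alpha, and cache here are my choices, not Apple's):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.

    y = base(x) + scale * (x @ A^T) @ B^T
    Only A and B are trained, so a task "adapter" is a tiny set of
    weights that can be loaded and swapped at runtime.
    """
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the ~3B base frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# The load/cache/swap idea: one resident base model, per-task A/B
# weights pulled from a small in-memory cache on demand.
adapter_cache: dict[str, dict] = {}          # task name -> adapter state dict

def activate_adapter(model: nn.Module, task: str, path: str) -> None:
    if task not in adapter_cache:
        adapter_cache[task] = torch.load(path)   # tens of MB, not GB
    model.load_state_dict(adapter_cache[task], strict=False)
```

Because only A and B are trained, each task adapter is tens of megabytes instead of gigabytes, which is what makes swapping adapters per feature practical on a phone.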
LoRA for the win!
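On the RLHF side, the blog names the pieces but not the math. The leave-one-out part is easy to state, though: sample k completions per prompt and baseline each one against the mean reward of the other k-1. A rough sketch (my naming, not Apple's code):

```python
import numpy as np

def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (k,), reward-model scores for k completions
    of one prompt. Each completion is baselined against the mean
    reward of the other k-1 completions."""
    k = rewards.shape[0]
    baselines = (rewards.sum() - rewards) / (k - 1)
    return rewards - baselines

# Example: 4 sampled completions of one prompt
adv = leave_one_out_advantages(np.array([0.9, 0.2, 0.5, 0.4]))
# adv = [0.533, -0.4, 0.0, -0.133]; positive advantage -> reinforce
```

The appeal is an unbiased baseline with no separate value network to train.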
Blog: https://machinelearning.apple.com/research/introducing-apple-foundation-models