There are 2.2 billion active @Apple devices, and all of them just got smarter thanks to Apple Intelligence (AI).
Well, almost all devices...
Your device needs:
- an A17 Pro chip or later if it's an iPhone,
- an M1 chip or later if it's an iPad,
- an M1 chip or later if it's a Mac.
All this aside, this is probably the largest deployment of on-device LLMs.
Here is the technical goodness:
- Apple Intelligence runs a ~3B-parameter LLM on device (Mac, iPhone, iPad) with grouped-query attention plus activation and embedding quantization (bit rates selected with Talaria), running on the Neural Engine.
- Will be using fine-tuned LoRA adapters for different tasks, claiming to outperform other 7B and 3B LLMs! (minimal sketch after this list)
- On iPhone 15 Pro: time-to-first-token latency of ~0.6 ms per prompt token and a generation rate of 30 tokens/second.
- No server model size or details.
- Will be dynamically loading, caching, and swapping LoRA adapters (think LoRA Land).
- On-device model has a 49K vocab size, while the server model goes to 100K.
- Using rejection sampling fine-tuning and RLHF in post-training.
- A rejection sampling fine-tuning algorithm with a teacher committee.
- And a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator (sketch near the end of this post).
- Used synthetic data generation (from bigger models; does not mention which) for tasks like summarization.
- 750 evaluation samples for each production use case to evaluate summarization (dataset not released).
- No mention of multilingual support.
- Used Apple's AXLearn framework (JAX) and FSDP to train on TPUs and GPUs.
- 3B + adapter outperforms Phi-3-mini, Gemma 7B, and Mistral 7B on summarization.
- 3B + adapter achieves 78.7% on IFEval, beating Phi-3-mini, Gemma 7B, and Mistral 7B; the server model matches GPT-4-Turbo and beats Mixtral 8x22B and GPT-3.5 Turbo.
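Apple hasn't published adapter code, but the LoRA idea itself is simple enough to sketch. Here's a minimal, hypothetical PyTorch version covering both the adapter math and the load/cache/swap pattern (the rank, alpha, and cache here are my choices, not Apple's):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update.

    y = base(x) + scale * (x @ A^T) @ B^T
    Only A and B are trained, so a task "adapter" is a tiny set of
    weights that can be loaded and swapped at runtime.
    """
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the ~3B base frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# The load/cache/swap idea: one resident base model, per-task A/B
# weights pulled from a small in-memory cache on demand.
adapter_cache: dict[str, dict] = {}          # task name -> adapter state dict

def activate_adapter(model: nn.Module, task: str, path: str) -> None:
    if task not in adapter_cache:
        adapter_cache[task] = torch.load(path)   # tens of MB, not GB
    model.load_state_dict(adapter_cache[task], strict=False)
```

Because only A and B are trained, each task adapter is tens of megabytes instead of gigabytes, which is what makes swapping adapters per feature practical on a phone.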
LoRA for the win!
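On the RLHF side, the blog names the pieces but not the math. The leave-one-out part is easy to state, though: sample k completions per prompt and baseline each one against the mean reward of the other k-1. A rough sketch (my naming, not Apple's code):

```python
import numpy as np

def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (k,), reward-model scores for k completions
    of one prompt. Each completion is baselined against the mean
    reward of the other k-1 completions."""
    k = rewards.shape[0]
    baselines = (rewards.sum() - rewards) / (k - 1)
    return rewards - baselines

# Example: 4 sampled completions of one prompt
adv = leave_one_out_advantages(np.array([0.9, 0.2, 0.5, 0.4]))
# adv = [0.533, -0.4, 0.0, -0.133]; positive advantage -> reinforce
```

The appeal is an unbiased baseline with no separate value network to train.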
Blog: https://machinelearning.apple.com/research/introducing-apple-foundation-models