LLMs Inference - a MElHuseyni Collection

updated 5 days ago

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Paper • 2401.08671 • Published Jan 9 • 14

NanoFlow: Towards Optimal Large Language Model Serving Throughput

Paper • 2408.12757 • Published Aug 22 • 16

richard-park/llama3-deepspeed-v1.0

Text Generation • Updated Jul 4 • 2.23k • 1