Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations Paper • 2310.13420 • Published Oct 20, 2023 • 2
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published 28 days ago • 107
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12 • 65
Translated (En->Ko) dataset Collection Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Sep 20 • 3
Standard-format-preference-dataset Collection We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8 • 21
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 73
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 64
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 87
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention Paper • 2309.14327 • Published Sep 25, 2023 • 21