- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
  Paper • 2403.12968 • Published • 24
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 57
- Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
  Paper • 2403.09704 • Published • 31
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 67
Collections including paper arxiv:2403.15447
- Recourse for reclamation: Chatting with generative language models
  Paper • 2403.14467 • Published • 6
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
  Paper • 2403.15447 • Published • 16
- Introducing v0.5 of the AI Safety Benchmark from MLCommons
  Paper • 2404.12241 • Published • 10
- Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
  Paper • 2309.13356 • Published • 36
- Unveiling Safety Vulnerabilities of Large Language Models
  Paper • 2311.04124 • Published • 6
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64
- Evaluating Frontier Models for Dangerous Capabilities
  Paper • 2403.13793 • Published • 7
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 26
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
  Paper • 2105.03536 • Published • 2
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
  Paper • 2403.15447 • Published • 16
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 134
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 35
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 28
- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
  Paper • 2311.06243 • Published • 17
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
  Paper • 2311.05908 • Published • 12
- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82