AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation • Paper 2301.08110 • Published Jan 19, 2023
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization • Paper 2311.01544 • Published Nov 2, 2023