To read once X - a AndreiVoicuT Collection

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 68

Note Apply a transformation Q around the weights. Since this transformation does not affect the nonlinearity, it may be used to sparsify the weights and be incorporated into the W matrix without adding errors.
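A minimal sketch of the invariance described in the note above, assuming Q is orthogonal and the block is RMSNorm followed by a two-layer MLP. The names `rms_norm`, `block`, `w_in` and `w_out` are hypothetical and not taken from the paper's code. The deletion of rows and columns is not shown, only the fact that Q can be absorbed into W without changing the output.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Normalise each row by its root-mean-square (no learned scale here).
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def block(x, w_in, w_out):
    # Toy block: RMSNorm -> linear -> ReLU -> linear.
    h = rms_norm(x) @ w_in
    return np.maximum(h, 0.0) @ w_out

rng = np.random.default_rng(0)
d, hidden = 8, 16
x = rng.normal(size=(4, d))
w_in = rng.normal(size=(d, hidden))
w_out = rng.normal(size=(hidden, d))

# Random orthogonal Q from a QR factorisation (assumption: any orthogonal Q works).
q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Rotate the input by Q and absorb Q^T into the first weight matrix.
y_ref = block(x, w_in, w_out)
y_rot = block(x @ q, q.T @ w_in, w_out)

max_err = np.max(np.abs(y_ref - y_rot))
print(f"max abs difference: {max_err:.2e}")
```

The rotation preserves each row's norm, so RMSNorm commutes with Q, and Q cancels against Q^T before the ReLU is reached. That is why the nonlinearity is unaffected in this sketch.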

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Paper • 2401.14112 • Published Jan 25 • 17

Note To address these problems, we propose TC-FPx, the first full-stack GPU kernel design scheme with unified Tensor Core support of float-point weights for various quantization bit-width. We integrate TC-FPx kernel into an existing inference system, providing new end-to-end support (called FP6-LLM) for quantized LLM inference, where better trade-offs between inference cost and model quality are achieved. https://github.com/usyd-fsalab/fp6_llm

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Paper • 2401.13919 • Published Jan 25 • 25

Note https://github.com/MinorJerry/WebVoyager Useful for automation, see UiPath-type things.

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

Paper • 2401.14257 • Published Jan 25 • 9

Make-A-Shape: a Ten-Million-scale 3D Shape Model

Paper • 2401.11067 • Published Jan 20 • 15

Note Voxel-powered encoder-decoder.

Note Amazing one-bit paper: this is not quantisation; the model is built with 1-2 bit layers.

Towards Optimal Learning of Language Models

Paper • 2402.17759 • Published Feb 27 • 16

Note I think this paper is worth reading because it is compatible with compactifAI. They also have the code, but it seems to be applied only to a toy model: https://arxiv.org/pdf/2402.17759.pdf

Humanoid Locomotion as Next Token Prediction

Paper • 2402.19469 • Published Feb 29 • 26

OneBit: Towards Extremely Low-bit Large Language Models

Paper • 2402.11295 • Published Feb 17 • 22

Note Sign-Value-Independent Decomposition (SVID). I think this can be applied on top of the MPO.
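A rough sketch of what a sign-value decomposition can look like, assuming the form W ≈ sign(W) ⊙ (a bᵀ) with a and b taken from a rank-1 SVD of |W|. The function name `svid` and the toy matrix are hypothetical, and this is not the paper's implementation. The combination with MPO is not shown.

```python
import numpy as np

def svid(w):
    # Split w into a +/-1 sign matrix and two value vectors a, b
    # such that w is approximated by sign * outer(a, b).
    sign = np.where(w >= 0, 1.0, -1.0)
    # Rank-1 approximation of |w| via SVD (assumption: SVD is used here).
    u, s, vt = np.linalg.svd(np.abs(w), full_matrices=False)
    # The leading singular vectors of a non-negative matrix can be taken
    # non-negative; abs() removes the arbitrary sign returned by the solver.
    a = np.abs(u[:, 0]) * np.sqrt(s[0])
    b = np.abs(vt[0]) * np.sqrt(s[0])
    return sign, a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))

sign, a, b = svid(w)
w_hat = sign * np.outer(a, b)

# Relative Frobenius error of the decomposition.
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)

# Baseline: the same sign matrix with a single scalar scale.
w_base = sign * np.mean(np.abs(w))
base_err = np.linalg.norm(w - w_base) / np.linalg.norm(w)

print(f"rank-1 value vectors: {rel_err:.3f}, scalar scale: {base_err:.3f}")
```

Only the sign matrix has one entry per weight, and each entry needs a single bit. The two value vectors add one number per row and one per column.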

LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Paper • 2402.11550 • Published Feb 18 • 15

Note The leader takes the decision.