- Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
  Paper • 2408.15998 • Published • 83
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
  Paper • 2409.01704 • Published • 80
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  Paper • 2408.06195 • Published • 61
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
  Paper • 2405.06682 • Published • 2
Collections including paper arxiv:2410.05258
- Associative Recurrent Memory Transformer
  Paper • 2407.04841 • Published • 31
- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 55
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 63
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 251
- THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
  Paper • 2406.10996 • Published • 32
- Simulating Classroom Education with LLM-Empowered Agents
  Paper • 2406.19226 • Published • 29
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
  Paper • 2406.19389 • Published • 51
- LAMBDA: A Large Model Based Data Agent
  Paper • 2407.17535 • Published • 34
- MotionLLM: Understanding Human Behaviors from Human Motions and Videos
  Paper • 2405.20340 • Published • 19
- Spectrally Pruned Gaussian Fields with Neural Compensation
  Paper • 2405.00676 • Published • 8
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First
  Paper • 2404.18212 • Published • 27
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 118
- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 130
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 596
- Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 57
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 27
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 45
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 84
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 64
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 59