Collections

Collections including paper arxiv:2405.02246

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 38
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 19

Daily papers worth reading in detail later

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 71
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 44

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1 • 80
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29 • 26
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65
Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2 • 13

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1 • 14
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 47
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40

Something to read

Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4 • 61
TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 89
Simple and Scalable Strategies to Continually Pre-train Large Language Models

Paper • 2403.08763 • Published Mar 13 • 48
PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15 • 57

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Paper • 2312.02155 • Published Dec 4, 2023 • 12
LivePhoto: Real Image Animation with Text-guided Motion Control

Paper • 2312.02928 • Published Dec 5, 2023 • 16
FaceStudio: Put Your Face Everywhere in Seconds

Paper • 2312.02663 • Published Dec 5, 2023 • 30
aMUSEd: An Open MUSE Reproduction

Paper • 2401.01808 • Published Jan 3 • 28

VQA (Image captioning, QA)

📊 FuseCap

Space • Runtime error • 35
💻 Kosmos 2

Space • Running on T4 • 416
🚀 Vilt Nlvr

Space • Running • 6
⚡ Qwen VL

Space • Build error • 125
