Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2410.02073

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published 14 days ago • 38

AI Math: 3DGauss

about 12 hours ago

EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

Paper • 2410.01804 • Published 14 days ago • 5
DreamCinema: Cinematic Transfer with Free Camera and 3D Character

Paper • 2408.12601 • Published Aug 22 • 28
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Paper • 2408.14819 • Published Aug 27 • 19
DressRecon: Freeform 4D Human Reconstruction from Monocular Video

Paper • 2409.20563 • Published 16 days ago • 7

DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Paper • 2410.00201 • Published 16 days ago
Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems

Paper • 2409.19804 • Published 17 days ago
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling

Paper • 2409.15156 • Published 23 days ago
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Paper • 2409.04927 • Published Sep 7

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30 • 19
Spectrally Pruned Gaussian Fields with Neural Compensation

Paper • 2405.00676 • Published May 1 • 8
Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28 • 27
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14 • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14 • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 596
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32

3D Reconstruction

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors

Paper • 2312.16837 • Published Dec 28, 2023 • 5
Learning the 3D Fauna of the Web

Paper • 2401.02400 • Published Jan 4 • 9
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

Paper • 2310.15110 • Published Oct 23, 2023 • 2
Zero-1-to-3: Zero-shot One Image to 3D Object

Paper • 2303.11328 • Published Mar 20, 2023 • 5

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs