- The Curious Case of Neural Text Degeneration
  Paper • 1904.09751 • Published • 3
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models
  Paper • 2404.01197 • Published • 30
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
  Paper • 1905.10044 • Published • 1
- PIQA: Reasoning about Physical Commonsense in Natural Language
  Paper • 1911.11641 • Published • 2
Collections
Collections including paper arxiv:2112.03857

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 18
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 10
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1

- End-to-End Object Detection with Transformers
  Paper • 2005.12872 • Published • 5
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 27
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 84

- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  Paper • 2103.14030 • Published • 4
- A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images
  Paper • 2104.12137 • Published • 2
- Self-Supervised Learning with Swin Transformers
  Paper • 2105.04553 • Published • 2
- Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
  Paper • 2108.11993 • Published • 2

- FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
  Paper • 2403.06775 • Published • 3
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Paper • 2010.11929 • Published • 6
- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
  Paper • 2110.07040 • Published • 2
- A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
  Paper • 1811.00056 • Published • 2

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 180
- Unifying Vision, Text, and Layout for Universal Document Processing
  Paper • 2212.02623 • Published • 10
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3
- ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
  Paper • 2404.07773 • Published • 1